Solr, mail # user - The index speed in the solr


Re: The index speed in the solr
Erick Erickson 2012-04-23, 13:27
Hard to say. Here's the basic approach I'd use to try to narrow it down:
1> Take out ngrams. What does that do to your speed?
2> Are you committing very often? If so, lengthen the interval between commits.
3> Posting is probably not the most performant thing in the world.
     Consider using SolrJ.
4> What does a document look like? Are they structured docs
     (Word, PDF, etc.)? If so, try offloading the parsing to client machines.
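
To put a rough number on step 1, here is a minimal sketch that counts how many grams a single token expands into. It uses the figures from the original message quoted below (min=6, max=10, tokens of up to 300,000 bytes) and assumes one byte per character and that the filter emits every contiguous substring of each length. The function names are made up for illustration and are not part of Solr or Lucene.

```python
def ngram_count(token_len, min_gram, max_gram):
    """Count the contiguous substrings of length min_gram..max_gram in one token."""
    return sum(max(token_len - n + 1, 0) for n in range(min_gram, max_gram + 1))


def ngram_chars(token_len, min_gram, max_gram):
    """Total characters across all of those substrings."""
    return sum(n * max(token_len - n + 1, 0) for n in range(min_gram, max_gram + 1))


# Worst case from the quoted message: one 300,000-character token
# (assumes 1 byte == 1 character).
worst_case_len = 300_000
grams = ngram_count(worst_case_len, 6, 10)
chars = ngram_chars(worst_case_len, 6, 10)

print(f"grams emitted for one token: {grams:,}")
print(f"characters of term text:     {chars:,}")
print(f"expansion factor:            {chars / worst_case_len:.1f}x")
```

Under those assumptions a single worst-case token turns into roughly 1.5 million terms, about 40 times the original text, so it would not be surprising if the n-gram filter dominated the indexing time.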

Basically, you haven't given enough information to make much
of a guess here...

50 hours is a really long time for 2M docs though, so something
doesn't seem right unless the docs are really unusual.
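
For a sense of scale, the arithmetic behind "really long" can be sketched as follows. The inputs are the numbers from the quoted message and its curl output; reading "9 G" as 9 GiB is an assumption, and the helper name is hypothetical.

```python
# Figures quoted in this thread; "9 G" is read as 9 GiB, which is an assumption.
HOURS = 50
DOCS = 2_000_000
TOTAL_BYTES = 9 * 1024**3

# "Average Speed / Upload" values (bytes/sec) from the four curl runs in the quote.
CURL_UPLOAD_BPS = [18902, 19839, 21113, 19752]


def per_second(total, hours):
    """Average rate per second for a total spread evenly over the given hours."""
    return total / (hours * 3600)


docs_per_sec = per_second(DOCS, HOURS)
bytes_per_sec = per_second(TOTAL_BYTES, HOURS)
avg_doc_bytes = TOTAL_BYTES / DOCS
combined_upload_bps = sum(CURL_UPLOAD_BPS)

print(f"documents per second:      {docs_per_sec:.1f}")
print(f"input bytes per second:    {bytes_per_sec:,.0f}")
print(f"average document size:     {avg_doc_bytes:,.0f} bytes")
print(f"combined curl upload rate: {combined_upload_bps:,} bytes/sec")
```

That works out to about 11 documents per second, with an average document of only a few kilobytes.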

If you need to offload the structured docs, here's a way to
get started:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Sun, Apr 22, 2012 at 9:58 PM, neosky <[EMAIL PROTECTED]> wrote:
> It takes me 50 hours to index a 9 G file (about 2,000,000 documents)
> with an n-gram filter (min=6, max=10). My token before the n-gram filter is
> long (not a word; at most 300,000 bytes, with white space). I split the file
> into 4 files and use post.sh to update them at the same time. I also tried
> writing a Lucene program to do the indexing myself (single thread). The time
> is almost the same.
> I would like to know what the general bottleneck for indexing in Solr is.
> Doesn't Solr handle index update requests concurrently?
>
> 1.
> Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>  51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28     0
> 2.
> Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>  62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15 76629
> 3.
> Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>  65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17 25537
> 4.
> Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>  58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06 81435
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3931338.html
> Sent from the Solr - User mailing list archive at Nabble.com.