


Re: The index speed in the solr
Hard to say. Here's the basic approach I'd use to try to narrow it down:
1> take out ngrams. What does that do to your speed?
2> are you committing very often? If so, lengthen the interval between commits.
3> Posting is probably not the most performant thing in the world.
     Consider using SolrJ.
4> What does a document look like? Are they structured docs
     (Word, PDF, etc.)? If so, try offloading their parsing to client machines.
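To see why point 1 matters so much here, take the figures from the original question (min=6, max=10, tokens of up to 300,000 bytes). The small Java sketch below counts the terms an n-gram filter would emit for a single token, assuming it emits every substring whose length lies between the minimum and maximum gram size, and treating one byte as one character. The class and method names are hypothetical.

```java
public class NGramCost {
    // Hypothetical helper: counts the n-grams emitted for one token, assuming
    // every substring of length minGram..maxGram becomes a separate term.
    public static long gramCount(long tokenLength, int minGram, int maxGram) {
        long total = 0;
        for (int n = minGram; n <= maxGram; n++) {
            if (tokenLength >= n) {
                total += tokenLength - n + 1; // number of length-n windows
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Figures from the question; one byte per character is an assumption.
        System.out.println("300,000-char token: " + gramCount(300_000, 6, 10) + " grams");
        System.out.println("1,000-char token:   " + gramCount(1_000, 6, 10) + " grams");
    }
}
```

For a 300,000-character token this comes to 1,499,965 grams, roughly five index terms per input character, so the n-gram step alone could plausibly account for most of the indexing time.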

Basically, you haven't given enough information to make much
of a guess here...

50 hours is a really long time for 2M docs, though, so something
doesn't seem right unless the docs are really unusual.
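As a rough check on that, here are the averages implied by the figures in the question, as a back-of-the-envelope Java sketch. Reading "9 G" as 9 GiB is an assumption.

```java
public class IndexThroughput {
    // Average rate for a total amount processed over the given number of seconds.
    public static double perSecond(double amount, double seconds) {
        return amount / seconds;
    }

    public static void main(String[] args) {
        double seconds = 50 * 3600.0;     // 50 hours = 180,000 s
        double docs = 2_000_000;          // ~2M documents
        double kib = 9.0 * 1024 * 1024;   // ~9 GB, read as GiB, expressed in KiB

        System.out.printf("%.1f docs/s%n", perSecond(docs, seconds));
        System.out.printf("%.1f KiB/s%n", perSecond(kib, seconds));
    }
}
```

That works out to about 11 documents and roughly 52 KiB of input per second for the whole job.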

If you need to offload the structured docs, here's a way to
get started:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
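On the question of whether updates are handled concurrently, and on points 2 and 3 above, a common client-side pattern is to send documents in batches from a few threads and commit once at the end. The sketch below is only an illustration of that pattern: UpdateSink is a hypothetical interface standing in for a SolrJ client (it is not part of SolrJ), and the batch size (1,000) and thread count (4) are arbitrary values, not tuned recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedIndexer {
    // Hypothetical stand-in for a Solr client. Real code would wrap a SolrJ
    // client's add and commit calls; this interface is not part of SolrJ.
    public interface UpdateSink {
        void add(List<String> batch);
        void commit();
    }

    // Splits docs into fixed-size batches, sends the batches from a small
    // thread pool, and commits once after every batch has been sent.
    public static void indexAll(List<String> docs, int batchSize, int threads, UpdateSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            List<String> batch = new ArrayList<>(docs.subList(start, end));
            pending.add(pool.submit(() -> sink.add(batch)));
        }
        pool.shutdown();
        try {
            for (Future<?> f : pending) {
                f.get(); // wait for each batch and surface any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while sending batches", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a batch failed to send", e);
        }
        sink.commit(); // a single commit at the end, not one per batch
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger batches = new AtomicInteger();
        AtomicInteger sent = new AtomicInteger();
        AtomicInteger commits = new AtomicInteger();
        // Counting sink used only to demonstrate the batching logic.
        UpdateSink sink = new UpdateSink() {
            public void add(List<String> batch) {
                batches.incrementAndGet();
                sent.addAndGet(batch.size());
            }
            public void commit() {
                commits.incrementAndGet();
            }
        };
        indexAll(docs, 1_000, 4, sink);
        System.out.println(batches.get() + " batches, " + sent.get() + " docs, "
                + commits.get() + " commit");
    }
}
```

Whether extra client threads actually help depends on where the bottleneck is (here, quite possibly the n-gram analysis on the server), so that is worth measuring before tuning batch size or thread count.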

Best
Erick

On Sun, Apr 22, 2012 at 9:58 PM, neosky <[EMAIL PROTECTED]> wrote:
> It takes me 50 hours to index 9 GB of data in total (about 2,000,000 documents)
> with an n-gram filter (min=6, max=10). My tokens before the n-gram filter are
> long (not words; up to 300,000 bytes, including whitespace). I split the input
> into 4 files and used post.sh to update them all at the same time. I also tried
> writing a Lucene program to do the indexing myself (single thread), and the time
> was almost the same. I would like to know what the general bottleneck for
> indexing in Solr is. Doesn't Solr handle index update requests concurrently?
>
> 1.
> Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28     0
> 2.
> Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15 76629
> 3.
> Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17 25537
> 4.
> Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06 81435
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3931338.html
> Sent from the Solr - User mailing list archive at Nabble.com.