Solr, mail # user - Machine utilization while indexing


RE: Machine utilization while indexing
Dennis Gearon 2010-05-20, 15:45
Here is a good article from IBM, with code, on how to do hybrid/cloud computing.

http://www.ibm.com/developerworks/library/x-cloudpt1/
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Thu, 5/20/10, Nagelberg, Kallin <[EMAIL PROTECTED]> wrote:

> From: Nagelberg, Kallin <[EMAIL PROTECTED]>
> Subject: RE: Machine utilization while indexing
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Date: Thursday, May 20, 2010, 8:16 AM
> How about putting a BlockingQueue
> (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html)
> between your document creator and the SolrServer? Give it a capacity
> of 10,000 or so, with one thread feeding it and one thread waiting
> for it to get nearly full and then draining it. Take the drained
> documents and add them to the server (maybe try not using
> StreamingUpdateSolrServer). Something like that worked well for me:
> about 5,000,000 documents of ~5k each took about 8 hours.
>
> -Kallin Nagelberg
>
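The queue-and-drain idea quoted above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Kallin's actual code: the names QueueBatcher, drainInBatches, endMarker and sink are made up for illustration, and the sink is a stand-in for whatever sends a batch to Solr (in SolrJ that would presumably wrap SolrServer.add(Collection<SolrInputDocument>)). It also differs slightly from the description: instead of waiting for the queue to get nearly full, the consumer blocks for one item and then drains whatever else is ready. It uses lambdas, so it needs Java 8 or later; on the Java 1.6 setup in this thread the lambdas would have to become anonymous classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class QueueBatcher {

    /**
     * Takes items from the queue and hands them to the sink in batches of at
     * most batchSize, until endMarker (compared by identity) is seen.
     * Assumes a single producer that puts endMarker as its very last item.
     * Returns the number of items passed to the sink.
     */
    static <T> int drainInBatches(BlockingQueue<T> queue, T endMarker, int batchSize,
                                  Consumer<List<T>> sink) throws InterruptedException {
        int total = 0;
        boolean done = false;
        while (!done) {
            List<T> batch = new ArrayList<>(batchSize);
            batch.add(queue.take());               // block until at least one item is available
            queue.drainTo(batch, batchSize - 1);   // then grab what is already queued, without blocking
            int last = batch.size() - 1;
            if (batch.get(last) == endMarker) {    // the end marker can only be the last element
                batch.remove(last);
                done = true;
            }
            if (!batch.isEmpty()) {
                sink.accept(batch);                // hypothetical stand-in for the call that sends to Solr
                total += batch.size();
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // A bounded queue: put() blocks the producer when the consumer falls behind.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
        String end = new String("END");            // a distinct object, so identity comparison is safe

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 25; i++) {
                    queue.put("doc-" + i);         // stand-in for a prepared SolrInputDocument
                }
                queue.put(end);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sent = drainInBatches(queue, end, 10,
                batch -> System.out.println("sending " + batch.size() + " docs"));
        producer.join();
        System.out.println("total sent: " + sent);
    }
}
```

Because the queue is bounded, the producer simply waits when the sender cannot keep up, which is the back-pressure that avoids documents piling up in memory.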
> -----Original Message-----
> From: Thijs [mailto:[EMAIL PROTECTED]]
>
> Sent: Thursday, May 20, 2010 11:02 AM
> To: [EMAIL PROTECTED]
> Subject: Machine utilization while indexing
>
> Hi.
>
> I have a question about how I can get Solr to index quicker than it
> does at the moment.
>
> I have to index (and re-index) some 3-5 million documents. These
> documents are preprocessed by a Java application that effectively
> combines multiple database tables with each other to form the
> SolrInputDocument.
>
> What I'm seeing, however, is that the queue of documents that are
> ready to be sent to the Solr server exceeds my preset limit, which
> tells me that Solr somehow can't process the documents fast enough.
>
> (I have created my own queue in front of
> Solrj.StreamingUpdateSolrServer, as it would not process the documents
> fast enough, causing OutOfMemoryExceptions due to the large number of
> documents building up in its queue.)
>
> I have an index that consists for 95% of IDs (Long). We don't do any
> analysis on the fields that are being indexed. The schema is rather
> straightforward.
>
> Most fields look like:
> <fieldType name="long" class="solr.LongField" omitNorms="true"/>
> <field name="objectId" type="long" stored="true" indexed="true" required="true"/>
> <field name="listId" type="long" stored="false" indexed="true" multiValued="true"/>
>
> The relevant solrconfig.xml:
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>100</mergeFactor>
>   <RAMBufferSizeMB>256</RAMBufferSizeMB>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <lockType>single</lockType>
> </indexDefaults>
>
>
> The machines I'm testing on have an
> Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> with 4GB of RAM,
> running Linux, Java version 1.6.0_17, Tomcat 6 and Solr version 1.4.
>
> What I'm seeing is that the network almost never reaches more than 10%
> of the 1GB/s connection,
> that the CPU utilization is always below 25% (1 core is used, not the
> others),
> and that there is no heavy disk I/O.
> Also, while indexing, the memory consumption is:
> Free memory: 212.15 MB, Total memory: 509.12 MB, Max memory: 2730.68 MB
>
> And in the beginning (with an empty index) I get 2ms per insert, but
> this slows to 18-19ms per insert.
>
> Are there any tips/tricks I can use to speed up my indexing? I have a
> feeling that my machine is capable of doing more (using more CPUs); I
> just can't figure out how.
>
> Thijs
>
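On the observation in the quoted message that only one of the four cores is busy: one thing that might be worth trying is sending batches from several threads at once. The sketch below is an assumption-laden illustration, not a tested Solr setup. ParallelSenders, sendInParallel and sink are hypothetical names, and the sink again stands in for the real call that adds a batch to Solr. It uses a ThreadPoolExecutor with a bounded work queue and CallerRunsPolicy, so that when all sender threads are busy and the queue is full, the producing thread sends the batch itself instead of queueing more documents in memory. Whether this actually helps depends on where the bottleneck is, which this thread does not establish. (SolrJ 1.4's StreamingUpdateSolrServer also takes a thread count in its constructor, which may be the simpler route.) Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ParallelSenders {

    /**
     * Groups docs into batches of at most batchSize and hands each batch to the
     * sink on one of `threads` worker threads. At most queuedBatches batches
     * wait in memory; beyond that the calling thread runs the batch itself.
     * Returns the number of documents passed to the sink.
     */
    static <T> int sendInParallel(Iterator<T> docs, int threads, int batchSize,
                                  int queuedBatches, Consumer<List<T>> sink)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queuedBatches),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full
        AtomicInteger sent = new AtomicInteger();

        List<T> batch = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize || !docs.hasNext()) {
                final List<T> ready = batch;
                pool.execute(() -> {
                    sink.accept(ready);            // hypothetical stand-in for the call that sends to Solr
                    sent.addAndGet(ready.size());
                });
                batch = new ArrayList<>(batchSize);
            }
        }

        pool.shutdown();                           // no new batches; let the queued ones finish
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("senders did not finish in time");
        }
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);                  // stand-in for a prepared SolrInputDocument
        }
        int sent = sendInParallel(docs.iterator(), 4, 100, 8,
                batch -> System.out.println(Thread.currentThread().getName()
                        + " sending " + batch.size() + " docs"));
        System.out.println("total sent: " + sent);
    }
}
```

Note that batches can reach the sink in any order with this approach, so it only fits when the order in which documents are added does not matter.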