Thijs 2010-05-20, 15:02
Chris Hostetter 2010-05-20, 19:14
Nagelberg, Kallin 2010-05-20, 19:17
Chris Hostetter 2010-05-20, 19:34
Thijs 2010-05-25, 11:42
Chris Hostetter 2010-05-27, 04:41
Thijs 2010-05-27, 08:12
Nagelberg, Kallin 2010-05-20, 15:16
RE: Machine utilization while indexing
Dennis Gearon 2010-05-20, 15:45
Here is a good article from IBM, with code, on how to do hybrid/cloud computing.
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Thu, 5/20/10, Nagelberg, Kallin <[EMAIL PROTECTED]> wrote:
> From: Nagelberg, Kallin <[EMAIL PROTECTED]>
> Subject: RE: Machine utilization while indexing
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Date: Thursday, May 20, 2010, 8:16 AM
> How about putting a BlockingQueue between your document creator and
> the SolrServer? Give it a size of 10,000 or so, with one thread
> feeding it and another waiting for it to get nearly full and then
> draining it. Take the drained results and add them to the server
> (maybe try not using StreamingUpdateSolrServer). Something like that
> worked well for me: about 5,000,000 documents of ~5k each took
> about 8 hours.
> -Kallin Nagelberg
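The queue-and-drain idea above might look roughly like the following. This is only a sketch under stated assumptions, not the code Kallin actually ran: `QueueBatchIndexer`, `BatchSink`, and `indexAll` are hypothetical names, and `BatchSink` stands in for whatever call adds a batch to Solr (with SolrJ that would presumably be `SolrServer.add(Collection<SolrInputDocument>)`). It also drains whatever is available, up to a batch size, instead of waiting for the queue to get nearly full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class QueueBatchIndexer {

    /** Hypothetical stand-in for the Solr client call that adds a batch of documents. */
    public interface BatchSink<T> {
        void addBatch(List<T> batch) throws Exception;
    }

    /**
     * Feeds docs through a bounded queue and hands them to the sink in batches.
     * Returns the number of documents passed to the sink.
     */
    public static <T> int indexAll(Iterable<T> docs, BatchSink<T> sink,
                                   int queueCapacity, int batchSize) throws Exception {
        BlockingQueue<T> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicBoolean producerDone = new AtomicBoolean(false);

        // Producer: put() blocks while the queue is full, which bounds memory use.
        Thread producer = new Thread(() -> {
            try {
                for (T doc : docs) {
                    queue.put(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                producerDone.set(true);
            }
        });
        producer.setDaemon(true);
        producer.start();

        int sent = 0;
        try {
            while (true) {
                // Wait briefly for one document, then grab whatever else is ready.
                T first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    if (producerDone.get() && queue.isEmpty()) {
                        break; // producer finished and nothing is left
                    }
                    continue;
                }
                List<T> batch = new ArrayList<>(batchSize);
                batch.add(first);
                queue.drainTo(batch, batchSize - 1);
                sink.addBatch(batch);
                sent += batch.size();
            }
        } finally {
            producer.interrupt(); // has no effect if the producer already finished
        }
        return sent;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            docs.add("doc-" + i);
        }
        int sent = indexAll(docs, batch -> System.out.println("sending " + batch.size()), 10, 10);
        System.out.println("sent " + sent + " documents");
    }
}
```

Because the queue is bounded, a slow server simply stalls the document creator instead of letting documents pile up in memory. The batch sizes seen by the sink depend on timing; only the total is fixed.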
> -----Original Message-----
> From: Thijs [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 20, 2010 11:02 AM
> To: [EMAIL PROTECTED]
> Subject: Machine utilization while indexing
> I have a question about how I can get Solr to index quicker than it
> does at the moment.
> I have to index (and re-index) some 3-5 million documents. The
> documents are preprocessed by a Java application that combines
> multiple database tables with each other to form
> What I'm seeing, however, is that the queue of documents ready to be
> sent to the Solr server exceeds my preset limit, which tells me that
> Solr somehow can't process the documents fast enough.
> (I have created my own queue in front of
> as it would not process the documents fast enough, causing
> OutOfMemoryErrors due to the large number of documents building up
> in its queue)
> I have an index that consists 95% of IDs (Long). We don't do any
> analysis on the fields that are being indexed. The schema is rather
> straightforward.
> most fields look like
> <fieldType name="long" class="solr.LongField"
> <field name="objectId" type="long" stored="true"
> required="true" />
> <field name="listId" type="long" stored="false"
> the relevant solrconfig.xml
> The machines I'm testing on have a:
> Intel(R) Core(TM)2 Quad CPU Q9550 @
> With 4GB of ram.
> Running on Linux, Java version 1.6.0_17, Tomcat 6 and Solr version 1.4.
> What I'm seeing is that the network almost never reaches more than
> 10% of the 1GB/s connection.
> That the CPU utilization is always below 25% (1 core is
> used, not the
> I don't see heavy disk-io.
> Also while indexing the memory consumption is:
> Free memory: 212.15 MB Total memory: 509.12 MB Max memory:
> 2730.68 MB
> And that in the beginning (with an empty index) I get 2ms per insert,
> but this slows to 18-19ms per insert.
> Are there any tips/tricks I can use to speed up my indexing? Because
> I have a feeling that my machine is capable of doing more (using more
> CPUs). I just can't figure out how.
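To put the per-insert times quoted above next to the "5,000,000 documents in about 8 hours" figure, the arithmetic can be sketched as below. The class and method names are hypothetical, and the calculation assumes a constant per-document cost, which the slowdown from 2 ms to 18-19 ms shows is only an approximation.

```java
public class IndexingThroughput {

    static final double MS_PER_HOUR = 3_600_000.0;

    /** Hours needed to index docCount documents at a fixed per-document cost. */
    public static double hoursFor(long docCount, double msPerDoc) {
        return docCount * msPerDoc / MS_PER_HOUR;
    }

    /** Average per-document cost in milliseconds for a run of the given length. */
    public static double msPerDoc(long docCount, double hours) {
        return hours * MS_PER_HOUR / docCount;
    }

    public static void main(String[] args) {
        // Figures taken from the messages quoted above.
        System.out.println("5M docs at 2 ms each, hours:  " + hoursFor(5_000_000L, 2));
        System.out.println("5M docs at 18 ms each, hours: " + hoursFor(5_000_000L, 18));
        System.out.println("5M docs in 8 hours, ms/doc:   " + msPerDoc(5_000_000L, 8));
    }
}
```

At 18 ms per insert, 5 million documents come to 25 hours, whereas 5 million documents in 8 hours averages about 5.76 ms per document. The two setups index different kinds of documents (~5k documents versus mostly Long IDs), so the comparison is indicative only.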
Dennis Gearon 2010-05-20, 15:25
Thijs 2010-05-20, 15:29
Nagelberg, Kallin 2010-05-20, 15:33
Thijs 2010-05-20, 15:25
Nagelberg, Kallin 2010-05-20, 15:36