Solr, mail # user - Commit Strategy for SolrCloud when Talking about 200 million records.


I-Chiang Chen 2012-03-22, 01:37
Mark Miller 2012-03-22, 21:15
I-Chiang Chen 2012-03-23, 03:44
Re: Commit Strategy for SolrCloud when Talking about 200 million records.
Mark Miller 2012-03-23, 11:20
What issues? It really shouldn't be a problem.
On Mar 22, 2012, at 11:44 PM, I-Chiang Chen <[EMAIL PROTECTED]> wrote:

> At this time we are not leveraging the NRT functionality. This is the
> initial data load process, where the idea is to add all 200 million
> records first, then do a single commit at the end to make them searchable.
> We have actually disabled auto commit for now.
>
> We tried leaving auto commit enabled during the initial data load and
> ran into multiple issues that led to a botched loading process.
>
> On Thu, Mar 22, 2012 at 2:15 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
>
>>
>> On Mar 21, 2012, at 9:37 PM, I-Chiang Chen wrote:
>>
>>> We are currently experimenting with SolrCloud functionality in Solr 4.0.
>>> The goal is to see if Solr 4.0 trunk in its current state is able to
>>> handle roughly 200 million documents. The documents are not big: around
>>> 40 fields and no more than a KB each, with most fields empty the
>>> majority of the time.
>>>
>>> The setup we have is 4 servers w/ 2 shards w/ 2 servers per shard. We are
>>> running in Tomcat.
>>>
>>> The question is: given the approximate data volume, is it realistic to
>>> expect the above setup to handle it?
>>
>> So 100 million docs per machine essentially? Totally depends on the
>> hardware and what features you are using - but def in the realm of
>> possibility.
>>
>>> Given the number of documents, should we
>>> commit every x documents or rely on auto commits?
>>
>> The number of docs shouldn't really matter here. Do you need near real
>> time search?
>>
>> You should be able to commit about as frequently as you'd like with NRT
>> (e.g. every 1 second if you'd like) - either using soft auto commit or
>> commitWithin.
>>
>> Then you want to do a hard commit less frequently - every minute (or more
>> or less) with openSearcher=false.
>>
>> e.g.
>>
>>    <autoCommit>
>>      <maxTime>15000</maxTime>
>>      <openSearcher>false</openSearcher>
>>    </autoCommit>
>>
>>>
>>> --
>>> -IC
>>
>> - Mark Miller
>> lucidimagination.com
>>
>
>
> --
> -IC
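
A minimal sketch of the commit setup described in the quoted reply, assuming
a Solr 4.0 solrconfig.xml. The 15000 ms hard commit with openSearcher=false
comes from the example above; the 1000 ms soft commit interval is an
assumption based on the "every 1 second" remark, not a tested recommendation:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flushes to stable storage without opening a new searcher -->
      <autoCommit>
        <maxTime>15000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: makes recent documents visible to searches (NRT).
           The interval here is an assumed value for illustration. -->
      <autoSoftCommit>
        <maxTime>1000</maxTime>
      </autoSoftCommit>
    </updateHandler>

commitWithin can instead be set per update request. In the XML update format
it is an attribute on <add>, given in milliseconds (the field name and value
below are hypothetical):

    <add commitWithin="1000">
      <doc>
        <field name="id">example-1</field>
      </doc>
    </add>

For a one-off bulk load that does not need NRT, the soft commit settings can
presumably be left out, with one explicit commit sent once loading finishes.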
Markus Jelsma 2012-03-23, 11:19
I-Chiang Chen 2012-03-23, 16:49
Mark Miller 2012-03-23, 17:02