Solr, mail # user - Re: Hardware Specs Question


Re: Hardware Specs Question
Lance Norskog 2010-09-02, 01:37
I was just reading about configuring mass computation grids: hardware
writes on 2 striped disks take 10% longer than writes on a single disk,
because you have to wait for the slower disk to finish. So, single
disks without RAID are faster.

I don't know how much SSDs cost, but they will certainly cure the
disk I/O problem.
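
As a rough way to reason about the memory question quoted below, here is a
back-of-the-envelope sketch in Python. It assumes that whatever RAM is not
taken by the Java heap or by other processes is available to the operating
system's disk cache. The 51 GB index and the 32 GB server come from this
thread; the function name, the heap sizes, and the 2 GB allowance for other
processes are illustrative assumptions, not measurements.

```python
def page_cache_fraction(ram_gb, heap_gb, index_gb, other_gb=2.0):
    """Estimate the fraction of the index the OS disk cache could hold.

    other_gb is a hypothetical allowance for the OS and other processes.
    """
    free_for_cache = max(ram_gb - heap_gb - other_gb, 0.0)
    return min(free_for_cache / index_gb, 1.0)


# Index size and RAM are taken from the thread; heap sizes are hypothetical.
for heap_gb in (4, 8, 16):
    frac = page_cache_fraction(ram_gb=32, heap_gb=heap_gb, index_gb=51)
    print(f"heap {heap_gb:>2} GB -> roughly {frac:.0%} of the index fits in the disk cache")
```

This only estimates how much of the index could be cached. Actual query
performance depends on which parts of the index are hot, so treat it as a
starting point for sizing rather than a prediction.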

On Tue, Aug 31, 2010 at 1:35 AM, scott chu (朱炎詹) <[EMAIL PROTECTED]> wrote:
> In our current lab project, we have already built a Chinese newspaper index
> with 18 million documents. The index size is around 51GB. So I am very
> concerned about the memory issue you guys mentioned.
>
> I also looked up the Hathitrust report on the SolrPerformanceData page:
> http://wiki.apache.org/solr/SolrPerformanceData. They said their main
> bottleneck is disk I/O even though they have 10 shards spread over 4 servers.
>
> Can you guys give me some helpful suggestions about hardware specs & memory
> configuration for our project?
>
> Thanks in advance.
>
> Scott
>
> ----- Original Message ----- From: "Lance Norskog" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, August 31, 2010 1:01 PM
> Subject: Re: Hardware Specs Question
>
>
> There are synchronization points, which become chokepoints at some
> number of cores. I don't know where they cause Lucene to top out.
> Lucene apps are generally disk-bound, not CPU-bound, but yours will
> be. There are so many variables that it's really not possible to give
> any numbers.
>
> Lance
>
> On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian <[EMAIL PROTECTED]> wrote:
>>
>> Lance,
>>
>> That makes sense. I have heard about the long GC times on large heaps, but I
>> personally haven't experienced a slowdown, though that doesn't mean anything
>> either :-). Agreed that tuning the Solr caching is the way to go.
>>
>> I haven't followed all the Solr/Lucene changes, but from what I remember
>> there are synchronization points that could be a bottleneck, so adding
>> more cores won't help with this problem? Or am I completely missing something?
>>
>> Thanks again
>> Amit
>>
>> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹)
>> <[EMAIL PROTECTED]> wrote:
>>
>>> I am also curious, as Amit is. Can you give an example of the garbage
>>> collection problem you mentioned?
>>>
>>> ----- Original Message ----- From: "Lance Norskog" <[EMAIL PROTECTED]>
>>> To: <[EMAIL PROTECTED]>
>>> Sent: Tuesday, August 31, 2010 9:14 AM
>>> Subject: Re: Hardware Specs Question
>>>
>>>
>>>
>>>> It generally works best to tune the Solr caches and allocate enough
>>>> RAM to run comfortably. Linux & Windows et al. have their own cache
>>>> of disk blocks. They use very good algorithms for managing this cache.
>>>> Also, they do not make long garbage collection passes.
>>>>
>>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Lance,
>>>>>
>>>>> Thanks for your help. What do you mean when you say that the OS can
>>>>> keep the index in memory better than Solr? Do you mean that you should
>>>>> use another means to keep the index in memory (i.e. ramdisk)? Is there
>>>>> a generally accepted heap size/index size that you follow?
>>>>>
>>>>> Thanks
>>>>> Amit
>>>>>
>>>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> The price-performance knee for small servers is 32 GB RAM, 2-6 SATA
>>>>>> disks in a RAID, 8/16 cores. You can buy these servers and half-fill
>>>>>> them, leaving room for expansion.
>>>>>>
>>>>>> I have not done benchmarks about the max # of processors that can be
>>>>>> kept busy during indexing or querying, and the total numbers: QPS,
>>>>>> response time averages & variability, etc.
>>>>>>
>>>>>> If your index file size is 8G, and your Java heap is 8G, you will do
>>>>>> long garbage collection cycles. The operating system is very good at
>>>>>> keeping your index in memory- better than Solr can.
>>>>>>
>>>>>> Lance
>>>>>>
>>>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <[EMAIL PROTECTED]>

Lance Norskog
[EMAIL PROTECTED]