|
Yury Kats
2011-12-15, 17:58
Robert Stewart
2011-12-15, 18:07
Yury Kats
2011-12-15, 18:31
Robert Petersen
2011-12-15, 18:41
Robert Stewart
2011-12-15, 19:14
Robert Stewart
2011-12-15, 19:16
Yury Kats
2011-12-15, 19:46
Robert Petersen
2011-12-15, 21:46
Yury Kats
2011-12-15, 22:16
Robert Petersen
2011-12-15, 23:28
Ted Dunning
2011-12-16, 01:21
Otis Gospodnetic
2011-12-16, 15:29
Otis Gospodnetic
2011-12-16, 15:32
Jason Rutherglen
2011-12-16, 18:02
Ted Dunning
2011-12-16, 19:45
Jason Rutherglen
2011-12-16, 19:56
Ted Dunning
2011-12-16, 20:00
Jason Rutherglen
2011-12-16, 20:29
Ted Dunning
2011-12-16, 21:00
Chris Hostetter
2011-12-16, 23:42
|
-
Core overhead (Yury Kats, 2011-12-15, 17:58)
Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed?

For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
-
Re: Core overhead (Robert Stewart, 2011-12-15, 18:07)
I don't have any measured data, but here are my thoughts.

I think overall memory usage would be close to the same. Speed will be slower in general: if search cost is approximately log(n) per index, then 10 * log(n/10) > log(n). Merging results also adds overhead in the merge step, and fetching results beyond the first page costs more, since you would generally need page_size * page_number results from each core. Of course, if you search many cores in parallel over many CPU cores, you would mitigate that overhead.

There are other considerations, such as caching. For example, if you are adding new documents to one core only, the other cores get to keep their filter caches, etc. in RAM much longer than if you are always committing to one single large core. And if you have some client logic to pick a subset of cores based on some query data (such as only searching newer cores), then you could end up with faster search over many cores.
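A rough illustration of the cost argument above. The log2 cost model, the function names, and the page numbers are assumptions made for this sketch, not measurements of Solr:

```python
import math


def sequential_cost(total_docs, num_cores):
    """Toy cost of searching num_cores equal cores one after another,
    assuming each core costs log2(docs in that core)."""
    return num_cores * math.log2(total_docs / num_cores)


def parallel_cost(total_docs, num_cores):
    """Toy cost when every core is searched at once on its own CPU core
    (merge overhead ignored)."""
    return math.log2(total_docs / num_cores)


def rows_fetched_for_page(page_size, page_number, num_cores):
    """Rows the merge step must collect: each core returns its own top
    page_size * page_number rows."""
    return num_cores * page_size * page_number


single_cost = sequential_cost(100_000_000, 1)
split_cost = sequential_cost(100_000_000, 10)
print(f"1 core: {single_cost:.1f}, 10 cores in sequence: {split_cost:.1f}")
print(f"10 cores in parallel: {parallel_cost(100_000_000, 10):.1f}")
print(f"rows merged for page 5 of 10 cores: {rows_fetched_for_page(10, 5, 10)}")
```

Under this toy model the split index only wins when the cores are actually searched in parallel, which matches the caveat in the message.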
-
Re: Core overhead (Yury Kats, 2011-12-15, 18:31)
On 12/15/2011 1:07 PM, Robert Stewart wrote:
> I think overall memory usage would be close to the same.

Is this really so? I suspect that the consumed memory is in direct proportion to the number of terms in the index. I also suspect that if I divided 1 core with N terms into 10 smaller cores, each smaller core would have much more than N/10 terms. Let's say I'm indexing English texts: it's likely that all smaller cores would have almost the same number of terms, close to the original N. Not so?
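The suspicion about term counts can be checked on synthetic data. This is a minimal sketch: the Zipf-like word distribution, corpus sizes, and function names are all assumptions chosen for illustration, and real English text will give different numbers:

```python
import itertools
import random


def build_corpus(num_docs, vocab_size, words_per_doc, seed=42):
    """Generate documents as lists of term ids drawn with Zipf-like weights
    (the term at rank r is drawn with weight 1/r)."""
    rng = random.Random(seed)
    terms = list(range(vocab_size))
    cum_weights = list(itertools.accumulate(1.0 / (rank + 1) for rank in range(vocab_size)))
    return [
        rng.choices(terms, cum_weights=cum_weights, k=words_per_doc)
        for _ in range(num_docs)
    ]


def unique_terms(docs):
    """Count distinct terms across a collection of documents."""
    return len({term for doc in docs for term in doc})


corpus = build_corpus(num_docs=2000, vocab_size=5000, words_per_doc=50)
num_shards = 10
# Round-robin split of the corpus into 10 equal shards.
shards = [corpus[i::num_shards] for i in range(num_shards)]

total_terms = unique_terms(corpus)
per_shard_terms = [unique_terms(shard) for shard in shards]
print("terms in the single index:", total_terms)
print("terms in each shard:", per_shard_terms)
```

Each shard holds a tenth of the documents but well over a tenth of the distinct terms, because frequent terms show up in every shard.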
-
RE: Core overhead (Robert Petersen, 2011-12-15, 18:41)
I am running eight cores. Each core serves up different types of searches, so there is no overlap in their function. Some cores have millions of documents. My search times are quite fast. I don't see any real slowdown from multiple cores, but you have to have enough memory for them: memory simply has to be big enough to hold what you are loading. Try it out, but make sure that the functionality you are actually looking for isn't sharding instead of multiple cores...

http://wiki.apache.org/solr/DistributedSearch
-
Re: Core overhead (Robert Stewart, 2011-12-15, 19:14)
It is true that the number of terms may be much more than N/10 (or even N for each core), but it is the number of docs per term that will really matter. So you can have N terms in each core, but each term has 1/10 the number of docs on average.
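The docs-per-term point can be made concrete with a tiny inverted index. This is a sketch on made-up documents; the term names, sizes, and the contiguous split into shards are assumptions for illustration only:

```python
from collections import defaultdict


def build_postings(docs):
    """Build a minimal inverted index: term -> set of ids of docs containing it."""
    postings = defaultdict(set)
    for doc_id, terms in docs:
        for term in terms:
            postings[term].add(doc_id)
    return postings


def total_postings(postings):
    """Total number of (term, doc) entries stored in the index."""
    return sum(len(doc_ids) for doc_ids in postings.values())


# 1000 docs: every doc contains "the" plus one of 20 topic terms.
docs = [(i, {"the", f"topic{i % 20}"}) for i in range(1000)]

single = build_postings(docs)
# 10 shards of 100 contiguous docs each.
shards = [build_postings(docs[start:start + 100]) for start in range(0, 1000, 100)]

print("terms:", len(single), "vs per shard:", [len(s) for s in shards])
print("docs for 'the':", len(single["the"]), "vs per shard:", len(shards[0]["the"]))
print("postings:", total_postings(single), "vs sum over shards:",
      sum(total_postings(s) for s in shards))
```

In this toy setup every shard repeats the full term dictionary, yet the posting lists are a tenth as long, so the total number of postings is unchanged.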
-
Re: Core overhead (Robert Stewart, 2011-12-15, 19:16)
One other thing I did not mention is GC pauses. If you have smaller heap sizes, you would have fewer very long GC pauses, so that can be an advantage of having many cores (if the cores are distributed into separate Solr instances, as separate processes). I think you can expect a 1 second pause for each GB of heap size in the worst case.
-
Re: Core overhead (Yury Kats, 2011-12-15, 19:46)
On 12/15/2011 1:41 PM, Robert Petersen wrote:
> loading. Try it out, but make sure that the functionality you are
> actually looking for isn't sharding instead of multiple cores...

Yes, but the way to achieve sharding is to have multiple cores. The question then becomes: how many cores (shards)?
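For readers unfamiliar with how multiple cores act as shards: Solr's distributed search takes a `shards` request parameter listing the cores to query. A minimal sketch of building such a request follows; the host, port, core names, and query are placeholders, not values from this thread:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def distributed_query_url(base_core_url, shard_addresses, query, rows=10):
    """Build a select URL that asks one core to fan the query out to all shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, one per shard (core).
        "shards": ",".join(shard_addresses),
    }
    return f"{base_core_url}/select?{urlencode(params)}"


# Hypothetical layout: three cores on one local Solr instance.
shard_addresses = [f"localhost:8983/solr/core{i}" for i in range(3)]
url = distributed_query_url(
    "http://localhost:8983/solr/core0", shard_addresses, "title:overhead"
)
print(url)
```

The core that receives the request merges the per-shard results, so the number of entries in `shards` is exactly the "how many cores" question.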
-
RE: Core overhead (Robert Petersen, 2011-12-15, 21:46)
Sure, that is possible, but doesn't that defeat the purpose of sharding? Why distribute across one machine? Just keep it all in one index in that case, is my thought there...
-
Re: Core overhead (Yury Kats, 2011-12-15, 22:16)
On 12/15/2011 4:46 PM, Robert Petersen wrote:
> Sure that is possible, but doesn't that defeat the purpose of sharding?
> Why distribute across one machine? Just keep all in one index in that
> case is my thought there...

To be able to scale w/o re-indexing. Also often referred to as "micro-sharding".
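A minimal sketch of the micro-sharding idea, assuming a fixed number of small shards and a stable hash. The shard count, node names, and helper functions are hypothetical, and no Solr API is involved:

```python
import zlib

NUM_MICRO_SHARDS = 16  # hypothetical: fixed up front, more shards than machines


def shard_for(doc_id):
    """Map a document to a micro-shard with a stable hash; this never changes."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_MICRO_SHARDS


def assign_shards(nodes):
    """Place whole micro-shards on nodes round-robin."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_MICRO_SHARDS)}


doc_ids = [f"doc-{i}" for i in range(1000)]
doc_to_shard_before = {doc_id: shard_for(doc_id) for doc_id in doc_ids}

one_node = assign_shards(["node-a"])
two_nodes = assign_shards(["node-a", "node-b"])  # scale out by adding node-b

doc_to_shard_after = {doc_id: shard_for(doc_id) for doc_id in doc_ids}
moved_shards = [s for s in range(NUM_MICRO_SHARDS) if one_node[s] != two_nodes[s]]
print("shards copied to the new node:", moved_shards)
print("documents re-indexed:",
      sum(doc_to_shard_before[d] != doc_to_shard_after[d] for d in doc_ids))
```

Scaling out means copying whole shard indexes to the new machine; the document-to-shard mapping stays put, so nothing has to be re-indexed.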
-
RE: Core overhead (Robert Petersen, 2011-12-15, 23:28)
I see there are a lot of discussions about "micro-sharding"; I'll have to read them. I'm on an older version of Solr and just use a master index replicating out to a farm of slaves. When I read about sharding, it always seemed to me that it causes a lot of background traffic, but I never tried it out. Thanks for the heads up on that topic... :)
-
Re: Core overhead (Ted Dunning, 2011-12-16, 01:21)
Here is a talk I did on this topic at HPTS a few years ago.
-
Re: Core overhead (Otis Gospodnetic, 2011-12-16, 15:29)
Hi,

I used to think this, too, but have learned this not to be entirely true. We had a customer with a query rate of a few hundred QPS and 32 or 64 GB RAM (don't recall which any more) and a pretty large JVM heap. Most queries were very fast, but once in a while a query would be very slow. GC, we thought! So the initial thinking was: it must be that big heap of theirs. But, long story short, instead of making the heap smaller we just tuned the JVM and took care of those slow queries. Using SPM (link in sig) and seeing GC info (collection counts, times, heap size, etc.) was invaluable!

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - FREE!

> From: Robert Stewart
> Sent: Thursday, December 15, 2011 2:16 PM
> Subject: Re: Core overhead
>
> One other thing I did not mention is GC pauses. If you have smaller
> heap sizes, you would have less very long GC pauses, so that can be an
> advantage having many cores (if cores are distributed into separate
> Solr instances, as separate processes). I think you can expect 1
> second pause for each GB of heap size in worst case.
-
Re: Core overhead (Otis Gospodnetic, 2011-12-16, 15:32)
Hi Yury,

Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are).

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html
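The fan-out-and-merge pattern described above can be sketched with in-memory stand-ins for the cores. Everything here (the toy scoring, the document texts, the function names) is an assumption for illustration; in a real deployment the parallelism happens inside the Solr servers, not in Python threads:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_core(core, query, top_k):
    """Score each doc in one core by how often the query term occurs,
    and return that core's own top_k hits as (score, doc_id) pairs."""
    hits = [(text.split().count(query), doc_id) for doc_id, text in core.items()]
    return heapq.nlargest(top_k, (hit for hit in hits if hit[0] > 0))


def search_all(cores, query, top_k):
    """Query every core concurrently, then merge the partial top_k lists."""
    with ThreadPoolExecutor(max_workers=len(cores)) as pool:
        partials = list(pool.map(lambda core: search_core(core, query, top_k), cores))
    return heapq.nlargest(top_k, (hit for partial in partials for hit in partial))


# Two toy "cores", each a dict of doc_id -> text.
cores = [
    {"a1": "solr core overhead", "a2": "solr solr sharding"},
    {"b1": "lucene index", "b2": "solr cache solr solr"},
]
results = search_all(cores, "solr", top_k=2)
print(results)
```

Whether this beats one big index depends, as the message says, on how many CPU cores are idle when the query arrives.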
-
Re: Core overhead (Jason Rutherglen, 2011-12-16, 18:02)
Wow the shameless plugging of product (footer) has hit a new low Otis.
-
Re: Core overhead (Ted Dunning, 2011-12-16, 19:45)
I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was "I/we have developed a tool that will help you solve your problem". That is responsive to the OP and it is clear that it is a commercial deal.
-
Re: Core overhead (Jason Rutherglen, 2011-12-16, 19:56)
Ted,

"...- FREE!" is stupid idiot spam. It's annoying and not suitable.
-
Re: Core overhead (Ted Dunning, 2011-12-16, 20:00)
Sounds like we disagree.
-
Re: Core overhead (Jason Rutherglen, 2011-12-16, 20:29)
Ted,

The list would be unreadable if everyone spammed at the bottom of their email like Otis. It's just bad form.

Jason
-
Re: Core overhead (Ted Dunning, 2011-12-16, 21:00)
We still disagree.
-
Re: Core overhead (Chris Hostetter, 2011-12-16, 23:42)
: The list would be unreadable if everyone spammed at the bottom their
: email like Otis'. It's just bad form.

If you'd like to debate project policy on what is/isn't acceptable on any of the Lucene mailing lists, please start a new thread on general@lucene (the list that exists precisely for the purpose of discussing meta-issues related to the Project/Community) instead of spamming the substantial solr-user@lucene subscriber base, who probably subscribed to this list because they were interested in getting emails about using Solr, not debating email etiquette.

-Hoss