Tuning Solr caches with high commit rates (NRT)
Peter Sturge 2010-09-12, 16:26
Hi,
Below are some notes regarding Solr cache tuning that should prove useful for anyone who uses Solr with frequent commits (e.g. <5min).

Environment:
Solr 1.4.1 or branch_3x trunk.
Note the 4.x trunk has lots of neat new features, so the notes here are likely less relevant to the 4.x environment.

Overview:
Our Solr environment makes extensive use of faceting, we perform commits every 30secs, and the indexes tend to be on the large-ish side (>20million docs).
Note: For our data, when we commit, we are always adding new data, never changing existing data.
This type of environment can be tricky to tune, as Solr is more geared toward fast reads than frequent writes.

Symptoms:
If anyone has used faceting in searches where you are also performing frequent commits, you've likely encountered the dreaded OutOfMemory or 'GC overhead limit exceeded' errors.
In high commit rate environments, this is almost always due to multiple 'onDeck' searchers and autowarming - i.e. new searchers don't finish autowarming their caches before the next commit() comes along and invalidates them.
Once this starts happening on a regular basis, it is likely your Solr's JVM will run out of memory eventually, as the number of searchers (and their cache arrays) will keep growing until the JVM dies of thirst.
To check if your Solr environment is suffering from this, turn on INFO level logging, and look for: 'PERFORMANCE WARNING: Overlapping onDeckSearchers=x'.

In tests, we've only ever seen this problem when using faceting, and facet.method=fc.

Some solutions to this are:
    Reduce the commit rate to allow searchers to fully warm before the next commit
    Reduce or eliminate the autowarming in caches
    Both of the above

The trouble is, if you're doing NRT commits, you likely have a good reason for it, and reducing/eliminating autowarming will very significantly impact search performance in high commit rate environments.
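A minimal sketch of the log check described above, in case it helps. It only assumes that the warning text appears somewhere on a log line exactly as quoted; the function name and the sample lines are made up for illustration, and nothing is assumed about the rest of the log layout.

```python
import re
from collections import Counter

# Matches the warning text quoted in the notes above. Everything else on the
# log line (timestamp, level, logger name) is deliberately not assumed.
WARNING = re.compile(r"PERFORMANCE WARNING: Overlapping onDeckSearchers=(\d+)")


def overlapping_searcher_counts(lines):
    """Count how often each onDeckSearchers value shows up in the log lines."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines; a real log would be read from a file instead.
sample = [
    "INFO: [core0] webapp=/solr path=/select status=0",
    "WARNING: [core0] PERFORMANCE WARNING: Overlapping onDeckSearchers=2",
    "WARNING: [core0] PERFORMANCE WARNING: Overlapping onDeckSearchers=2",
    "WARNING: [core0] PERFORMANCE WARNING: Overlapping onDeckSearchers=3",
]
counts = overlapping_searcher_counts(sample)
for n in sorted(counts):
    print(f"onDeckSearchers={n}: {counts[n]} warning(s)")
```

To scan a real log, pass an open file object instead of the sample list. A value that keeps climbing over time would match the runaway-searcher symptom described above.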
Solution:
Here are some setup steps we've used that allow lots of faceting (we typically search with at least 20-35 different facet fields, and date faceting/sorting) on large indexes, and still keep decent search performance:

1. Firstly, you should consider using the enum method for facet searches (facet.method=enum) unless you've got A LOT of memory on your machine. In our tests, this method uses a lot less memory and autowarms more quickly than fc. (Note, I've not tried the new segment-based 'fcs' option, as I can't find support for it in branch_3x - looks nice for 4.x though.)
Admittedly, for our data, enum is not quite as fast for searching as fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile tradeoff.
If you do have access to LOTS of memory, AND you can guarantee that the index won't grow beyond the memory capacity (i.e. you have some sort of deletion policy in place), fc can be a lot faster than enum when searching with lots of facets across many terms.

2. Secondly, we've found that LRUCache is faster at autowarming than FastLRUCache - in our tests, about 20% faster. Maybe this is just our environment - your mileage may vary.

So, our filterCache section in solrconfig.xml looks like this:
    <filterCache
      class="solr.LRUCache"
      size="3600"
      initialSize="1400"
      autowarmCount="3600"/>

For a 28GB index, running in a quad-core x64 VMWare instance, 30 warmed facet fields, Solr is running at ~4GB. The filterCache size shown in the stats is usually in the region of ~2400.

3. It's also a good idea to have some sort of firstSearcher/newSearcher event listener queries to allow new data to populate the caches. Of course, what you put in these is dependent on the facets you need/use. We've found a good combination is a firstSearcher with as many facets in the search as your environment can handle, then a subset of the most common facets for the newSearcher.

4. We also set:
    <useColdSearcher>true</useColdSearcher>
just in case.

5. Another key area for search performance with high commits is to use 2 Solr instances - one for the high commit rate indexing, and one for searching. The read-only searching instance can be a remote replica, or a local read-only instance that reads the same core as the indexing instance (for the latter, you'll need something that periodically refreshes - i.e. runs commit()). This way, you can tune the indexing instance for writing performance and the searching instance as above for max read performance.

Using the setup above, we get fantastic searching speed for small facet sets (well under 1sec), and really good searching for large facet sets (a couple of secs depending on index size, number of facets, unique terms etc.), even when searching against largeish indexes (>20million docs).
We have yet to see any OOM or GC errors using the techniques above, even in low memory conditions.

I hope there are people that find this useful. I know I've spent a lot of time looking for stuff like this, so hopefully, this will save someone some time.

Peter
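For step 1 above, here is a rough sketch of what a facet request using the enum method could look like from a client. The request parameters (q, rows, facet, facet.field, facet.method) are standard Solr ones; the base URL, the field names and the helper function are placeholders invented for this example, not taken from the setup described above.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, method="enum", rows=10):
    """Build a /select URL that facets on the given fields with one facet.method."""
    params = [
        ("q", query),
        ("rows", rows),
        ("facet", "true"),
        ("facet.method", method),
    ]
    # facet.field is repeated once per field, so a list of pairs is used
    # rather than a dict (a dict would keep only one value per key).
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and field names, purely for illustration.
url = facet_query_url("http://localhost:8983/solr", "*:*", ["host", "severity"])
print(url)
```

The same parameters can instead be set as defaults on the request handler in solrconfig.xml, which saves every client from having to remember facet.method.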
-
Re: Tuning Solr caches with high commit rates (NRT)
Dennis Gearon 2010-09-12, 18:34
Wow! Thanks for that. This email is DEFINITELY being filed.
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
-
Re: Tuning Solr caches with high commit rates (NRT)
Jason Rutherglen 2010-09-12, 23:05
Peter,
Are you using per-segment faceting, e.g. SOLR-1617? That could help your situation.
-
Re: Tuning Solr caches with high commit rates (NRT)
Peter Sturge 2010-09-12, 23:18
Hi Jason,
I've tried some limited testing with the 4.x trunk using fcs, and I must say, I really like the idea of per-segment faceting.
I was hoping to see it in 3.x, but I don't see this option in the branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617 the one to use with 3.1?
There seem to be a number of Solr issues tied to this - one of them being Lucene-1785. Can the per-segment faceting patch work with Lucene 2.9/branch_3x?

Thanks,
Peter
-
Re: Tuning Solr caches with high commit rates (NRT)
Jason Rutherglen 2010-09-13, 02:52
Yeah there's no patch... I think Yonik can write it. :-) Yah... The Lucene version shouldn't matter. The distributed faceting can theoretically be applied to multiple segments quite easily; however, the way it's written makes it a challenge for me to untangle and turn into a working patch. Also I don't have this as an itch to scratch at the moment.
-
Re: Tuning Solr caches with high commit rates (NRT)
Dennis Gearon 2010-09-13, 06:02
BTW, what is a segment?
I've only heard about them in the last 2 weeks here on the list.

Dennis Gearon
-
Re: Tuning Solr caches with high commit rates (NRT)
Peter Sturge 2010-09-13, 08:27
Hi Dennis,
These are the Lucene file segments that hold the index data on the file system. Have a look at: http://wiki.apache.org/solr/SolrPerformanceFactors

Peter
-
Re: Tuning Solr caches with high commit rates (NRT)
Simon Willnauer 2010-09-13, 08:33
On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon <[EMAIL PROTECTED]> wrote:
> BTW, what is a segment?

On the Lucene level an index is composed of one or more index segments. Each segment is an index by itself and consists of several files like doc stores, proximity data, term dictionaries etc. During indexing Lucene / Solr creates those segments depending on RAM buffer / document buffer settings and flushes them to disk (if you index to disk). Once a segment has been flushed Lucene will never change it (well, up to a certain level - let's keep this simple) but will write new ones for newly added documents. Since segments have a write-once policy, Lucene merges multiple segments into a new segment from time to time (how and when this happens is a different story) to get rid of deleted documents and to reduce the overall number of segments in the index.
Generally a higher number of segments will also influence your search performance, since Lucene performs almost all operations on a per-segment level. If you want to reduce the number of segments to one, you need to call optimize and Lucene will merge all existing ones into one single segment.

Hope that answers your question.

simon
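To make the write-once and merge behaviour described above a bit more concrete, here is a toy model. It is only an illustration of the idea under heavy simplification: the class and method names are invented, documents are plain integers, and none of this reflects how Lucene actually stores or merges data.

```python
class ToyIndex:
    """Toy model of write-once segments; not how Lucene stores anything."""

    def __init__(self):
        self.segments = []    # each segment is an immutable tuple of doc ids
        self.deleted = set()  # deletions are recorded beside the segments

    def flush(self, doc_ids):
        """Write buffered documents out as a brand-new segment."""
        self.segments.append(tuple(doc_ids))

    def delete(self, doc_id):
        """Mark a document as deleted without rewriting its segment."""
        self.deleted.add(doc_id)

    def merge(self, count):
        """Merge the `count` oldest segments into one, dropping deleted docs."""
        if count <= 0 or not self.segments:
            return
        old = self.segments[:count]
        merged = tuple(d for seg in old for d in seg if d not in self.deleted)
        # Deleted docs that lived in the merged segments are now physically gone.
        self.deleted -= {d for seg in old for d in seg}
        self.segments = [merged] + self.segments[count:]

    def optimize(self):
        """Merge every existing segment into a single one."""
        self.merge(len(self.segments))


index = ToyIndex()
index.flush([1, 2])   # first flush creates one segment
index.flush([3])      # later flushes never touch earlier segments
index.flush([4, 5])
index.delete(2)       # segment (1, 2) stays as it is on "disk"
before = list(index.segments)
index.optimize()
print(before, "->", index.segments)
```

In the toy model the deleted document only disappears once its segment is merged, which is the reason given above for why merges reclaim space and reduce the segment count.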
Simon Willnauer 2010-09-13, 08:33
-
Re: Tuning Solr caches with high commit rates (NRT)Dennis Gearon 2010-09-13, 16:46
Thanks guys for the explanation.
Dennis Gearon
Dennis Gearon 2010-09-13, 16:46
-
Re: Tuning Solr caches with high commit rates (NRT)Lance Norskog 2010-09-13, 01:20
Bravo!
Other tricks: here is a policy for deciding when to merge segments that attempts to balance merging with performance. It was contributed by LinkedIn; they also run indexing and search in the same instance (not Solr, a different Lucene app).

lucene/contrib/misc/src/java/org/apache/lucene/index/BalancedSegmentMergePolicy.java

The optimize command now includes a partial optimize option, so you can do larger, controlled merges.
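As a rough illustration of the partial optimize option mentioned above, the sketch below builds the XML update message for an optimize that stops at a given number of segments. It assumes your Solr version accepts the `maxSegments` and `waitSearcher` attributes on `<optimize/>`; `build_optimize_message` is a hypothetical helper name, and the message is only constructed here, not sent to any server.

```python
import xml.etree.ElementTree as ET


def build_optimize_message(max_segments=1, wait_searcher=True):
    """Build the XML body of a Solr <optimize/> update command.

    max_segments > 1 asks for a partial optimize: merge down to at most
    that many segments instead of a single one.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    element = ET.Element("optimize", {
        "maxSegments": str(max_segments),
        "waitSearcher": "true" if wait_searcher else "false",
    })
    return ET.tostring(element, encoding="unicode")


# Assumption: this string would be POSTed to the core's update handler.
message = build_optimize_message(max_segments=5)
print(message)
```

Merging down to a handful of segments rather than one keeps the merge cost bounded, which may matter when commits arrive every 30 seconds.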
Lance Norskog 2010-09-13, 01:20
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-09-13, 07:56
The balanced segment merging is a really cool idea. I'll definitely have a look at this, thanks!

One thing I forgot to mention in the original post is that we use a mergeFactor of 25. That is somewhat on the high side, so that incoming commits aren't trying to merge new data into large segments. 25 is a good balance for us between number of files and search performance. This LinkedIn patch could come in very handy for handling merges.
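The mergeFactor tradeoff described above can be sketched with a small simulation. This is an idealised model, not Lucene's actual merge policy code: it assumes every flush produces one equal-sized segment and that whenever mergeFactor segments exist at the same level they are merged into one segment at the next level. `simulate_flushes` is a hypothetical name used only for this illustration.

```python
def simulate_flushes(num_flushes, merge_factor):
    """Return (segment_count, merge_count) after num_flushes flushes.

    levels[i] holds how many segments currently sit at level i; a level-i
    segment contains merge_factor**i flushed buffers.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []
    merges = 0
    for _ in range(num_flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # merge_factor equal-sized segments collapse into one bigger one.
            levels[level] = 0
            merges += 1
            level += 1
    return sum(levels), merges


print(simulate_flushes(100, 25))  # (4, 4): more segments, few merges
print(simulate_flushes(100, 10))  # (1, 11): fewer segments, more merges
```

Under these assumptions a higher mergeFactor means fewer merges while indexing, at the price of more segments to visit at search time, which is the balance between number of files and search performance mentioned in the message.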
Peter Sturge 2010-09-13, 07:56
-
Re: Tuning Solr caches with high commit rates (NRT)Erick Erickson 2010-09-12, 16:43
Peter:
This kind of information is extremely useful to document, thanks! Do you have the time/energy to put it up on the Wiki? Anyone can edit it by creating a logon. If you don't, would it be OK if someone else did it (with attribution, of course)? I guess that by bringing it up I'm volunteering :)...

Best
Erick
Erick Erickson 2010-09-12, 16:43
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-09-13, 08:19
Hi Erick,

I thought this would be good for the wiki, but I've not submitted to the wiki before, so I thought I'd put this info out there first, then add it if it was deemed useful. If you could let me know the procedure for submitting, it would probably be worth getting it into the wiki (I couldn't do it straight away, as I have a lot of projects on at the moment). If you're able/willing to put it on there for me, that would be very kind of you!

Thanks!
Peter
Peter Sturge 2010-09-13, 08:19
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-11-15, 20:31
Just in case someone is interested:
I put the emails of Peter Sturge, with some minor edits, in the wiki:

http://wiki.apache.org/solr/NearRealtimeSearchTuning

I found myself searching the thread again and again ;-)

Feel free to add and edit content!

Regards,
Peter.

http://jetwick.com twitter search prototype
Peter Karich 2010-11-15, 20:31
-
Re: Tuning Solr caches with high commit rates (NRT)Jonathan Rochkind 2010-11-15, 21:24
Awesome. I'm not sure his point 1 about facet.method=enum is still valid in Solr 1.4+. The "fc" facet.method was changed significantly in 1.4, and generally no longer takes a lot of memory. For facets with "many" unique values, method fc should in fact take less memory than enum, I think?
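Since the right facet method seems to depend on the field, one option is to choose it per field rather than globally. The sketch below only builds the query string; it assumes the usual per-field override form `f.<field>.facet.method`, and the field names `author` and `status` as well as the helper `facet_params` are hypothetical, made up for the example.

```python
from urllib.parse import urlencode


def facet_params(query, facet_fields, default_method="fc",
                 per_field_method=None):
    """Build a Solr query string with a default and per-field facet.method."""
    params = [("q", query), ("facet", "true"),
              ("facet.method", default_method)]
    for name in facet_fields:
        params.append(("facet.field", name))
    for name, method in (per_field_method or {}).items():
        if name not in facet_fields:
            raise ValueError(f"{name} is not one of the facet fields")
        # Per-field override: f.<field>.facet.method=<method>
        params.append((f"f.{name}.facet.method", method))
    return urlencode(params)


# Hypothetical fields: 'author' has many unique values, 'status' only a few.
query_string = facet_params("*:*", ["author", "status"],
                            default_method="fc",
                            per_field_method={"status": "enum"})
print(query_string)
```

Measuring both methods on your own data is still the only reliable guide; this just makes it easy to mix them in a single request.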
Jonathan Rochkind 2010-11-15, 21:24
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-11-15, 21:37
Hi Jonathan,
I am using fc too, because it was simply faster. I'm not sure whether this applies in general. I will add this info to the wiki.

Regards,
Peter.
Peter Karich 2010-11-15, 21:37
-
Re: Tuning Solr caches with high commit rates (NRT)Dennis Gearon 2010-11-15, 21:43
fc='field collapsing'?
Dennis Gearon
Dennis Gearon 2010-11-15, 21:43
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-11-15, 22:19
I think it stands for field cache (according to http://wiki.apache.org/solr/SimpleFacetParameters this could be true ;-))

> fc='field collapsing'?

--
http://jetwick.com twitter search prototype
Peter Karich 2010-11-15, 22:19
-
Re: Tuning Solr caches with high commit rates (NRT)Koji Sekiguchi 2010-11-15, 22:29
(10/11/16 6:43), Dennis Gearon wrote:
> fc='field collapsing'?

fc of facet.method=fc stands for Lucene's FieldCache.
enum of facet.method=enum stands for Lucene's TermEnum.

Usually, you do not need to set facet.method because Solr automatically uses the most appropriate facet method for each field type:

boolean: TermEnum
multiValued/tokenized: UnInvertedField
other than those above: FieldCache

If you prefer Solr to use TermEnum, you can set facet.method=enum.
If you prefer Solr to use FieldCache, you can set facet.method=fc (but Solr uses UnInvertedField for multiValued/tokenized fields).

Koji
--
http://www.rondhuit.com/en/
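As a rough illustration of the defaults Koji lists, the sketch below encodes that three-way rule as a small Python function and builds a facet query string with a per-field override. It is not Solr's own selection code: default_facet_method, facet_query and the field names category and in_stock are made up for the example. The f.<field>.facet.method form is the per-field parameter syntax described on the SimpleFacetParameters wiki page.

```python
from urllib.parse import urlencode


def default_facet_method(field_type, multi_valued=False, tokenized=False):
    """Return the implementation Solr picks by default, per Koji's summary.

    This mirrors the three-way rule from the message; it is a sketch,
    not a copy of Solr's own selection logic.
    """
    if field_type == "boolean":
        return "TermEnum"
    if multi_valued or tokenized:
        return "UnInvertedField"
    return "FieldCache"


def facet_query(fields, method_overrides=None):
    """Build a query string that facets on every name in `fields`.

    `method_overrides` maps a field name to "enum" or "fc" and is emitted
    as a per-field f.<field>.facet.method parameter.
    """
    params = [("q", "*:*"), ("facet", "true")]
    params += [("facet.field", name) for name in fields]
    for name, method in sorted((method_overrides or {}).items()):
        params.append((f"f.{name}.facet.method", method))
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical fields: a multi-valued string and a boolean flag.
    print(default_facet_method("string", multi_valued=True))
    print(facet_query(["category", "in_stock"], {"category": "enum"}))
```

Leaving facet.method unset keeps the defaults; the override is only worth adding for a field where testing shows the other method behaves better.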
Koji Sekiguchi 2010-11-15, 22:29
-
Re: Tuning Solr caches with high commit rates (NRT)Jonathan Rochkind 2010-11-15, 23:36
Koji Sekiguchi wrote:
> Usually, you do not need to set facet.method because Solr
> automatically uses most appropriate facet method for
> each field type:
>
> boolean: TermEnum
> multiValued/tokenized: UnInvertedField
> other than those above: FieldCache

As I understand it (and I may NOT understand it, it is confusing), in Solr 1.4 it would be clearer to say that Solr will by default use facet.method=enum for boolean fields, and facet.method=fc for everything else. facet.method=fc is a strategy that will use one of several different actual methods depending on field qualities. Looking at the code, there appear to me to be _more_ than two branches of choices; I'm not sure there are only two methods, or that the choice depends only on whether the field is multiValued/tokenized. But if Koji knows and is sure, maybe he knows more than me.

It has been suggested on the list and in the wiki that in Solr 1.4+, if you have a facet field with "few" unique values (I have not seen better guidance for exactly what qualifies as 'few', with regard to total number of documents), it may be profitable to use facet.method=enum.

In Solr 1.4, facet.method=enum DOES work on multi-valued fields, I'm pretty certain. If I understand Koji right to say that you can't use facet.method=enum with multi-valued fields, I think Koji is wrong about that; it is not in fact true.

I think it is somewhat more complicated in 1.4+ than Koji suggests, although I don't understand it well enough to explain it completely. I think Koji's explanation is based on before Solr 1.4 made improvements to the faceting algorithms.
Jonathan Rochkind 2010-11-15, 23:36
-
Re: Tuning Solr caches with high commit rates (NRT)Koji Sekiguchi 2010-11-15, 23:56
(10/11/16 8:36), Jonathan Rochkind wrote:
> In Solr 1.4, facet.method=enum DOES work on multi-valued fields, I'm pretty certain.

Correct, and I didn't say that facet.method=enum doesn't work for multiValued/tokenized field in my previous mail.

> I think Koji's explanation is based on before Solr 1.4

No, as facet.method had been introduced in 1.4.

Koji
--
http://www.rondhuit.com/en/
Koji Sekiguchi 2010-11-15, 23:56
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-11-16, 09:40
Many thanks, Peter K. for posting up on the wiki - great!
Yes, fc = field cache. Field Collapsing is something very nice indeed, but is entirely different.

As Erik mentions in the wiki post, using per-segment faceting can be a huge boon to performance. It does require the latest Solr trunk build and new Lucene, though (last time I checked, this isn't in the Solr 3x branch).

enum vs fc? This will depend a lot on what your data looks like - e.g. lots of unique terms vs lots of the same terms. In all the tests we've done here with >20m doc indexes (using the 3x branch), enum has always used less memory than fc (sometimes much less), but fc is faster for searches. Again, your data experience may vary.

The main point in this thread for NRT and faceting is to warm caches as quickly as possible - this generally means judicious facet selection and, for us at least, using LRUCache as opposed to FastLRUCache for filter caches.
Peter Sturge 2010-11-16, 09:40
-
Re: Tuning Solr caches with high commit rates (NRT)stockii 2010-12-02, 12:51
Great thread, and exactly my problem :D

I set up two Solr instances, one for updating the index and another for searching.

When I perform an update, the search instance doesn't get the new documents. When I issue a commit on the searcher, it finds them. How can I tell the searcher to always look at the current index, not only the "old" one? Is there an automatic refresh? XD
--
View this message in context: http://lucene.472066.n3.nabble.com/Tuning-Solr-caches-with-high-commit-rates-NRT-tp1461275p2005738.html
Sent from the Solr - User mailing list archive at Nabble.com.
stockii 2010-12-02, 12:51
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-12-02, 13:28
In order for the 'read-only' instance to see any new/updated documents, it needs to do a commit (since it's read-only, it is a commit of 0 documents). You can do this via a client service that issues periodic commits, or use autorefresh from within solrconfig.xml.

Be careful that you don't do anything in the read-only instance that will change the underlying index - like optimize.

Peter
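The "client service that issues periodic commits" could look roughly like the following Python sketch. It assumes the read-only instance exposes the standard XML update handler at http://localhost:8983/solr/update; that URL, the 30-second default and the function names are illustrative assumptions, not something specified in this thread.

```python
import time
import urllib.request

# Assumed update URL of the read-only instance; adjust host, port and path.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url=UPDATE_URL):
    """Create an HTTP POST carrying an empty <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def commit_forever(interval_seconds=30, update_url=UPDATE_URL):
    """Send an empty commit every `interval_seconds` until interrupted.

    Each commit adds no documents; it only prompts the instance to open
    a new searcher on the shared index.
    """
    while True:
        with urllib.request.urlopen(build_commit_request(update_url)) as resp:
            resp.read()  # drain the response body
        time.sleep(interval_seconds)
```

Calling commit_forever() from a small background process would keep the searcher refreshed. The interval should be no shorter than the time the searcher needs to warm, otherwise the overlapping-searcher problem discussed in this thread comes back.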
Peter Sturge 2010-12-02, 13:28
-
Re: Tuning Solr caches with high commit rates (NRT)Jonathan Rochkind 2010-11-15, 21:46
Don't know, don't care. It may have begun standing for that, I don't know. It's now more of a 'strategy' than a method: it uses different algorithms depending on the nature of your facets, including whether they are multi-term or not. I don't entirely understand it. I've looked at the source a bit.

Dennis Gearon wrote:
> fc='field collapsing'?
Jonathan Rochkind 2010-11-15, 21:46
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-09-12, 19:46
Peter,
thanks a lot for your in-depth explanations! Your findings will definitely be helpful for my next performance improvement tests :-)

Two questions:

1. How would I do that:

> or a local read-only instance that reads the same core as the indexing
> instance (for the latter, you'll need something that periodically refreshes - i.e. runs commit()).

2. Did you try sharding with your current setup (e.g. one big, nearly-static index and a tiny write+read index)?

Regards,
Peter.
Peter Karich 2010-09-12, 19:46
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-09-13, 08:09
1. You can run multiple Solr instances in separate JVMs, with both having their solr.xml configured to use the same index folder. You need to be careful that one and only one of these instances will ever update the index at a time. The best way to ensure this is to use one for writing only, while the other is read-only and never writes to the index. This read-only instance is the one to use for tuning for high search performance. Even though the RO instance doesn't write to the index, it still needs periodic (albeit empty) commits to kick off autowarming/cache refresh.

Depending on your needs, you might not need to have 2 separate instances. We need it because the 'write' instance is also doing a lot of metadata pre-write operations in the same JVM as Solr, and so has its own memory requirements.

2. We use sharding all the time, and it works just fine with this scenario, as the RO instance is simply another shard in the pack.
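To make point 2 a little more concrete, here is a minimal Python sketch of a distributed query that treats the read-only instance as one more entry in Solr's shards parameter. The hosts, ports and the helper name sharded_query_url are hypothetical; only the shards parameter itself (a comma-separated list of host:port/path entries) is standard Solr.

```python
from urllib.parse import urlencode


def sharded_query_url(base_url, shards, query="*:*"):
    """Return a /select URL that fans the query out to every shard listed.

    `shards` holds host:port/path entries without a scheme, which is the
    form the shards parameter expects.
    """
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':', '/', ',' and '*' unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe=',:/*')}"


if __name__ == "__main__":
    # Hypothetical layout: a large static shard plus the read-only instance.
    print(sharded_query_url(
        "http://localhost:8983/solr",
        ["localhost:8983/solr", "localhost:8984/solr"],
    ))
```

The instance receiving the request merges the per-shard results, so clients only need to know a single URL.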
Peter Sturge 2010-09-13, 08:09
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-09-14, 07:37
Hi Peter,
this scenario would be really great for us - I didn't know that this is possible and works, so: thanks!

At the moment we are doing something similar by replicating to the read-only instance, but the replication is somewhat lengthy and resource-intensive at this data volume ;-)

Regards,
Peter.
Peter Karich 2010-09-14, 07:37
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Karich 2010-09-14, 13:00
Peter Sturge,
this was a nice hint, thanks again! If you are here in Germany anytime I can invite you to a beer or an apfelschorle! :-)

I only needed to change the lockType to none in the solrconfig.xml, disable the replication and set the data dir to the master data dir!

Regards,
Peter Karich.
Peter Karich 2010-09-14, 13:00
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-09-17, 09:18
Hi,
It's great to see such a fantastic response to this thread - NRT is alive and well!

I'm hoping to collate this information and add it to the wiki when I get a few free cycles (thanks Erik for the heads up).

In the meantime, I thought I'd add a few tidbits of additional information that might prove useful:

1. The first one to note is that the techniques/setup described in this thread don't fix the underlying potential for OutOfMemory errors - there can always be an index large enough to ask of its JVM more memory than is available for cache. These techniques, however, mitigate the risk, and provide an efficient balance between memory use and search performance.
There are some interesting discussions going on for both Lucene and Solr regarding the '2 pounds of baloney into a 1 pound bag' issue of unbounded caches, with a number of interesting strategies. One strategy that I like, but haven't found in discussion lists, is auto-limiting cache size/warming based on available resources (similar to the way file system caches use free memory). This would allow caches to adjust to their memory environment as indexes grow.

2. A note regarding lockType in solrconfig.xml for dual Solr instances: It's best not to use 'none' as a value for lockType - this sets the lockType to null, and as the source comments note, this is a recipe for disaster, so use 'simple' instead.

3. Chris mentioned setting maxWarmingSearchers to 1 as a way of minimizing the number of onDeckSearchers. This is a prudent move -- thanks Chris for bringing this up!

All the best,
Peter
Peter Sturge 2010-09-17, 09:18
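The two configuration points in the message above (lockType and maxWarmingSearchers) can be checked mechanically. The following is a minimal sketch, not part of Solr: the class name SolrConfigLint and the wording of the warnings are made up for illustration, and it assumes the values appear as literal text in elements named lockType and maxWarmingSearchers, as the thread describes. Values that are not plain integers (for example property placeholders) are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: flags the two solrconfig.xml settings discussed in this thread. */
class SolrConfigLint {

    /** Returns one warning per setting that differs from the thread's advice. */
    static List<String> check(String solrconfigXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(solrconfigXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse solrconfig.xml content", e);
        }

        List<String> warnings = new ArrayList<>();

        // The thread advises 'simple' rather than 'none' when two instances share an index.
        NodeList locks = doc.getElementsByTagName("lockType");
        for (int i = 0; i < locks.getLength(); i++) {
            String value = locks.item(i).getTextContent().trim();
            if (value.equalsIgnoreCase("none")) {
                warnings.add("lockType is 'none'; the thread recommends 'simple'");
            }
        }

        // The thread suggests maxWarmingSearchers=1 to limit overlapping onDeck searchers.
        NodeList warming = doc.getElementsByTagName("maxWarmingSearchers");
        for (int i = 0; i < warming.getLength(); i++) {
            String text = warming.item(i).getTextContent().trim();
            try {
                int n = Integer.parseInt(text);
                if (n > 1) {
                    warnings.add("maxWarmingSearchers is " + n + "; the thread suggests 1");
                }
            } catch (NumberFormatException ignored) {
                // Not a literal integer (e.g. a placeholder); nothing to check here.
            }
        }
        return warnings;
    }
}
```

Passing the text of a solrconfig.xml to SolrConfigLint.check(...) returns an empty list when neither setting deviates from the advice given in the thread.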
-
Re: Tuning Solr caches with high commit rates (NRT)Dennis Gearon 2010-09-17, 16:55
BTW, what is NRT?
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
-
Re: Tuning Solr caches with high commit rates (NRT)Erick Erickson 2010-09-17, 17:05
Near Real Time...
Erick

On Fri, Sep 17, 2010 at 12:55 PM, Dennis Gearon <[EMAIL PROTECTED]> wrote:
> BTW, what is NRT?
-
Re: Tuning Solr caches with high commit rates (NRT)Dennis Gearon 2010-09-17, 17:59
This means both the indexing and the searching in NRT?
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/17/10, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Near Real Time...
-
Re: Tuning Solr caches with high commit rates (NRT)Andy 2010-09-17, 19:06
Does Solr use Lucene NRT?
--- On Fri, 9/17/10, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Near Real Time...
-
Re: Tuning Solr caches with high commit rates (NRT)Peter Sturge 2010-09-17, 22:48
Solr 4.x has new NRT stuff included (uses latest Lucene 3.x, includes per-segment faceting etc.). The Solr 3.x branch doesn't currently.

On Fri, Sep 17, 2010 at 8:06 PM, Andy <[EMAIL PROTECTED]> wrote:
> Does Solr use Lucene NRT?
-
RE: Tuning Solr caches with high commit rates (NRT)Bruce Ritchie 2010-09-30, 15:26
> One strategy that I like, but haven't found in discussion lists is
> auto-limiting cache size/warming based on available resources (similar
> to the way file system caches use free memory). This would allow
> caches to adjust to their memory environment as indexes grow.

I've written such a cache for use as a Voldemort store in the past. I'm going to rewrite it in the near future to improve the code; however, the general idea can be seen at http://code.google.com/p/project-voldemort/issues/detail?id=225

The trickiest part of doing an auto-limiting cache based on available memory is making sure that it works nicely with the garbage collector. Getting that balance right so that the gc doesn't churn needlessly took me more time than writing the cache.

Bruce
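To make the idea of an auto-limiting cache concrete, here is a minimal sketch. It is not Bruce's Voldemort implementation and not one of Solr's cache classes: the class MemoryAwareLruCache, its parameters and the PRESSURE_BATCH constant are all hypothetical. It assumes an LRU map that, on each put, enforces a hard entry cap and then sheds a small batch of the oldest entries while the free-heap fraction is below a threshold. The batch limit is there because evicted entries only free heap once the garbage collector runs, so evicting until the probe recovers would empty the cache.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

/** Hypothetical LRU cache that also shrinks when free heap runs low. */
class MemoryAwareLruCache<K, V> {
    // Upper bound on pressure-driven evictions per put, to avoid emptying the
    // cache before the garbage collector has had a chance to reclaim anything.
    private static final int PRESSURE_BATCH = 2;

    // accessOrder=true makes iteration start at the least recently used entry.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private final int hardMaxEntries;
    private final double minFreeFraction;
    private final DoubleSupplier freeHeapFraction;

    MemoryAwareLruCache(int hardMaxEntries, double minFreeFraction,
                        DoubleSupplier freeHeapFraction) {
        if (hardMaxEntries < 1) {
            throw new IllegalArgumentException("hardMaxEntries must be >= 1");
        }
        this.hardMaxEntries = hardMaxEntries;
        this.minFreeFraction = minFreeFraction;
        this.freeHeapFraction = freeHeapFraction;
    }

    /** Default probe: fraction of the maximum heap that is not currently used. */
    static double jvmFreeHeapFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) (rt.maxMemory() - used) / rt.maxMemory();
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();

        // 1) Enforce the hard cap, oldest entries first.
        while (map.size() > hardMaxEntries && it.hasNext()) {
            it.next();
            it.remove();
        }

        // 2) Under memory pressure, shed a bounded batch but keep the newest entry.
        int shed = 0;
        while (shed < PRESSURE_BATCH && map.size() > 1 && it.hasNext()
                && freeHeapFraction.getAsDouble() < minFreeFraction) {
            it.next();
            it.remove();
            shed++;
        }
    }

    synchronized int size() {
        return map.size();
    }
}
```

A cache built with `new MemoryAwareLruCache<>(10000, 0.2, MemoryAwareLruCache::jvmFreeHeapFraction)` would hold at most 10000 entries and start shedding when less than 20% of the heap is free; both numbers are arbitrary placeholders. Injecting the probe keeps the eviction logic testable without having to fill the heap.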
-
Re: Tuning Solr caches with high commit rates (NRT)Anders Melchiorsen 2010-10-11, 10:01
Hi,
why do you need to change the lockType? Does a readonly instance need locks at all?

thanks,
Anders.

On Tue, 14 Sep 2010 15:00:54 +0200, Peter Karich <[EMAIL PROTECTED]> wrote:
> Peter Sturge,
>
> this was a nice hint, thanks again! If you are here in Germany anytime I
> can invite you to a beer or an apfelschorle ! :-)
> I only needed to change the lockType to none in the solrconfig.xml,
> disable the replication and set the data dir to the master data dir!
>
> Regards,
> Peter Karich.
>
>> Hi Peter,
>>
>> this scenario would be really great for us - I didn't know that this is
>> possible and works, so: thanks!
>> At the moment we are doing similar with replicating to the readonly
>> instance but
>> the replication is somewhat lengthy and resource-intensive at this
>> datavolume ;-)
>>
>> Regards,
>> Peter.
>>
>>> 1. You can run multiple Solr instances in separate JVMs, with both
>>> having their solr.xml configured to use the same index folder.
>>> You need to be careful that one and only one of these instances will
>>> ever update the index at a time. The best way to ensure this is to use
>>> one for writing only,
>>> and the other is read-only and never writes to the index. This
>>> read-only instance is the one to use for tuning for high search
>>> performance. Even though the RO instance doesn't write to the index,
>>> it still needs periodic (albeit empty) commits to kick off
>>> autowarming/cache refresh.
>>>
>>> Depending on your needs, you might not need to have 2 separate
>>> instances. We need it because the 'write' instance is also doing a lot
>>> of metadata pre-write operations in the same jvm as Solr, and so has
>>> its own memory requirements.
>>>
>>> 2. We use sharding all the time, and it works just fine with this
>>> scenario, as the RO instance is simply another shard in the pack.
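The read-only instance in the setup quoted above needs periodic (albeit empty) commits so that it picks up what the writer has added. A minimal sketch of such a refresher follows. The class ReadOnlyCommitTicker is hypothetical, the base URL is a placeholder supplied by the caller, and it assumes the instance accepts an XML commit message POSTed to its update handler at /update. It uses a fixed delay between runs so that a slow commit is never followed immediately by another.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: sends an empty commit to a read-only Solr instance on a timer. */
class ReadOnlyCommitTicker {
    // XML update message asking the update handler to commit.
    static final String COMMIT_BODY = "<commit/>";

    /** Appends the update handler path to a base URL such as http://host:port/solr. */
    static String updateUrl(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/update";
    }

    /** POSTs the commit message and returns the HTTP status code. */
    static int sendCommit(String updateUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(COMMIT_BODY.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Starts the timer; the caller should call shutdown() on the result when done. */
    static ScheduledExecutorService start(String solrBaseUrl, long periodSeconds) {
        String url = updateUrl(solrBaseUrl);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                int status = sendCommit(url);
                if (status != 200) {
                    System.err.println("commit returned HTTP " + status);
                }
            } catch (IOException e) {
                System.err.println("commit failed: " + e.getMessage());
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

For example, `ReadOnlyCommitTicker.start("http://localhost:8983/solr", 30)` would send a commit roughly every 30 seconds, matching the commit interval mentioned in the original post; the host and port are placeholders.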
-
Re: Tuning Solr caches with high commit rates (NRT)Chris Haggstrom 2010-09-13, 01:45
Thanks, Peter. This is really great info.
One setting I've found to be very useful for the problem of overlapping onDeckSearchers is to reduce the value of maxWarmingSearchers in solrconfig.xml. I've reduced this to 1, so if a slave is already busy doing pre-warming, it won't try to also pre-warm additional updates. This has greatly reduced our time to incorporate updates, with no visible downsides other than an uglier snapinstaller.log (we're still using 1.3 w/rsync-based replication).

-Chris
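The original post suggests looking in the logs for 'PERFORMANCE WARNING: Overlapping onDeckSearchers=x' to see whether searchers are piling up. A minimal sketch of that check follows. The class OnDeckSearcherLogCheck is hypothetical, and the only thing it assumes about the log format is that the warning text quoted in the thread appears somewhere on the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts onDeckSearchers counts from Solr log lines. */
class OnDeckSearcherLogCheck {
    // Warning text as quoted in the thread, capturing the searcher count.
    private static final Pattern WARNING =
            Pattern.compile("PERFORMANCE WARNING: Overlapping onDeckSearchers=(\\d+)");

    /** Returns the reported count from every matching line, in log order. */
    static List<Integer> overlappingCounts(Iterable<String> logLines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                counts.add(Integer.parseInt(m.group(1)));
            }
        }
        return counts;
    }

    /** Returns the highest reported count, or 0 if the warning never appears. */
    static int maxOverlap(Iterable<String> logLines) {
        int max = 0;
        for (int count : overlappingCounts(logLines)) {
            max = Math.max(max, count);
        }
        return max;
    }
}
```

Feeding it the lines of a log file (for instance via java.nio.file.Files.readAllLines on whatever path your servlet container logs to) gives a quick before/after comparison when changing maxWarmingSearchers or the commit interval.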
|