OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
John Russell 2010-12-10, 20:17
I have been load testing solr 1.4.1 and have been running into OOM errors. Not out of heap, but with the "GC overhead limit exceeded" message, meaning that it didn't actually run out of heap space but just spent too much CPU time trying to make room and gave up.

I got a heap dump and sent it through the Eclipse MAT and found that a single WeakHashMap in FieldCacheImpl called readerCache is taking up 2.1GB of my 2.6GB heap.

From my understanding of WeakHashMaps the GC should be able to collect those references if it needs to, but for some reason it isn't here.

My questions are:

1) Any ideas why the GC is not collecting those weak references in that single hashmap?
2) Is there a knob in the solr config that can limit the size of that cache?

Also, after the OOM is thrown solr doesn't respond much at all and throws the exception below. However, when I go to the code I see this:

    try {
      processor.processAdd(addCmd);
      addCmd.clear();
    } catch (IOException e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "ERROR adding document " + document);
    }
    }

So it's swallowing the IOException and throwing a new one without setting the cause, so I can't see what the IOException is. Is this fixed in any newer version? Should I open a bug?
Thanks a lot for your help

John

SEVERE: org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument[{
    de.id=de.id(1.0)={C2B3B03F1000012C549254560A568C18},
    de.type=de.type(1.0)={Social Contact},
    sc.author=sc.author(1.0)={Author-3944},
    sc.sourceType=sc.sourceType(1.0)={rss},
    sc.link=sc.link(1.0)={http://www.cisco.com/feed/date_12.07.10_16.18.03/idx/10752},
    sc.title=sc.title(1.0)={Title-erat metus eget vestibulum},
    sc.publishedDate=sc.publishedDate(1.0)={Tue Dec 07 16:22:09 EST 2010},
    sc.createdDate=sc.createdDate(1.0)={Tue Dec 07 16:20:20 EST 2010},
    sc.socialContactStatus=sc.socialContactStatus(1.0)={unread},
    sc.socialContactStatusUserId=sc.socialContactStatusUserId(1.0)={},
    sc.socialContactStatusDate=sc.socialContactStatusDate(1.0)={Tue Dec 07 16:20:20 EST 2010},
    sc.tags=sc.tags(1.0)={[]},
    sc.authorId=sc.authorId(1.0)={},
    sc.replyToId=sc.replyToId(1.0)={},
    sc.replyToAuthor=sc.replyToAuthor(1.0)={},
    sc.replyToAuthorId=sc.replyToAuthorId(1.0)={},
    sc.feedId=sc.feedId(1.0)={[124852]},
    filterResult_124932_ti=filterResult_124932_ti(1.0)={67},
    filterStatus_124932_s=filterStatus_124932_s(1.0)={COMPLETED},
    filterResult_124937_ti=filterResult_124937_ti(1.0)={67},
    filterStatus_124937_s=filterStatus_124937_s(1.0)={COMPLETED},
    campaignDateAdded_124957_tdt=campaignDateAdded_124957_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010},
    campaignStatus_124957_s=campaignStatus_124957_s(1.0)={NEW},
    campaignDateAdded_124947_tdt=campaignDateAdded_124947_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010},
    campaignStatus_124947_s=campaignStatus_124947_s(1.0)={NEW},
    sc.campaignResultsSummary=sc.campaignResultsSummary(1.0)={[NEW, NEW]}}]
        at org.apache.solr.handler.BinaryUpdateRequestHandler$2.document(BinaryUpdateRequestHandler.java:81)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:136)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:112)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:175)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:141)
        at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:68)
        at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46)
        at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:55)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
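The swallowed IOException John describes is the usual "wrap without a cause" problem. A minimal sketch of the difference, assuming a hypothetical IndexingException in place of SolrException and a hypothetical processAdd that always fails (neither is the real Solr code):

```java
import java.io.IOException;

public class CauseChainingSketch {

    // Hypothetical stand-in for SolrException; not the real Solr class.
    static class IndexingException extends RuntimeException {
        IndexingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical stand-in for processor.processAdd(addCmd); always fails here.
    static void processAdd(String document) throws IOException {
        throw new IOException("simulated low-level failure for " + document);
    }

    static void addDocument(String document) {
        try {
            processAdd(document);
        } catch (IOException e) {
            // Passing e as the cause keeps the original exception and its
            // stack trace attached to the new one.
            throw new IndexingException("ERROR adding document " + document, e);
        }
    }

    public static void main(String[] args) {
        try {
            addDocument("doc-1");
        } catch (IndexingException e) {
            System.out.println(e.getMessage());
            System.out.println("Caused by: " + e.getCause());
        }
    }
}
```

With the cause attached, a logged stack trace would carry a "Caused by:" section showing where the IOException actually came from. Whether the Solr version in use offers a matching constructor is something to check against its source.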
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Tom Hill 2010-12-10, 21:02
Hi John,

WeakReferences allow things to get GC'd if there are no other references to the object referred to.

My understanding is that WeakHashMaps use weak references for the keys in the HashMap.

What this means is that the keys in the HashMap can be GC'd once there are no other references to the key. I _think_ this occurs when the IndexReader is closed.

It does not mean that objects in the FieldCache will get evicted in low memory conditions, unless that field cache entry is no longer needed (i.e. the IndexReader has closed). It just means they can be collected when they are no longer needed (but not before).

So, if you are seeing the FieldCache for the current IndexReader taking up 2.1GB, that's probably for the current cache usage.

There isn't a "knob" you can turn to cut the cache size, but you can evaluate your usage of the cache. Some ideas:

How many fields are you searching on? Sorting on? Are you sorting on String fields where you could be using a numeric field? Numerics save space. Do you need to sort on every field that you are sorting on? Could you facet on fewer fields? For a String field, do you have too many distinct values? If so, can you reduce the number of unique terms? You might check your faceting algorithms and see if you could use enum instead of fc for some of them.

Check your statistics page, what's your insanity count?

Tom
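The weak-key behavior Tom describes can be seen in a small standalone sketch. It assumes a hypothetical Reader class standing in for an open IndexReader and an int[] standing in for cached field data; it is not the FieldCacheImpl code, only the same WeakHashMap pattern:

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeySketch {

    // Hypothetical stand-in for an open IndexReader; only its identity matters.
    static final class Reader {
        final String name;
        Reader(String name) { this.name = name; }
    }

    // Builds a cache keyed by reader, loosely mirroring the shape of readerCache.
    static Map<Reader, int[]> newCache() {
        return new WeakHashMap<>();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Reader, int[]> cache = newCache();

        Reader current = new Reader("current");
        Reader warming = new Reader("warming");
        cache.put(current, new int[1_000_000]);
        cache.put(warming, new int[1_000_000]);

        System.gc();
        // Both keys are still strongly reachable, so both entries survive,
        // no matter how much memory pressure there is.
        System.out.println("entries while readers are open: " + cache.size());

        // Dropping the last strong reference makes that entry eligible for removal.
        warming = null;
        for (int i = 0; i < 10 && cache.size() > 1; i++) {
            System.gc();        // only a hint; collection is not guaranteed
            Thread.sleep(50);
        }
        System.out.println("entries after releasing one reader: " + cache.size());
        System.out.println("still cached for: " + current.name);
    }
}
```

The point of the sketch is that every reader still referenced somewhere keeps its whole cache entry alive, so several open or warming readers at once would each hold their own copy.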
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
John Russell 2010-12-10, 21:33
Thanks a lot for the response.
Unfortunately I can't check the statistics page. For some reason the solr webapp itself is only returning a directory listing. This is sometimes fixed when I restart but if I do that I'll lose the state I have now. I can get at the JMX interface. Can I check my insanity level from there?
We did change two parts of the solr config to raise the sizes of the query result and document caches. I assume from what you were saying that this does not have an effect on the cache I mentioned taking up all of the space.
    <queryResultCache
      class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="0"/>

    <documentCache
      class="solr.LRUCache"
      size="16384"
      initialSize="16384"
      autowarmCount="0"/>

This problem gets worse as our index grows (5.0GB now). Unfortunately we are maxed out on memory for our hardware.
We aren't using faceting at all in our searches right now. We usually sort on 1 or 2 fields at the most. I think the types of our fields are pretty accurate, unfortunately they are mostly strings, and some dates.
How do the field definitions affect that cache? Is it simply that fewer fields mean less to cache? Does it not cache some fields configured in a certain way?
Is there a way to throw out an IndexReader after a while and restart, just to restart the cache? Or maybe explicitly clear it if we see it getting out of hand through JMX or something?
Really anything to stop it from choking like this would be awesome.
Thanks again.
John
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Erick Erickson 2010-12-11, 22:46
"unfortunately I can't check the statistics page. For some reason the solr webapp itself is only returning a directory listing."
This is very weird and makes me wonder if there's something really wonky with your system. I'm assuming when you say "the solr webapp itself" you're talking about ...localhost:8983/solr/admin/...... You might want to be looking at the stats page (and frantically hitting refresh) before you have problems. Alternately, you could record the queries as they are sent to solr to see what the offending queries are.
But onwards.... Tell us more about your dates. One of the very common ways people get into trouble is to use dates that are unix-style timestamps, i.e. in milliseconds (either as ints or strings) and sort on them. Trie fields are very much preferred for this.
Your index isn't all that large by regular standards, so I think that there's hope that you can get this working.

Wait, wait, wait. Looking again at the stack trace I see that your OOM is happening when you *add* a document. Tell us more about the document, perhaps you can print out some characteristics of the doc before you add it? Is it always the same doc? Are you indexing and searching on the same machine? Is the doc really huge?

Best
Erick
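One way to "print out some characteristics of the doc" before adding it is a small summary helper like the one below. This is only a sketch: it takes a plain Map of field name to value as a stand-in, not SolrJ's SolrInputDocument, and the summarize name is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocSummarySketch {

    // Summarises a document held as field name -> value.
    // A plain Map is a stand-in here; it is not SolrJ's SolrInputDocument.
    static String summarize(Map<String, Object> doc) {
        int totalChars = 0;
        String largestField = null;
        int largestLen = -1;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            int len = String.valueOf(e.getValue()).length();
            totalChars += len;
            if (len > largestLen) {
                largestLen = len;
                largestField = e.getKey();
            }
        }
        return "fields=" + doc.size()
                + " totalChars=" + totalChars
                + " largest=" + largestField + "(" + largestLen + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("de.id", "C2B3B03F1000012C549254560A568C18");
        doc.put("sc.sourceType", "rss");
        doc.put("sc.title", "Title-erat metus eget vestibulum");
        // Log this line just before sending the document to the indexer.
        System.out.println(summarize(doc));
    }
}
```

Logging a line like this per add would show whether the failures line up with unusually large documents or with one particular document.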
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
John Russell 2010-12-13, 20:42
Thanks for the response.
The date types are defined in our schema file like this
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
Which appears to be what you mentioned. Then we use them in fields like this
<field name="sc.publishedDate" type="date" indexed="true" stored="false" required="false" multiValued="false" />
<field name="sc.createdDate" type="date" indexed="true" stored="false" required="false" multiValued="false" />
So I think we have the right datatypes for the dates. Most of the other ones are strings.
As for the doc we are adding, I don't think it would be considered "huge". It is basically blog posts and tweets broken out into fields like author, title, summary etc. Each doc probably isn't more than 1 or 2k tops. Some probably smaller.
We do create them once and then update the indexes as we perform work on the documents. For example, we create the doc for the original incoming post and then update that doc with tags or the results of filtering so we can look for them later.
We have solr set up as a separate JVM which we talk to over HTTP on the same box using the solrj client java library. Unfortunately we are on 32 bit hardware so solr can only get 2.6GB of memory. Any more than that and the JVM won't start.
I really just need a way to keep the cache from breaking the bank. As I pasted below there are some config elements in the XML that appear to be related to caching but I'm not sure that they are related to that specific hashmap which eventually grows to 2.1GB of our 2.6GB heap. It never actually runs out of heap space but GC's the CPU to death.
Thanks again.
John
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-13, 22:38
Forgive me if I've said this in this thread already, but I'm beginning to think this is the main 'mysterious' cause of Solr RAM/gc issues.
Are you committing very frequently? So frequently that you commit faster than it takes for warming operations on a new Solr index to complete, and you're getting overlapping indexes being prepared?
But if the problem really is just GC issues and not actually too much RAM being used, try this JVM setting:
-XX:+UseConcMarkSweepGC
This makes most of the GC work run concurrently with solr's own threads, instead of pausing them for the whole collection.
I think that is also something that many many Solr installations probably need, but don't realize they need.
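To tell "GC issues" apart from "too much RAM being used", it helps to measure how much time the JVM actually spends collecting. The sketch below reads the standard java.lang.management beans from inside the JVM; the one-second sampling window and the class name are arbitrary choices for illustration. The same GarbageCollector beans are exposed over JMX, so a similar reading should be possible remotely.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadSketch {

    // Total milliseconds spent in GC since JVM start, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of wall-clock time spent in GC between two samples.
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (gcMillisAfter - gcMillisBefore) / (double) elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        Thread.sleep(1000);   // sampling window; a real monitor would loop

        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        double fraction = gcFraction(gcBefore, totalGcMillis(), elapsedMillis);
        System.out.printf("time in GC over the last window: %.1f%%%n", fraction * 100);
    }
}
```

A fraction that stays close to 100% while the heap barely shrinks is the pattern behind "GC overhead limit exceeded". The figure is approximate, since collectors that work concurrently report their time differently.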
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
John Russell 2010-12-14, 01:47
Wow, you read my mind. We are committing very frequently. We are trying to get as close to realtime access to the stuff we put in as possible. Our current commit time is... ahem.... every 4 seconds.
Is that insane?
I'll try the ConcMarkSweep as well and see if that helps.
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Yonik Seeley 2010-12-14, 02:07
On Mon, Dec 13, 2010 at 8:47 PM, John Russell <[EMAIL PROTECTED]> wrote:
> Wow, you read my mind. We are committing very frequently. We are trying to
> get as close to realtime access to the stuff we put in as possible. Our
> current commit time is... ahem.... every 4 seconds.
>
> Is that insane?

Not necessarily insane, but challenging ;-)

I'd start by setting maxWarmingSearchers to 1 in solrconfig.xml. When that is exceeded, a commit will fail (this just means a new searcher won't be opened on that commit... the docs will be visible with the next commit that does succeed.)

-Yonik
http://www.lucidimagination.com
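The limit Yonik describes behaves roughly like a bounded counter: a commit may start warming a new searcher only while fewer than maxWarmingSearchers are already warming. The sketch below models just that rule. The class and method names are hypothetical and this is not Solr's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WarmingLimitSketch {

    private final int maxWarmingSearchers;
    private final AtomicInteger warming = new AtomicInteger();

    WarmingLimitSketch(int maxWarmingSearchers) {
        this.maxWarmingSearchers = maxWarmingSearchers;
    }

    // Called on commit: returns true if a new searcher may start warming,
    // false if the limit is already reached and this commit opens no searcher.
    boolean tryStartWarming() {
        while (true) {
            int current = warming.get();
            if (current >= maxWarmingSearchers) {
                return false;
            }
            if (warming.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Called when a searcher has finished warming and been registered.
    void finishWarming() {
        warming.decrementAndGet();
    }

    public static void main(String[] args) {
        WarmingLimitSketch limit = new WarmingLimitSketch(1);
        System.out.println("commit 1 opens searcher: " + limit.tryStartWarming());
        System.out.println("commit 2 opens searcher: " + limit.tryStartWarming());
        limit.finishWarming();
        System.out.println("commit 3 opens searcher: " + limit.tryStartWarming());
    }
}
```

With the limit at 1, at most one extra searcher (and its own field cache entry) exists alongside the live one, which is presumably why it helps with the memory problem in this thread.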
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 02:27
Wow, really, it's that easy? I could swear there's a wiki page somewhere that suggests otherwise, but I believe Yonik today over a wiki page last edited whenever.

But this should be well publicized; it's a pretty easy solution that will at least give you "as up to date as your Solr can handle", to a problem that many people seem to be having. I would suggest a maxWarmingSearchers 1 example should at least be included commented out in the example solrconfig.xml, if not even included live.

(This would be even better if, on a commit failing due to maxWarmingSearchers, Solr would automatically commit them when the warming is complete -- instead of relying on another commit manually being made at some future point. Is there any built-in hook for 'warming complete' or 'index fully ready' that could be used to jury-rig this?)

Yonik, how will maxWarmingSearchers in this scenario affect replication? If a slave is pulling down new indexes so quickly that the warming searchers would ordinarily pile up, but maxWarmingSearchers is set to 1.... what happens?
-
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Yonik Seeley 2010-12-14, 03:41
On Mon, Dec 13, 2010 at 9:27 PM, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
> Yonik, how will maxWarmingSearchers in this scenario affect replication? If a slave is pulling down new indexes so quickly that the warming searchers would ordinarily pile up, but maxWarmingSearchers is set to 1.... what happens?
As with any other commit, this will limit the number of searchers warming in the background to 1. If a commit is called, and that tries to open a new searcher while another is already warming, it will fail. The next commit that does succeed will have all the updates though.
Today, this maxWarmingSearchers check is done after the writer has closed and before a new searcher is opened... so calling commit too often won't affect searching, but it will currently affect indexing speed (since the IndexWriter is constantly being closed/flushed).
-Yonik
http://www.lucidimagination.com
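The behaviour Yonik describes can be modelled in a few lines. What follows is only a toy sketch under stated assumptions, not Solr's implementation: the class `WarmingGate` and all of its methods are hypothetical names made up for illustration. It shows a commit being rejected while one searcher is still warming, and the rejected documents becoming visible with the next commit that succeeds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the maxWarmingSearchers gate; hypothetical, not Solr's code. */
final class WarmingGate {
    private final int maxWarmingSearchers;
    private int warming = 0;                                        // searchers warming right now
    private final List<String> pending = new ArrayList<>();         // written, not yet searchable
    private final Deque<List<String>> inFlight = new ArrayDeque<>(); // snapshots being warmed
    private final List<String> visible = new ArrayList<>();         // what the live searcher sees

    WarmingGate(int maxWarmingSearchers) {
        this.maxWarmingSearchers = maxWarmingSearchers;
    }

    void add(String doc) {
        pending.add(doc);
    }

    /** Returns true if a new searcher started warming, false if the commit was rejected. */
    boolean commit() {
        if (warming >= maxWarmingSearchers) {
            return false; // documents stay pending; a later successful commit picks them up
        }
        warming++;
        inFlight.addLast(new ArrayList<>(pending)); // the new searcher sees everything so far
        pending.clear();
        return true;
    }

    /** Simulates the oldest warming searcher finishing and going live. */
    void warmingFinished() {
        if (warming == 0) {
            throw new IllegalStateException("no searcher is warming");
        }
        warming--;
        visible.addAll(inFlight.removeFirst());
    }

    List<String> visible() {
        return new ArrayList<>(visible);
    }

    public static void main(String[] args) {
        WarmingGate gate = new WarmingGate(1);
        gate.add("doc1");
        System.out.println("commit 1 accepted: " + gate.commit());
        gate.add("doc2");
        System.out.println("commit 2 accepted: " + gate.commit()); // rejected, still warming
        gate.warmingFinished();
        System.out.println("visible: " + gate.visible());
        System.out.println("commit 3 accepted: " + gate.commit());
        gate.warmingFinished();
        System.out.println("visible: " + gate.visible());
    }
}
```

In this model the second commit is refused, `doc2` is not lost, and it shows up once the third commit's searcher finishes warming.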
-
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 04:11
Sorry, I guess I don't understand the details of replication well enough.
So the slave tries to replicate. It pulls down the new index files. It tries to do a commit but fails. But "the next commit that does succeed will have all the updates." Since it's a slave, it doesn't get any commits of its own. But then some amount of time later, it does another replication pull. There are at this time maybe no _new_ changes since the last failed replication pull. Does this trigger a commit that will get those previous changes actually added to the slave?
In the meantime, between commits, are those potentially large pulled new index files sitting around somewhere but not replacing the old slave index files, doubling disk space for those files?
Thanks for any clarification.
Jonathan
-
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Upayavira 2010-12-14, 07:23
The second commit will bring in all changes, from both syncs.
Think of the sync part as a glorified rsync of files on disk. So the files will have been copied to disk, but the in-memory index on the slave will not have noticed that those files have changed. The commit is intended to remedy that: it causes a new index reader to be created, based upon the new on-disk files, which will include updates from both syncs.
Upayavira
-
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 14:15
But the entirety of the old indexes (no longer on disk) wasn't cached in memory, right? Or was it? Maybe this is me not understanding Lucene well enough. I thought that portions of the index were cached in memory, but that sometimes the index reader still has to go to disk to get things that aren't currently in caches. If this is true (tell me if it's not!), we have an index reader that was based on indexes that... are no longer on disk. But the index reader is still open. What happens when it has to go to disk for info?
And will the second replication trigger a commit even if there are in fact no new files to be transferred over to the slave, because there have been no changes since the prior sync with the failed commit?
-
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Upayavira 2010-12-14, 14:53
A Lucene index is made up of segments. Each commit writes a segment. Sometimes, upon commit, some segments are merged together into one to reduce the overall segment count, as too many segments hinder performance. Upon optimisation, all segments are (typically) merged into a single segment.
Replication copies any new segments from the master to the slave, whether they be new segments arriving from a commit, or new segments that are a result of a segment merge. The result is a set of index files on disk that are a clean mirror of the master.
Then, when your replication process has finished syncing changed segments, it fires a commit on the slave. This causes Solr to create a new index reader.
When the first query comes in, this triggers Solr to populate caches. Whoever was unfortunate enough to cause that cache population will see a much slower response (we've seen 40s responses rather than 1s).
The solution to this is to set up an autowarming query in solrconfig.xml. This query is executed against the new index reader, causing caches to populate from the updated files on disk. Only once that autowarming query has completed will the index reader be made available to Solr for answering search queries.
There's some cleverness, the details of which I can't remember, that specifies how much to keep from the existing caches and how much to build up from the files on disk. If I recall, it is all configured in solrconfig.xml.
You ask a good question whether a commit will be triggered if the sync brought over no new files (i.e. if the previous one did, but this one didn't). I'd imagine that Solr would compare the maximum segment ID on disk with the one in memory to make such a decision, in which case Solr would spot the changes from the previous sync and still work. The best way to be sure is to try it!
The simplest way to try it (as I would do it) would be to:
1) switch off post-commit replication
2) post some content to solr
3) commit on the master
4) use rsync to copy the indexes from the master to the slave
5) do another (empty) commit on the master
6) trigger replication via an HTTP request to the slave
7) see if your posted content is available on your slave.
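For step 6, a minimal sketch of what that HTTP request could look like. It assumes the Java replication handler is registered at `/replication` (the conventional name in solrconfig.xml) and uses its `fetchindex` and `indexversion` commands; the host name `slave.example.com`, the port, and the class name `ReplicationUrls` are placeholders invented for illustration. Nothing is sent unless `--send` is passed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Builds replication-handler URLs; host names below are hypothetical. */
final class ReplicationUrls {

    /** Assumes the handler is mounted at /replication under the given Solr base URL. */
    static URI command(String solrBaseUrl, String command) {
        return URI.create(solrBaseUrl + "/replication?command=" + command);
    }

    /** Issues a plain GET and returns the HTTP status code. */
    static int get(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String slave = "http://slave.example.com:8983/solr"; // placeholder address
        URI[] requests = {
            command(slave, "fetchindex"),   // ask the slave to pull from its master now
            command(slave, "indexversion")  // ask which index version it reports afterwards
        };
        boolean send = args.length > 0 && args[0].equals("--send");
        for (URI uri : requests) {
            System.out.println(uri);
            if (send) {
                System.out.println("  HTTP status: " + get(uri));
            }
        }
    }
}
```

Comparing what the slave reports before and after the pull is one way to see whether the second replication actually opened a new searcher; what the handler returns in that case is exactly the open question here.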
Maybe someone else here can tell you what is actually going on and save you the effort!
Does that help you get some understanding of what is going on?
Upayavira
-
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 16:02
Yeah, I understand basically how caches work.
What I don't understand is what happens in replication if the new segment files are successfully copied, but the actual commit fails due to maxWarmingSearchers. The new files are on disk... but the commit could not succeed and there is NOT a new index reader, because the commit failed. And there is potentially a long gap before a future successful commit.
1. Will the existing index searcher have problems because the files have been changed out from under it?
2. Will a future replication -- at which NO new files are available on master -- still trigger a future commit on slave?
Maybe these are obvious to everyone but me, because I keep asking this question, and the answer I keep getting is just describing the basics of replication, as if this obviously answers my question.
Or maybe the answer isn't obvious or clear to anyone including me, in which case the only way to get an answer is to try and test it myself. A bit complicated to test, at least for my level of knowledge, as I'm not sure exactly what I'd be looking for to answer either of those questions.
Jonathan
-
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Shawn Heisey 2010-12-14, 16:38
On 12/14/2010 9:02 AM, Jonathan Rochkind wrote:
> 1. Will the existing index searcher have problems because the files have been changed out from under it?
>
> 2. Will a future replication -- at which NO new files are available on master -- still trigger a future commit on slave?
I'm not really sure of the answer to #2, but I believe I can answer #1. Lucene is designed so that all files necessary for an index to work are kept around after a commit until there is a new searcher to take over all requests with the new files. If you are replicating only new segments, the old files will still be there both before and after. If you just optimized the master and therefore are copying an entire new index, the old one will not be removed until there is a successful commit and therefore a new searcher.
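To make that concrete, here is a toy sketch of the retention rule as described in this message: a file is only removed once it is neither part of the current index nor held by an open searcher. It is an illustration under that assumption, not Lucene's or Solr's deletion policy code; `SegmentFileTracker` and the file names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "old files stay until a new searcher takes over"; hypothetical. */
final class SegmentFileTracker {
    private final Map<String, Integer> refCounts = new HashMap<>(); // files held by open searchers
    private final Set<String> onDisk = new TreeSet<>();

    /** Replication (or a local commit) has written these files to the index directory. */
    void copyToDisk(Set<String> files) {
        onDisk.addAll(files);
    }

    void openSearcher(Set<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    void closeSearcher(Set<String> files) {
        for (String f : files) {
            // Drop the entry entirely when the last reference goes away.
            refCounts.computeIfPresent(f, (name, count) -> count == 1 ? null : count - 1);
        }
    }

    /** Deletes files that are neither in the current index nor held by an open searcher. */
    void cleanup(Set<String> currentIndexFiles) {
        onDisk.removeIf(f -> !currentIndexFiles.contains(f) && !refCounts.containsKey(f));
    }

    Set<String> onDisk() {
        return new TreeSet<>(onDisk);
    }

    public static void main(String[] args) {
        Set<String> oldIndex = Collections.singleton("_1.cfs");
        Set<String> newIndex = Collections.singleton("_2.cfs"); // e.g. after an optimize on the master

        SegmentFileTracker tracker = new SegmentFileTracker();
        tracker.copyToDisk(oldIndex);
        tracker.openSearcher(oldIndex);

        tracker.copyToDisk(newIndex);  // replication pulled the new files
        tracker.cleanup(newIndex);     // commit failed: the old searcher is still serving
        System.out.println("after failed commit:     " + tracker.onDisk());

        tracker.openSearcher(newIndex); // a later commit succeeds
        tracker.closeSearcher(oldIndex);
        tracker.cleanup(newIndex);
        System.out.println("after successful commit: " + tracker.onDisk());
    }
}
```

Between the failed and the successful commit both copies sit on disk, which is the temporary doubling of disk space asked about earlier in the thread.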
There is another thread on replication that I just replied to as well. Solr actually seems a little too intent on keeping old files around - see SOLR-1781.
Shawn
-
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 20:19
Thanks Shawn, that helps explain things.
So the issue with using maxWarmingSearchers to prevent out-of-control RAM/CPU usage from overlapping on-deck searchers, combined with replication, is this: if you're still pulling down replications very frequently, you'll save RAM/CPU but might trade that off for a LOT of disk space used by multiple versions of index segment files, until a commit finally goes through.
-
RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Jonathan Rochkind 2010-12-14, 02:23
ConcMarkSweep probably won't help.
Solr 1.4 is not very good at 'near real time' committing. There are some post-1.4 features for this; I don't know whether they are in trunk yet or still just patches, and I have not investigated them myself, but google (or search JIRA) for 'near real time'.
http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs
This seems to be a very frequent issue these days; everyone running Solr should at least read that wiki section to understand what's going on.
________________________________________
From: John Russell [[EMAIL PROTECTED]]
Sent: Monday, December 13, 2010 8:47 PM
To: [EMAIL PROTECTED]
Subject: Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Wow, you read my mind. We are committing very frequently. We are trying to get as close to realtime access to the stuff we put in as possible. Our current commit time is... ahem.... every 4 seconds.
Is that insane?
I'll try the ConcMarkSweep as well and see if that helps.
On Mon, Dec 13, 2010 at 17:38, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
> Forgive me if I've said this in this thread already, but I'm beginning to think this is the main 'mysterious' cause of Solr RAM/gc issues.
>
> Are you committing very frequently? So frequently that you commit faster than it takes for warming operations on a new Solr index to complete, and you're getting over-lapping indexes being prepared?
>
> But if the problem really is just GC issues and not actually too much RAM being used, try this JVM setting:
>
> -XX:+UseConcMarkSweepGC
>
> Will make GC happen in a different thread, instead of the same thread as solr operations.
>
> I think that is also something that many many Solr installations probably need, but don't realize they need.
>
> On 12/13/2010 3:42 PM, John Russell wrote:
>> Thanks for the response.
>>
>> The date types are defined in our schema file like this
>>
>> <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
>>
>> <!-- A Trie based date field for faster date range queries and date faceting. -->
>> <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
>>
>> Which appears to be what you mentioned. Then we use them in fields like this
>>
>> <field name="sc.publishedDate" type="date" indexed="true" stored="false" required="false" multiValued="false" />
>> <field name="sc.createdDate" type="date" indexed="true" stored="false" required="false" multiValued="false" />
>>
>> So I think we have the right datatypes for the dates. Most of the other ones are strings.
>>
>> As for the doc we are adding, I don't think it would be considered "huge". It is basically blog posts and tweets broken out into fields like author, title, summary etc. Each doc probably isn't more than 1 or 2k tops. Some probably smaller.
>>
>> We do create them once and then update the indexes as we perform work on the documents. For example, we create the doc for the original incoming post and then update that doc with tags or the results of filtering so we can look for them later.
>>
>> We have solr set up as a separate JVM which we talk to over HTTP on the same box using the solrj client java library. Unfortunately we are on 32 bit hardware so solr can only get 2.6GB of memory. Any more than that and the JVM won't start.
>>
>> I really just need a way to keep the cache from breaking the bank. As I pasted below there are some config elements in the XML that appear to be related to caching but I'm not sure that they are related to that specific hashmap which eventually grows to 2.1GB of our 2.6GB heap. It never actually runs out of heap space but GC's the CPU to death.
>>
>> Thanks again.
>>
>> John
>>
>> On Sat, Dec 11, 2010 at 17:46, Erick Erickson<[EMAIL PROTECTED]
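On the question in the subject line: a `java.util.WeakHashMap` only drops an entry once nothing else holds a strong reference to its key. The sketch below is a minimal demonstration of that rule. It assumes, as this thread suggests but does not confirm, that the cache keys are index readers kept alive by overlapping warming searchers; `WeakCacheDemo` and its field and method names are hypothetical, and a plain `Object` stands in for a reader.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Shows that a WeakHashMap entry survives GC while its key is strongly referenced. */
final class WeakCacheDemo {
    // Strong reference, standing in for a searcher that is still open or warming.
    private static Object openReader;

    private static final Map<Object, byte[]> cache = new WeakHashMap<>();

    /** Caches some data for a reader that stays open, requests a GC, and reports the size. */
    static int sizeWhileReaderOpen() {
        openReader = new Object();
        cache.put(openReader, new byte[1024]); // stand-in for cached field data
        System.gc();                           // the entry cannot go: its key is still reachable
        return cache.size();
    }

    /** Drops the strong reference and requests a GC; the entry is now merely eligible. */
    static int sizeAfterReaderClosed() {
        openReader = null;
        System.gc(); // only a hint, so the result is typically 0 but not guaranteed
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("entries while reader open:  " + sizeWhileReaderOpen());
        System.out.println("entries after reader close: " + sizeAfterReaderClosed());
    }
}
```

If several searchers are open at once because commits arrive faster than warming finishes, each one pins its own entry, which would be consistent with a weak map that never seems to shrink.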
-
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Shawn Heisey 2010-12-14, 02:31
On 12/13/2010 3:38 PM, Jonathan Rochkind wrote:
> But if the problem really is just GC issues and not actually too much RAM being used, try this JVM setting:
>
> -XX:+UseConcMarkSweepGC
That's what I use on my shards. I've never had any visible problems with memory or garbage collection delays. I have not done any kind of profiling, though.
The servers (CentOS Xen VMs) have 9GB of total RAM and serve indexes that are nearing 15GB in size and have over 8 million documents. Important parts of my java commandline:
-Xms512M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
Shawn
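One way to check which collectors a JVM is actually running with, and how much time they have consumed, is the standard `java.lang.management` API. This is a minimal sketch; the class name `GcReport` is made up for illustration, and it would need to run inside (or be adapted to query) the same JVM as Solr to say anything about Solr itself. The names it prints depend on the JVM and flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints each garbage collector the running JVM exposes, with counts and total time. */
final class GcReport {

    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": ")
              .append(gc.getCollectionCount())   // -1 if the collector does not report a count
              .append(" collections, ")
              .append(gc.getCollectionTime())    // approximate accumulated time in milliseconds
              .append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Sampling these numbers periodically gives a rough view of how much wall-clock time goes to collection, which is the quantity behind the "GC overhead limit exceeded" error.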