Solr >> mail # user >> FunctionQueries and FieldCache and OOM


FunctionQueries and FieldCache and OOM
Hi,

In one of the environments I'm working on (4 Solr 1.4.1 nodes with
replication, 3+ million docs, ~5.5GB index size, a high commit rate (every
~1-2 min), a high query rate (~50 q/s) and a high number of updates (~1000
docs/commit)), the nodes continuously run out of memory.

During development we frequently ran extensive stress tests, and after tuning
JVM and Solr settings everything ran fine. A while ago I added the DisMax bq
parameter to boost recent documents: documents older than a day receive 50%
less boost, similar to the example but with a much steeper slope. For clarity,
I'm not using the ordinal function, which the wiki warns against for Solr
1.4.1, but the reciprocal version in the bq parameter.
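
To make the "50% less boost after a day" slope concrete, here is a minimal sketch of the arithmetic behind a reciprocal boost of the form recip(x,m,a,b) = a / (m*x + b), where x is the document age in milliseconds. The constants below (a = b = 1 and m = 1 / one day in ms) are assumptions chosen so that the boost halves after one day; the actual values used in the bq parameter are not given in this mail.

```python
# Sketch of the reciprocal boost arithmetic: recip(x, m, a, b) = a / (m*x + b).
# The constants are illustrative assumptions, not the poster's real settings.

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


# Assumed slope: m = 1 / (one day in ms), so that m*x == 1 when x is one day.
m = 1.0 / MS_PER_DAY

boost_new = recip(0, m, 1.0, 1.0)               # brand-new document
boost_day = recip(MS_PER_DAY, m, 1.0, 1.0)      # one day old
boost_week = recip(7 * MS_PER_DAY, m, 1.0, 1.0) # one week old

print(boost_new, boost_day, boost_week)
```

With these assumed constants a new document gets the full boost, a one-day-old document gets half of it, and a week-old document gets one eighth.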

This week we started the stress tests and nodes are going down again. I've
reconfigured the nodes to have different settings for the bq parameter (or no bq
parameter).

It seems the bq is the cause of the misery.

Issue SOLR-1111 keeps popping up but it has not been resolved. Is there anyone
who can confirm that one of those patches fixes this issue before I waste hours
of work finding out it doesn't? ;)

Am I correct in assuming that Lucene FieldCache entries are added for each
unique function query? In that case, every query is a unique cache entry
because it operates on milliseconds. If all else fails I might be able to
reduce precision by operating on minutes or even coarser units instead of
milliseconds. I cannot, however, use other nice math functions in the ms()
parameter, so that might make things difficult.

However, date math seems to be available (NOW/HOUR), so I assume it would also
work for <SOME_DATE_FIELD>/HOUR. This way I just might prevent useless
entries.
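
The effect of rounding can be illustrated with a small sketch that only counts distinct "NOW" values; it does not model Solr or the FieldCache, and whether each distinct value really produces a separate cache entry is the unconfirmed assumption stated above. The start time is arbitrary, and the query interval is derived from the ~50 q/s rate mentioned earlier.

```python
# Counts how many distinct "NOW" values a stream of queries produces at
# millisecond precision versus rounded down to the hour (like NOW/HOUR).
# This is plain arithmetic, not a model of Solr's caching behaviour.
from datetime import datetime, timedelta, timezone


def round_down_to_hour(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Arbitrary start time, chosen only for illustration.
start = datetime(2011, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# ~50 q/s means one query every 20 ms; simulate 10 minutes of traffic.
query_times = [start + timedelta(milliseconds=20 * i) for i in range(30000)]

distinct_ms = len(set(query_times))
distinct_hour = len({round_down_to_hour(t) for t in query_times})

print(distinct_ms, distinct_hour)
```

In this simulation every query has its own millisecond timestamp (30000 distinct values), while rounding to the hour collapses all of them into a single value.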

My apologies for this long mail but it may prove useful for other users and
hopefully we find the solution and can update the wiki to add this warning.

Cheers,