-
How to cap facet counts beyond a specified limit
Andrew Laird 2012-06-07, 08:01
We have an index with ~100M documents and I am looking for a simple way to speed up faceted searches. Is there a relatively straightforward way to stop counting the number of matching documents beyond some specifiable value? For our needs we don't really need to know that a particular facet has exactly 14,203,527 matches - just knowing that there are "more than a million" is enough. If I could somehow limit the hit counts to a million (say) it seems like that could decrease the work required to compute the values (just stop counting after the limit is reached) and potentially improve faceted search time - especially when we have 20-30 fields to facet on. Has anyone else tried to do something like this?
Many thanks for comments and info,
Sincerely, andy laird | gettyimages | 206.925.6728
-
Re: How to cap facet counts beyond a specified limit
Jack Krupansky 2012-06-07, 15:53
Sounds like an interesting improvement to propose.
It will also depend on various factors, such as number of unique terms in a field, field type, etc.
Which field types are giving you the most trouble and how many unique values do they have? And do you specify a facet.method or just let it default?
What release of Solr are you on? Are you using "trie" for numeric fields? Are these mostly string fields? Any boolean fields?
-- Jack Krupansky
-
Re: How to cap facet counts beyond a specified limit
Toke Eskildsen 2012-06-08, 10:32
On Thu, 2012-06-07 at 10:01 +0200, Andrew Laird wrote:
> For our needs we don't really need to know that a particular facet has exactly 14,203,527 matches - just knowing that there are "more than a million" is enough. If I could somehow limit the hit counts to a million (say) [...]
It should be feasible to stop the collector after 1M documents have been processed, if nothing else then just by ignoring subsequent IDs. However, the IDs received would be in index order, which normally means old-to-new. If the nature of the corpus, and thereby the facet values, changes over time, this change would not be reflected in the facets that have many hits, as the collector never reaches the newer documents.
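To illustrate the index-order bias described above, here is a minimal sketch in plain Python. It is not Solr or Lucene code: the function `capped_facet_counts`, its `max_docs` parameter, and the toy corpus are all hypothetical, and the only assumption carried over from the thread is that matching document IDs arrive old-to-new.

```python
from collections import Counter
from typing import Dict, Iterable


def capped_facet_counts(
    matching_doc_ids: Iterable[int],
    facet_value_of: Dict[int, str],
    max_docs: int,
) -> Counter:
    """Count facet values, ignoring every hit after the first max_docs.

    Hypothetical helper: it only models "stop the collector after N
    documents"; it does not call any real Solr/Lucene API.
    """
    counts: Counter = Counter()
    for processed, doc_id in enumerate(matching_doc_ids):
        if processed >= max_docs:
            break  # cap reached: later (newer) documents are never seen
        counts[facet_value_of[doc_id]] += 1
    return counts


# Toy corpus (assumption): doc IDs 0-5 are old "film" images,
# doc IDs 6-9 are newer "digital" images; all ten match the query.
facet_value_of = {i: ("film" if i < 6 else "digital") for i in range(10)}
all_hits = range(10)  # index order, i.e. old-to-new

full = capped_facet_counts(all_hits, facet_value_of, max_docs=10)
capped = capped_facet_counts(all_hits, facet_value_of, max_docs=5)
```

With the cap at 5, `capped` contains only "film": the newer "digital" value never appears, which is the skew the paragraph above warns about.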
> it seems like that could decrease the work required to compute the values (just stop counting after the limit is reached) and potentially improve faceted search time - especially when we have 20-30 fields to facet on. Has anyone else tried to do something like this?
The current Solr facet implementation treats every facet structure individually. It works fine in a lot of areas but it also means that the list of IDs for matching documents is iterated once for every facet: In the sample case, 14M+ hits * 25 fields = 350M+ hits processed.
I have been experimenting with an alternative approach (SOLR-2412) that packs the terms in the facets as a single structure under the hood, which means only 14M+ hits are processed in the current case. Unfortunately it is not mature and only works for text fields.
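The difference in work between the two strategies can be sketched with a toy model in plain Python. This is not the Solr implementation and not the SOLR-2412 code: the functions `facet_per_field` and `facet_single_pass`, the `visits` counter, and the sample documents are hypothetical and exist only to show how often the hit list is walked.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Doc = Dict[str, str]
Counts = Dict[str, Counter]


def facet_per_field(
    hits: Sequence[int], docs: List[Doc], fields: List[str]
) -> Tuple[Counts, int]:
    """Walk the whole hit list once for every facet field."""
    visits = 0
    counts: Counts = {f: Counter() for f in fields}
    for f in fields:
        for doc_id in hits:
            visits += 1  # one hit processed for this field
            counts[f][docs[doc_id][f]] += 1
    return counts, visits


def facet_single_pass(
    hits: Sequence[int], docs: List[Doc], fields: List[str]
) -> Tuple[Counts, int]:
    """Walk the hit list once and update all facet fields per document."""
    visits = 0
    counts: Counts = {f: Counter() for f in fields}
    for doc_id in hits:
        visits += 1  # one hit processed, regardless of field count
        doc = docs[doc_id]
        for f in fields:
            counts[f][doc[f]] += 1
    return counts, visits


# Toy data (assumption): four matching documents, two facet fields.
docs = [
    {"color": "red", "orientation": "landscape"},
    {"color": "red", "orientation": "portrait"},
    {"color": "blue", "orientation": "landscape"},
    {"color": "red", "orientation": "landscape"},
]
fields = ["color", "orientation"]
hits = [0, 1, 2, 3]

per_field_counts, per_field_visits = facet_per_field(hits, docs, fields)
single_counts, single_visits = facet_single_pass(hits, docs, fields)
```

Both functions produce the same counts, but the first processes hits × fields entries (the 14M+ × 25 = 350M+ figure above) while the second processes each hit once. The per-document work in the second loop still grows with the number of fields, so the model only captures the number of hit-list iterations, not total cost.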
- Toke Eskildsen, State and University Library, Denmark