Re: [Solr Wiki] Update of "SolrFacetingOverview" by JJLarrea
Erik Hatcher 2006-12-28, 03:37
JJ: Fantastic - this is excellent info, and sharing it helps a LOT!
On Dec 27, 2006, at 7:25 PM, Apache Wiki wrote:
> Dear Wiki user,
> You have subscribed to a wiki page or wiki category on "Solr Wiki"
> for change notification.
> The following page has been changed by JJLarrea:
> The comment on the change is:
> Added page per 12/8/06 suggestion by Yonik
> New page:
> = Faceting Overview =
> Solr provides a [http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SimpleFacets.html Simple Faceting toolkit]
> which can be reused by various Request Handlers to include "Facet
> counts" based on some simple criteria. Both the
> StandardRequestHandler and the DisMaxRequestHandler currently use
> these utilities. Detailed descriptions of the parameters used to
> control faceting can be found (along with several examples) at
> This page briefly provides some general background information:
> = Facet Indexing =
> Faceting is done on __indexed__ rather than __stored__ values.
> This is because the primary use for faceting is drilldown into a
> subset of hits resulting from a query, and so the chosen facet
> value is used to construct a filter query which literally matches
> that value in the index. For the stock Solr request handlers this
> is done by adding an `fq=<facet-field>:<quoted facet-value>`
> parameter and resubmitting the query.
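A minimal Python sketch of that drill-down step, assuming the client keeps the original request parameters as a list of (name, value) pairs. The helper names and the `author_facet` field are hypothetical, and the backslash-escaping of embedded double quotes and backslashes assumes the standard Lucene query parser:

```python
from urllib.parse import urlencode

def quote_facet_value(value):
    """Wrap a raw indexed facet value in double quotes, backslash-escaping
    any embedded backslashes or double quotes (assumed Lucene syntax)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def drilldown_params(original_params, facet_field, facet_value):
    """Hypothetical helper: copy the original (name, value) pairs and
    append fq=<facet-field>:<quoted facet-value>."""
    params = list(original_params)
    params.append(("fq", facet_field + ":" + quote_facet_value(facet_value)))
    return params

# Drill down on the "Schildt, Herbert" value of a hypothetical facet field.
params = drilldown_params([("q", "physics")], "author_facet", "Schildt, Herbert")
print(urlencode(params))  # q=physics&fq=author_facet%3A%22Schildt%2C+Herbert%22
```

Because `fq` may be repeated, appending another pair (rather than overwriting a single entry) lets the user drill down on several facet values in succession.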
> Because facet fields often serve two purposes (human-readable
> display text and drill-down query value), they are frequently
> indexed differently from fields used for searching and sorting:
> * They are not tokenized into separate words
> * They are not mapped into lower case
> * Human-readable punctuation is not removed (other than double-quotes)
> * There is often no need to store them, since stored values would
> look much like indexed values and the faceting mechanism is used
> for value retrieval.
> * Depending on how the field is defined, the SimpleFacets
> mechanism may only allow a single value per field per document
> (see below)
> As an example, if I had a field with a list of authors, such as:
> Schildt, Herbert; Wolpert, Lewis; Davies, P.
> I might want to index the same data differently in three different
> fields (perhaps using the Solr [:SchemaXml#Copy Fields:copyField] directive):
> * For searching: Tokenized, case-folded, punctuation-stripped:
> schildt / herbert / wolpert / lewis / davies / p
> * For sorting: Untokenized, case-folded, punctuation-stripped:
> schildt herbert wolpert lewis davies p
> * For faceting: Primary author only, using a `solr.StringField`:
> Schildt, Herbert
> Then, when the user drills down on the "Schildt, Herbert" string, I
> would reissue the query with an added `fq=<facet-field>:"Schildt, Herbert"` parameter.
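The three author fields in this example can be approximated in Python to show what each one ends up holding. This only simulates the analysis: a regex word split stands in for Solr's tokenizer and filters, splitting on `;` stands in for picking the primary author, and the function names are hypothetical:

```python
import re
from typing import List

RAW_AUTHORS = "Schildt, Herbert; Wolpert, Lewis; Davies, P."

def search_tokens(raw: str) -> List[str]:
    """Search field: tokenized, case-folded, punctuation-stripped."""
    return re.findall(r"\w+", raw.lower())

def sort_value(raw: str) -> str:
    """Sort field: one untokenized value, case-folded, punctuation-stripped."""
    return " ".join(search_tokens(raw))

def facet_value(raw: str) -> str:
    """Facet field: primary author only, kept verbatim (like a StringField)."""
    return raw.split(";")[0].strip()

print(search_tokens(RAW_AUTHORS))  # ['schildt', 'herbert', 'wolpert', 'lewis', 'davies', 'p']
print(sort_value(RAW_AUTHORS))     # schildt herbert wolpert lewis davies p
print(facet_value(RAW_AUTHORS))    # Schildt, Herbert
```

Only the facet value is still both readable and usable verbatim in an `fq` drill-down query.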
> = Facet Operation =
> Currently SimpleFacets has 3 modes of operation:
> == FacetQueries ==
> Any number of [:SimpleFacetParameters#facet.query:facet.query]
> parameters can be passed to the request handler. Each distinct
> facet.query will first be executed against the entire index, with
> the matching document IDs cached either as a hashed set (if there are
> fewer matches than the hashDocSet max size) or as a bit set (if there
> are more); see [:SolrCaching#The hashDocSet Max Size:hashDocSet]. Then every time that facet.query
> is used for faceting a query, the cached set will be intersected
> against the set of document ids returned by the query to count the
> number of documents for which the facet.query condition is true.
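The cache-then-intersect behaviour described above can be sketched in Python as follows. This is a simplified model, not Solr's code: `HASH_DOCSET_MAX_SIZE` is a hypothetical stand-in for the hashDocSet max size setting, a Python `set` plays the hashed set, an `int` plays the bit set, and `matches_in_index` is a hypothetical callable that runs the facet.query over the whole index:

```python
from typing import Callable, Dict, Iterable, Set, Union

# Hypothetical stand-in for the hashDocSet max size setting.
HASH_DOCSET_MAX_SIZE = 3

# A cached doc set is either a Python set (the "hashed set") or an int whose
# bit n is 1 when document n matched (the "bit set").
DocSet = Union[Set[int], int]

def make_docset(doc_ids: Iterable[int]) -> DocSet:
    ids = set(doc_ids)
    if len(ids) < HASH_DOCSET_MAX_SIZE:
        return ids                 # few matches: keep a hashed set
    bits = 0
    for doc_id in ids:
        bits |= 1 << doc_id        # many matches: set one bit per document
    return bits

def intersection_size(cached: DocSet, query_hits: Set[int]) -> int:
    """Count documents that are both in the cached set and in the query hits."""
    if isinstance(cached, set):
        return len(cached & query_hits)
    return sum(1 for doc_id in query_hits if (cached >> doc_id) & 1)

# One cache entry per distinct facet.query string.
_facet_query_cache: Dict[str, DocSet] = {}

def facet_query_count(facet_query: str,
                      matches_in_index: Callable[[], Iterable[int]],
                      query_hits: Set[int]) -> int:
    if facet_query not in _facet_query_cache:
        # First use: run the facet.query against the entire index and cache it.
        _facet_query_cache[facet_query] = make_docset(matches_in_index())
    return intersection_size(_facet_query_cache[facet_query], query_hits)

# Hypothetical index where documents 0, 2, 4, 6 and 8 match price:[0 TO 10].
print(facet_query_count("price:[0 TO 10]", lambda: [0, 2, 4, 6, 8], {1, 2, 3, 4}))  # 2
```

A later call with the same facet.query skips the index-wide search and only repeats the cheap intersection. A bit set costs one bit per document in the index regardless of how many match, which is why small result sets are cheaper to keep as a hashed set.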
> == FacetFields ==
> Any number of [:SimpleFacetParameters#facet.field:facet.field]
> parameters can be passed to the request handler. For each
> facet.field, one of two approaches will be used:
> * Field Queries: If the facet field is defined in the schema
> as multi-valued, boolean, or tokenized, then every indexed value