Re: [Solr Wiki] Update of "SolrFacetingOverview" by JJLarrea
Erik Hatcher 2006-12-28, 03:37
JJ: Fantastic - this is excellent info, and sharing it helps a LOT!
Erik

On Dec 27, 2006, at 7:25 PM, Apache Wiki wrote:

> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Solr Wiki"
> for change notification.
>
> The following page has been changed by JJLarrea:
> http://wiki.apache.org/solr/SolrFacetingOverview
>
> The comment on the change is:
> Added page per 12/8/06 suggestion by Yonik
>
> New page:
> = Faceting Overview =
>
> Solr provides a
> [http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SimpleFacets.html Simple Faceting toolkit]
> which can be reused by various Request Handlers to include "Facet
> counts" based on some simple criteria. Both the
> StandardRequestHandler and the DisMaxRequestHandler currently use
> these utilities. Detailed descriptions of the parameters used to
> control faceting can be found (along with several examples) at
> [SimpleFacetParameters].
>
> This page briefly provides some general background information:
>
> = Facet Indexing =
>
> Faceting is done on __indexed__ rather than __stored__ values.
> This is because the primary use for faceting is drilldown into a
> subset of hits resulting from a query, and so the chosen facet
> value is used to construct a filter query which literally matches
> that value in the index. For the stock Solr request handlers this
> is done by adding an `fq=<facet-field>:<quoted facet-value>`
> parameter and resubmitting the query.
>
> Because faceting fields are often specified to serve two purposes,
> human-readable text and drill-down query value, they are frequently
> indexed differently from fields used for searching and sorting:
> * They are not tokenized into separate words
> * They are not mapped into lower case
> * Human-readable punctuation is not removed (other than
> double-quotes)
> * There is often no need to store them, since stored values would
> look much like indexed values and the faceting mechanism is used
> for value retrieval.
> * Depending on how the field is defined the SimpleFacets
> mechanism may only allow for a single value per field per document
> (see below)
>
> As an example, if I had a field with a list of authors, such as:
>
> Schildt, Herbert; Wolpert, Lewis; Davies, P.
>
> I might want to index the same data differently in three different
> fields (perhaps using the Solr [:SchemaXml#Copy Fields:copyField]
> directive):
> * For searching: Tokenized, case-folded, punctuation-stripped:
> schildt / herbert / wolpert / lewis / davies / p
> * For sorting: Untokenized, case-folded, punctuation-stripped:
> schildt herbert wolpert lewis davies p
> * For faceting: Primary author only, using a `solr.StrField`:
> Schildt, Herbert
>
> Then when the user drills down on the "Schildt, Herbert" string I
> would reissue the query with an added
> fq=<facet-field>:"Schildt, Herbert" parameter.
>
> = Facet Operation =
>
> Currently SimpleFacets has 3 modes of operation:
>
> == FacetQueries ==
> Any number of [:SimpleFacetParameters#facet.query:facet.query]
> parameters can be passed to the request handler. Each distinct
> facet.query will first be executed against the entire index, with
> the results cached as a hashed set (if fewer than hashDocSet) or a
> bit set (if greater) of document IDs (see [:SolrCaching#The
> hashDocSet Max Size:hashDocSet]). Then every time that facet.query
> is used for faceting a query, the cached set will be intersected
> against the set of document ids returned by the query to count the
> number of documents for which the facet.query condition is true.
>
> == FacetFields ==
> Any number of [:SimpleFacetParameters#facet.field:facet.field]
> parameters can be passed to the request handler. For each
> facet.field, one of two approaches will be used:
>
> * Field Queries: If the facet field is defined in the schema
> as multi-valued, boolean, or tokenized, then every indexed value