Re: Query, Searcher, Weight, Similarity = ?
Robert Muir 2012-06-29, 17:50
On Fri, Jun 29, 2012 at 9:02 AM, Arjun Dhar <[EMAIL PROTECTED]> wrote:
> I'm new and that is my disclaimer to the stupid question I am about to ask.
> I'm trying to form a conceptual picture of the relationships between Query <-->
> Weight <--> IndexReader, Scorer, Searcher <--> Similarity
> *From what I gather (someone please validate or correct me):*
> 1. We want *Queries* to be RE-USABLE instances, hence *Weight* is a
> specific Query's state!?
Queries are independent of a Searcher. When a Query is executed, it
creates a Weight specific to that Searcher. The Weight holds things
like IDF computations: collection-wide state.
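To make the Query/Weight split concrete, here is a toy sketch. The classes (ToySearcher, ToyTermQuery, ToyWeight) are hypothetical stand-ins, not Lucene's API, and the idf formula is a classic TF-IDF style one used only as an assumption. The same query instance is reused, and each searcher gets its own weight holding that searcher's collection-wide idf.

```java
import java.util.List;

// Hypothetical toy classes, NOT Lucene's API: they only illustrate which
// object owns which state.
class ToySearcher {
    final List<List<String>> docs; // each document is a list of terms

    ToySearcher(List<List<String>> docs) {
        this.docs = docs;
    }

    // Collection-wide statistic: how many documents contain the term.
    int docFreq(String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }
}

// The query only says WHAT to look for, so one instance can be reused
// against any number of searchers.
class ToyTermQuery {
    final String term;

    ToyTermQuery(String term) {
        this.term = term;
    }

    // Searcher-specific state goes into a fresh weight, never into the query.
    ToyWeight createWeight(ToySearcher searcher) {
        return new ToyWeight(this, searcher);
    }
}

class ToyWeight {
    final ToyTermQuery query;
    final double idf; // collection-wide state for ONE searcher

    ToyWeight(ToyTermQuery query, ToySearcher searcher) {
        this.query = query;
        int numDocs = searcher.docs.size();
        int docFreq = searcher.docFreq(query.term);
        // Classic TF-IDF style idf (assumption): 1 + ln(numDocs / (docFreq + 1))
        this.idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }
}

class WeightDemo {
    public static void main(String[] args) {
        ToyTermQuery query = new ToyTermQuery("lucene");
        ToySearcher small = new ToySearcher(List.of(
                List.of("lucene", "search"), List.of("solr")));
        ToySearcher large = new ToySearcher(List.of(
                List.of("lucene"), List.of("a"), List.of("b"), List.of("c")));
        // Same query object, one weight per searcher, different idf values.
        System.out.println(query.createWeight(small).idf);
        System.out.println(query.createWeight(large).idf);
    }
}
```

The query object never changes, which is what lets it be reused safely across searchers.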
> 2. *Searcher* is STATEFUL, and though it processes a *Query*, the state for
> that *Searcher* is delegated to the WEIGHT !?
Searcher wraps an IndexReader (usually a composite IndexReader
containing multiple segments, such as a DirectoryReader) to provide search
capabilities. It also has search-specific extension points:
one of these is Similarity, but there are others. For example, in 4.0
you can override methods to provide collection-wide stats when the
collection is distributed, consisting of indexes across multiple machines.
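A rough sketch of the distributed-statistics idea, using hypothetical classes rather than Lucene's (in 4.x the overridable IndexSearcher methods are, as far as I know, termStatistics and collectionStatistics, but their real signatures are not reproduced here). A local shard searcher has its statistics hooks overridden to sum over every shard, so idf comes out the same as for one combined index.

```java
import java.util.List;

// Hypothetical toy classes, NOT Lucene's API. A shard searcher reports
// statistics from its own documents only.
class ShardSearcher {
    final List<List<String>> docs;

    ShardSearcher(List<List<String>> docs) {
        this.docs = docs;
    }

    // Overridable statistics hooks (the extension-point idea).
    long docFreq(String term) {
        return docs.stream().filter(d -> d.contains(term)).count();
    }

    long numDocs() {
        return docs.size();
    }

    double idf(String term) {
        // Always goes through the (possibly overridden) hooks above.
        return 1.0 + Math.log((double) numDocs() / (docFreq(term) + 1));
    }
}

// Searches the local shard's documents but reports statistics summed over
// every shard. allShards must hold plain ShardSearchers (not this object),
// otherwise docFreq/numDocs would recurse forever.
class DistributedShardSearcher extends ShardSearcher {
    final List<ShardSearcher> allShards;

    DistributedShardSearcher(ShardSearcher local, List<ShardSearcher> allShards) {
        super(local.docs);
        this.allShards = allShards;
    }

    @Override
    long docFreq(String term) {
        long total = 0;
        for (ShardSearcher shard : allShards) {
            total += shard.docFreq(term);
        }
        return total;
    }

    @Override
    long numDocs() {
        long total = 0;
        for (ShardSearcher shard : allShards) {
            total += shard.numDocs();
        }
        return total;
    }
}

class DistributedDemo {
    public static void main(String[] args) {
        ShardSearcher a = new ShardSearcher(List.of(List.of("lucene"), List.of("solr")));
        ShardSearcher b = new ShardSearcher(List.of(
                List.of("lucene"), List.of("lucene", "search"), List.of("x")));
        ShardSearcher distributedA = new DistributedShardSearcher(a, List.of(a, b));
        System.out.println("local-only idf:  " + a.idf("lucene"));
        System.out.println("distributed idf: " + distributedA.idf("lucene"));
    }
}
```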
> 3. *IndexReader* reads an index, and the *Searcher* uses the reader to
> SEARCH using a QUERY
> 4. From the Javadocs of the Weight class: "IndexReader dependent state
> should reside in the Scorer." Does that mean that when *weights* are
> calculated, the final result of the calculation goes into a STATEFUL
> object represented by the *Scorer*, which is also Iterable!?
That Javadoc could be clarified to say per-segment state. So if you have
an IndexSearcher wrapping a DirectoryReader with 4 index segments, in
the typical case the Weight holds the state of the entire collection:
e.g. IDF across all 4 segments. The Weight creates 4 Scorers: a Scorer
for each segment in that DirectoryReader. Any per-segment information
such as the document length normalization ("norms") array resides in
each of those Scorers.
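The 4-segment example can be sketched with toy classes. They are hypothetical, not Lucene's Weight/Scorer API, and the sqrt(tf) * idf * norm score and 1/sqrt(length) norm are assumptions modeled on classic TF-IDF. One weight computes idf over all segments, then hands out one scorer per segment, and each scorer reads only its own segment's norms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy classes, NOT Lucene's API.
// A segment owns per-segment data, such as one length norm per document.
class ToySegment {
    final List<List<String>> docs;
    final float[] norms;

    ToySegment(List<List<String>> docs) {
        this.docs = docs;
        this.norms = new float[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            // Classic length normalization: 1 / sqrt(number of terms)
            norms[i] = (float) (1.0 / Math.sqrt(docs.get(i).size()));
        }
    }
}

// The weight holds collection-wide state: idf over ALL segments.
class ToySegmentWeight {
    final String term;
    final double idf;

    ToySegmentWeight(String term, List<ToySegment> segments) {
        this.term = term;
        long numDocs = 0;
        long docFreq = 0;
        for (ToySegment segment : segments) {
            numDocs += segment.docs.size();
            for (List<String> doc : segment.docs) {
                if (doc.contains(term)) {
                    docFreq++;
                }
            }
        }
        this.idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // One scorer per segment; it reads that segment's norms.
    ToySegmentScorer scorer(ToySegment segment) {
        return new ToySegmentScorer(segment, term, idf);
    }
}

class ToySegmentScorer {
    final ToySegment segment;
    final String term;
    final double idf; // copied from the weight, same for every segment

    ToySegmentScorer(ToySegment segment, String term, double idf) {
        this.segment = segment;
        this.term = term;
        this.idf = idf;
    }

    // Score of segment-local document `doc` (0 if the term is absent).
    double score(int doc) {
        long tf = segment.docs.get(doc).stream().filter(term::equals).count();
        if (tf == 0) {
            return 0.0;
        }
        return Math.sqrt(tf) * idf * segment.norms[doc];
    }
}

class SegmentDemo {
    public static void main(String[] args) {
        List<ToySegment> segments = List.of(
                new ToySegment(List.of(List.of("lucene", "rocks"))),
                new ToySegment(List.of(List.of("solr"))),
                new ToySegment(List.of(List.of("lucene"))),
                new ToySegment(List.of(List.of("x", "y", "z"))));
        ToySegmentWeight weight = new ToySegmentWeight("lucene", segments);
        List<ToySegmentScorer> scorers = new ArrayList<>();
        for (ToySegment segment : segments) {
            scorers.add(weight.scorer(segment));
        }
        for (ToySegmentScorer scorer : scorers) {
            System.out.println(scorer.score(0));
        }
    }
}
```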
> 5. *Searcher* can be assigned a *Similarity* algorithm. ... hence using that
> algorithm, it calculates *Weight*, which eventually leads to the
> construction of an Iterable *Scorer* !?
A Similarity is a hook for term weighting. But in many cases term
weighting is not the entire scoring algorithm: Scorers don't have to use
Similarity at all and can compute scores with whatever logic they want.
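A toy illustration of that point, assuming hypothetical interfaces ToySimilarity and ToyScorer (not Lucene types). One scorer delegates term weighting to a pluggable similarity. The other ignores it and gives every match the same score, similar in spirit to a constant-score query.

```java
// Hypothetical toy types, NOT Lucene's API.
// The "similarity" hook: turns a term frequency and an idf into a weight.
interface ToySimilarity {
    double termWeight(int termFreq, double idf);
}

interface ToyScorer {
    double score(int termFreq, double idf);
}

// A scorer that delegates term weighting to whatever similarity it is given.
class ToySimilarityScorer implements ToyScorer {
    final ToySimilarity similarity;

    ToySimilarityScorer(ToySimilarity similarity) {
        this.similarity = similarity;
    }

    public double score(int termFreq, double idf) {
        return termFreq > 0 ? similarity.termWeight(termFreq, idf) : 0.0;
    }
}

// A scorer that never consults a similarity: every match scores the same.
class ToyConstantScorer implements ToyScorer {
    final double constant;

    ToyConstantScorer(double constant) {
        this.constant = constant;
    }

    public double score(int termFreq, double idf) {
        return termFreq > 0 ? constant : 0.0;
    }
}

class SimilarityDemo {
    public static void main(String[] args) {
        ToySimilarity tfIdf = (tf, idf) -> Math.sqrt(tf) * idf;
        ToySimilarity flat = (tf, idf) -> 1.0;
        ToyScorer[] scorers = {
            new ToySimilarityScorer(tfIdf),
            new ToySimilarityScorer(flat),
            new ToyConstantScorer(0.5)
        };
        for (ToyScorer scorer : scorers) {
            System.out.println(scorer.score(4, 2.0));
        }
    }
}
```

Swapping the similarity changes the first two scorers' results but has no effect on the constant scorer.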
> 6. While indexing, it's simple: there is a direct relation between
> IndexWriterConfig <--> Similarity
This is for computing document length normalization information
("norms") at indexing time. Currently that's the only way that
IndexWriter interacts with Similarity.
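A minimal sketch of the index-time side, assuming hypothetical ToyIndexWriter and IndexTimeSimilarity types and a classic boost / sqrt(numTerms) length norm. The writer's only interaction with the similarity is asking for one norm per document. Real Lucene, as far as I know, also encodes each norm into a single byte, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types, NOT Lucene's API.
// At index time the writer only asks the similarity for a length norm.
interface IndexTimeSimilarity {
    float computeNorm(int numTerms, float boost);
}

class ClassicLengthNorm implements IndexTimeSimilarity {
    public float computeNorm(int numTerms, float boost) {
        // Shorter fields get larger norms: boost / sqrt(numTerms)
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }
}

class ToyIndexWriter {
    final IndexTimeSimilarity similarity; // fixed when the writer is configured
    final List<Float> norms = new ArrayList<>(); // one stored norm per document

    ToyIndexWriter(IndexTimeSimilarity similarity) {
        this.similarity = similarity;
    }

    // Assumes non-empty, whitespace-separated text.
    void addDocument(String text) {
        int numTerms = text.trim().split("\\s+").length;
        norms.add(similarity.computeNorm(numTerms, 1.0f));
    }
}

class NormDemo {
    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter(new ClassicLengthNorm());
        writer.addDocument("lucene");
        writer.addDocument("apache lucene search library");
        System.out.println(writer.norms);
    }
}
```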
> +Q) Apart from validating my understanding, is there a sequence
> diagram explaining the calculation process during a Query?
Have a look at https://builds.apache.org/job/Lucene-trunk/javadoc/ and
click "Searching and Scoring in Lucene". I don't think there are any
diagrams there, but there is more information available.
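In place of a diagram, here is a toy trace of the order in which things happen, using a hypothetical class (not Lucene's API). Collection-wide statistics are computed once for the weight, then each segment gets a scorer, and each segment-local doc id is offset by a docBase before being collected. The real flow described in the linked docs has more steps, such as query rewriting and weight normalization, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy, NOT Lucene's API: records the order in which the pieces
// are involved when a single-term query runs over several segments.
class TracingSearch {
    final List<String> trace = new ArrayList<>();

    // Each segment is a list of documents; each document is a list of terms.
    // Returns top-level doc ids (segment-local id + docBase) that match.
    List<Integer> search(String term, List<List<List<String>>> segments) {
        // 1. Weight creation: collection-wide statistics over all segments.
        long numDocs = 0;
        long docFreq = 0;
        for (List<List<String>> segment : segments) {
            numDocs += segment.size();
            for (List<String> doc : segment) {
                if (doc.contains(term)) {
                    docFreq++;
                }
            }
        }
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        trace.add("weight: idf=" + idf);

        // 2. One scorer per segment; matches are handed to a collector.
        List<Integer> hits = new ArrayList<>();
        int docBase = 0;
        for (int s = 0; s < segments.size(); s++) {
            List<List<String>> segment = segments.get(s);
            trace.add("scorer: segment " + s);
            for (int doc = 0; doc < segment.size(); doc++) {
                if (segment.get(doc).contains(term)) {
                    trace.add("collect: doc " + (docBase + doc));
                    hits.add(docBase + doc);
                }
            }
            docBase += segment.size();
        }
        return hits;
    }
}

class TraceDemo {
    public static void main(String[] args) {
        TracingSearch search = new TracingSearch();
        List<Integer> hits = search.search("lucene", List.of(
                List.of(List.of("lucene"), List.of("solr")),
                List.of(List.of("lucene", "search"))));
        search.trace.forEach(System.out::println);
        System.out.println("hits: " + hits);
    }
}
```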
> +Q) There are different implementations of Queries. Do they differ in how
> they mash up all the other stuff?
> It looks like if I mess with each of the other entities, I can pretty
> much produce whatever Query I want?!
See the link above for more information, especially the section on
writing custom queries.
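Each Query subclass supplies its own Weight and Scorer, so different queries can combine the pieces differently. A toy sketch of that, with a hypothetical contract (ToyQuery, ToyQueryWeight, not Lucene's API): a leaf term query uses idf, while an OR-style composite builds one weight per clause and sums their scores, loosely in the spirit of how BooleanQuery's weight wraps its clauses' weights.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy contract, NOT Lucene's API: every query turns into a
// weight for a given collection, and the weight scores single documents.
interface ToyQuery {
    ToyQueryWeight createWeight(List<List<String>> docs);
}

interface ToyQueryWeight {
    double score(List<String> doc);
}

// Leaf query: uses a collection statistic (idf) for one term.
class ToyLeafTermQuery implements ToyQuery {
    final String term;

    ToyLeafTermQuery(String term) {
        this.term = term;
    }

    public ToyQueryWeight createWeight(List<List<String>> docs) {
        long docFreq = docs.stream().filter(d -> d.contains(term)).count();
        double idf = 1.0 + Math.log((double) docs.size() / (docFreq + 1));
        return doc -> doc.contains(term) ? idf : 0.0;
    }
}

// Composite query: builds one weight per clause and sums the clause scores.
class ToyOrQuery implements ToyQuery {
    final List<ToyQuery> clauses;

    ToyOrQuery(List<ToyQuery> clauses) {
        this.clauses = clauses;
    }

    public ToyQueryWeight createWeight(List<List<String>> docs) {
        List<ToyQueryWeight> weights = new ArrayList<>();
        for (ToyQuery clause : clauses) {
            weights.add(clause.createWeight(docs));
        }
        return doc -> {
            double sum = 0.0;
            for (ToyQueryWeight w : weights) {
                sum += w.score(doc);
            }
            return sum;
        };
    }
}

class CustomQueryDemo {
    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("a"), List.of("b"), List.of("a", "b"));
        ToyQuery query = new ToyOrQuery(List.of(new ToyLeafTermQuery("a"), new ToyLeafTermQuery("b")));
        ToyQueryWeight weight = query.createWeight(docs);
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + weight.score(doc));
        }
    }
}
```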