Home | About | Sematext search-lucene.com search-hadoop.com
 Search Lucene and all its subprojects:

Switch to Threaded View
Lucene, mail # user - Computing Relevancy Differently


Copy link to this message
-
Re: Computing Relevancy Differently
Doug Cutting 2003-02-07, 19:37
Terry Steichen wrote:
> I read all the relevant references I could find in the Users (not
> Developers) list, and I still don't exactly know what to do.
>
> What I'd like to do is get a relevancy-based order in which (a) longer
> documents tend to get more weight than shorter ones, (b) a document body
> with 'X' instances of a query term gets a higher ranking than one with fewer
> than 'X' instances. and (c) a term found in the headline (usually in
> addition to finding the same term in the body) is more highly ranked than
> one with the term only in the body.

In the latest sources this can all be done by defining your own
Similarity implementation.  You can make longer documents score higher
by overriding the lengthNorm() method.  You can boost headlines there,
or with Field.setBoost(), or at query time with Query.setBoost().

Doug
---------------------------------------------------------------------