Re: Computing Relevancy Differently
Terry Steichen 2003-01-26, 16:27
I read all the relevant references I could find in the Users (not
Developers) list, and I still don't know exactly what to do.
Let me explain a bit more. The documents I index are all news stories. The
typical document body ranges in size from 200 to 2000 words. The document
is structured into a couple of dozen indexed fields, but nearly all
searching is done in two: the headline and the body.
What I'd like to do is get a relevancy-based order in which (a) longer
documents tend to get more weight than shorter ones, (b) a document body
with 'X' instances of a query term gets a higher ranking than one with fewer
than 'X' instances, and (c) a document with the term in the headline (usually in
addition to the same term in the body) is ranked higher than
one with the term only in the body.
But that's not what happens with the default scoring, and I'd like to change that.
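A toy Java model may make goals (a) to (c) concrete. It is not Lucene code: the class `RankingSketch`, its methods, the headline weight of 3.0 and the log-based length bonus are all hypothetical. The "default-like" function only mimics the shape of Lucene's classic default similarity (tf as sqrt(freq), length norm as 1/sqrt(number of terms)) and ignores idf, coordination and query normalization, which is one plausible reason a short story can outrank a long one with the same number of matches.

```java
/**
 * Toy model, not Lucene code: contrasts a length-normalized score with one
 * shaped to goals (a)-(c). Names, weights, and formulas are hypothetical.
 */
class RankingSketch {
    static final double HEADLINE_BOOST = 3.0; // hypothetical weight for goal (c)

    // Case-insensitive count of whitespace-separated tokens equal to term
    // (punctuation is not stripped, so "rates." does not match "rates").
    static int termFreq(String text, String term) {
        String t = term.toLowerCase();
        int n = 0;
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.equals(t)) n++;
        }
        return n;
    }

    static int wordCount(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // Shape of the classic default: sqrt(tf) damped by 1/sqrt(field length),
    // summed over headline and body. Longer fields are penalized, against goal (a).
    static double lengthNormalizedScore(String headline, String body, String term) {
        return Math.sqrt(termFreq(headline, term)) / Math.sqrt(Math.max(1, wordCount(headline)))
             + Math.sqrt(termFreq(body, term)) / Math.sqrt(Math.max(1, wordCount(body)));
    }

    // Variant aimed at the three goals: (b) raw body frequency counts fully,
    // (c) headline hits are weighted up, (a) a mild bonus grows with body length.
    static double goalScore(String headline, String body, String term) {
        double hits = HEADLINE_BOOST * termFreq(headline, term) + termFreq(body, term);
        double lengthBonus = 1.0 + Math.log1p(wordCount(body)) / 10.0;
        return hits * lengthBonus;
    }

    public static void main(String[] args) {
        String shortBody = "fed raises rates again";        // 4 words, one match
        String longBody = "rates " + "filler ".repeat(199);  // 200 words, one match
        System.out.printf("default-like: short=%.3f long=%.3f%n",
            lengthNormalizedScore("", shortBody, "rates"), lengthNormalizedScore("", longBody, "rates"));
        System.out.printf("goal-based:   short=%.3f long=%.3f%n",
            goalScore("", shortBody, "rates"), goalScore("", longBody, "rates"));
    }
}
```

Running `main` should show the default-like score favoring the short story and the goal-based score favoring the long one, with the same single match in each.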
I'm guessing, but maybe if I check the document length at indexing time and
boost longer documents, that will help. Maybe I could also (at index time)
give an extra boost to the headline field. Would that be the most I could
do without changing the Lucene core source?
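For the index-time idea, one hypothetical way to turn a story's length into a boost is to log-scale it across the 200 to 2000 word range mentioned above, so a 2000-word story gets twice the boost of a 200-word one. The class `LengthBoost` and its constants are made up for illustration. In Lucene releases with index-time boosting, the resulting value would presumably be passed to `Document.setBoost(float)`, and a fixed value to the headline `Field`'s `setBoost(float)`. As far as I know, those boosts are multiplied into the same per-field norm as the 1/sqrt(length) factor and stored with reduced precision, so they partly offset the length penalty rather than replace it.

```java
/**
 * Hypothetical helper: map a story's word count to an index-time boost.
 * 200 words -> 1.0, 2000 words -> 2.0, log-scaled in between, clamped outside.
 */
class LengthBoost {
    static final int MIN_WORDS = 200;          // typical shortest story body
    static final int MAX_WORDS = 2000;         // typical longest story body
    static final float HEADLINE_BOOST = 2.0f;  // hypothetical fixed boost for the headline field

    static float lengthBoost(int wordCount) {
        int clamped = Math.max(MIN_WORDS, Math.min(MAX_WORDS, wordCount));
        // log10(clamped / 200) runs from 0 to 1 over the 200..2000 range.
        return 1.0f + (float) Math.log10(clamped / (double) MIN_WORDS);
    }

    public static void main(String[] args) {
        for (int n : new int[] {150, 200, 632, 2000, 5000}) {
            System.out.printf("%d words -> boost %.2f%n", n, lengthBoost(n));
        }
        System.out.printf("headline boost: %.1f%n", HEADLINE_BOOST);
    }
}
```

The log scale is a design choice: it keeps very long stories from swamping term frequency, which matters if goal (b) is still supposed to dominate.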
PS: I'm also wondering whether having so many other fields may affect the
ranking in a way that diminishes the relevance of the headline and/or body
fields.
PPS: I'd just like to clarify another point. Much of the background
information on the scoring algorithms is beyond me and I have no interest
whatsoever in pushing the boundaries of this part of the technology. All I
want to do is use it so it comes out in a way that seems reasonable (without
having to become an expert in the complex theory behind this).
----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Saturday, January 25, 2003 2:09 AM
Subject: Re: Computing Relevancy Differently
> Check the lucene-user archives and search for the subject "custom scoring api".
> I think that may give you the answer...
> --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > How would one go about altering the formula for relevancy? (That is,
> > which modules and which code?) I'm certain that the current
> > algorithm is well founded in logic and probably works well in many
> > environments.
> > However, I find that, as I index news stories, the current algorithm
> > frequently doesn't produce meaningful rankings. In previous
> > discussions in this list about relevancy, the algorithm seemed to be
> > very complex, possibly too complex for my poor brain to fully grasp.
> > But I'd like to try some other options and see if they result in
> > rankings more in line with what my average viewer would expect.
> > Regards,
> > Terry