Similarity coefficient for more exact matching
Maxim Terletsky 2012-04-27, 12:18
Hi guys, I have a field that is Analyzed, with Store.NO. Suppose one Document has the value "Hello" in this field and another has "Hello world , one, two, three, four". Since the field is analyzed (with norms), the extra terms "one, two, three, four" will definitely affect the resulting score when we search for the query "Hello world". Does anyone know whether I can control some coefficients to determine the weight of exact matching vs. the number of words (the norm factor)? Thanks,
Maxim
Re: Similarity coefficient for more exact matching
Ian Lea 2012-04-27, 13:29
You can override org.apache.lucene.search.Similarity/DefaultSimilarity to tweak quite a lot of stuff.
computeNorm() may be the method you are interested in. It is called at indexing time, so be sure to use the same implementation at index and query time, using IndexWriterConfig.setSimilarity() and IndexSearcher.setSimilarity(), unless you are clever or like being confused.
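To make the length effect concrete, here is a minimal plain-Java sketch of the arithmetic, assuming the 3.x DefaultSimilarity behaviour of boost / sqrt(numTerms). The class and method names are made up for illustration and nothing here calls Lucene. The count of 6 terms for the longer document assumes the analyzer emits one token per word and drops the punctuation.

```java
// Hypothetical demo class, not part of Lucene.
class LengthNormDemo {

    // Assumed model of the default length norm: fieldBoost / sqrt(numTerms).
    static double lengthNorm(int numTerms, double fieldBoost) {
        return fieldBoost / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        double shortDoc = lengthNorm(1, 1.0); // "Hello"
        double longDoc = lengthNorm(6, 1.0);  // "Hello world , one, two, three, four"
        System.out.println("norm for 1 term:  " + shortDoc);
        System.out.println("norm for 6 terms: " + longDoc);
    }
}
```

Under these assumptions the one-term document gets a norm of 1.0 and the six-term document roughly 0.41, so each matching term in the longer document contributes less than half as much. The real index stores the norm in a single byte, so the values actually used at search time are coarser than this.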
SweetSpotSimilarity might also be worth a look.
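As a rough illustration of why it might help here: as I read its javadoc, SweetSpotSimilarity computes the length norm as 1/sqrt(steepness * (|x-min| + |x-max| - (max-min)) + 1), so every field length between min and max gets the same norm. The sketch below is plain Java that only reproduces that formula. The class name and the values min=1, max=10, steepness=0.5 are assumptions picked for this example, not recommended settings.

```java
// Hypothetical demo class, not part of Lucene.
class SweetSpotNormDemo {

    // Plateau-shaped length norm: 1.0 for min <= numTerms <= max,
    // falling off outside that range at a rate set by steepness.
    static double sweetSpotNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Example settings chosen for illustration only.
        int min = 1, max = 10;
        double steepness = 0.5;
        for (int numTerms : new int[] {1, 6, 20}) {
            System.out.println(numTerms + " terms -> " + sweetSpotNorm(numTerms, min, max, steepness));
        }
    }
}
```

With a plateau covering both 1 and 6 terms, "Hello" and "Hello world , one, two, three, four" end up with the same norm, so the document matching both query terms is no longer penalised for its extra words. Only fields longer than max start losing weight.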
-- Ian.
RE: Similarity coefficient for more exact matching
Paul Hill 2012-05-04, 16:32
> [use] IndexWriterConfig.setSimilarity() and
> IndexSearcher.setSimilarity(), unless you are clever or like being confused.
>
> SweetSpotSimilarity might also be worth a look.
>
> --
> Ian.
Being even less clever, I just make sure I set:
Similarity.setDefault(new MySimilarity())
when crawling and searching, so everything uses the same similarity strategies.
Checking the 3.4 code, IndexWriterConfig and IndexSearcher both default to Similarity.getDefault().
Any thoughts on scenarios where you'd not push a custom similarity into the default position?
-Paul
Re: Similarity coefficient for more exact matching
Ian Lea 2012-05-10, 08:26
Similarity.setDefault(new MySimilarity()) is certainly better than the 2 calls I recommended. Thanks.
I find it hard to see why one might not want to do this in normal usage but have a vague recollection of someone once outlining some obscure scenarios where different similarities at index and search time made sense.
-- Ian.