Solr, mail # user - two word phrase search using dismax


Re: two word phrase search using dismax
alxsss@... 2011-12-05, 23:15
Hi Erick,

After reading more about the pf param, I increased the values a few times, and this solved options 2, 3 and 4 but not 1. As an example, for the phrase "newspaper latimes", latimes.com is not even in the results, so it cannot be boosted to first place. Changing the mm param to <str name="mm">1<-1 5<-2 6<90%</str> solves only 1 and 4 but not 2 and 3.
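
To make the effect of the two mm values concrete, here is a minimal Python sketch of how a conditional mm spec is commonly interpreted, following the documented dismax semantics. The function name min_should_match is hypothetical, and this is an illustration rather than Solr's own code. It shows that for a two-word query the original spec requires both words, while the modified spec requires only one.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec.

    Sketch of the documented semantics: "N<X" means "if there are more
    than N clauses, apply X"; a negative X means "all but that many";
    "P%" is a percentage of the clauses, truncated toward zero.
    """
    result = clause_count  # default: every clause is required
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: walk the conditions from left to right.
        for part in spec.split():
            bound_text, sub_spec = part.split("<", 1)
            if clause_count <= int(bound_text):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    result = clause_count + calc if calc < 0 else calc
    # Clamp to the valid range [0, clause_count].
    return max(0, min(clause_count, result))


if __name__ == "__main__":
    for spec in ("2<-1 5<-2 6<90%", "1<-1 5<-2 6<90%"):
        print(spec, "->", min_should_match(2, spec), "of 2 words required")
```

With only one of the two words required, docs matching a single word (for example only in the site field) can enter the result set, which is consistent with the modified mm helping cases 1 and 4.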

Thanks.
Alex.

-----Original Message-----
From: Erick Erickson <[EMAIL PROTECTED]>
To: solr-user <[EMAIL PROTECTED]>
Sent: Mon, Dec 5, 2011 5:52 am
Subject: Re: two word phrase search using dismax
Have you looked at the "pf" (phrase fields)
parameter of edismax?

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29
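
As a rough illustration of experimenting with pf, the Python sketch below builds the request parameters for one query so that different pf boosts can be compared with debugQuery. The helper name build_params and the pf boost values are hypothetical, chosen only for the example; the qf and ps values are taken from the request handler quoted in this thread.

```python
from urllib.parse import urlencode


def build_params(query: str, pf: str) -> str:
    """Return a URL-encoded parameter string for an edismax request."""
    params = {
        "defType": "edismax",
        "q": query,
        "qf": "site^1.5 content^0.5 title^1.2",  # from the thread's config
        "pf": pf,              # phrase fields: boosts docs where the terms are close together
        "ps": "300",           # phrase slop used with pf, from the thread's config
        "debugQuery": "true",  # include score explanations in the response
    }
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical pf boosts, larger than the qf boosts.
    print(build_params("newspaper latimes", "site^15 title^12 content^5"))
```

The resulting string can be appended to the request URL of whichever handler is in use, and the debugQuery explanations compared between runs.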

Best
Erick

On Sat, Dec 3, 2011 at 7:04 PM,  <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Here is my request handler
>
> <requestHandler name="search" class="solr.SearchHandler" >
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="echoParams">explicit</str>
> <float name="tie">0.01</float>
> <str name="qf">site^1.5 content^0.5 title^1.2</str>
> <str name="pf">site^1.5 content^0.5 title^1.2</str>
> <str name="fl">id,title, site</str>
> <str name="mm">2<-1 5<-2 6<90%</str>
> <int name="ps">300</int>
> <bool name="hl">true</bool>
> <str name="q.alt">*:*</str>
> <str name="hl.fl">content</str>
> <str name="f.title.hl.fragsize">0</str>
> <str name="hl.fragsize">165</str>
> <str name="f.title.hl.alternateField">title</str>
> <str name="f.url.hl.fragsize">0</str>
> <str name="f.url.hl.alternateField">url</str>
> <str name="f.content.hl.fragmenter">regex</str>
> </lst>
> </requestHandler>
>
> I have made a few tests with debugQuery and realised that for two-word
> phrases, Solr takes the first word and gives it a score according to the qf
> param, then takes the second word and scores it the same way, and so on, but
> it does not score the whole phrase. That is why a doc with one of the words in
> the title and the other in the content is given a higher score than one that
> has both words in the content but none in the title.
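
The per-word scoring described here can be sketched in Python as follows. This is a simplified model, assuming each word is scored as a disjunction-max over the qf fields (best field score plus tie times the scores of the other fields) and the per-word scores are summed; coord and query normalization are ignored, and all per-field scores are made-up numbers for illustration only.

```python
def dismax_term_score(field_scores: dict, tie: float) -> float:
    """Score one query word: best-matching field plus tie * the other fields."""
    ranked = sorted(field_scores.values(), reverse=True)
    if not ranked:
        return 0.0
    return ranked[0] + tie * sum(ranked[1:])


def doc_score(per_word_field_scores: list, tie: float = 0.01) -> float:
    """Sum the disjunction-max scores of all query words for one doc."""
    return sum(dismax_term_score(fs, tie) for fs in per_word_field_scores)


# Hypothetical scores: a title match scores higher than a content match.
doc_a = [{"title": 0.8}, {"content": 0.3}]    # one word in title, one in content
doc_b = [{"content": 0.3}, {"content": 0.3}]  # both words in content only

if __name__ == "__main__":
    print("doc A:", doc_score(doc_a))
    print("doc B:", doc_score(doc_b))
```

Under these assumed numbers doc A outranks doc B, which matches the ordering observed with debugQuery: without a phrase boost from pf, nothing rewards the two words appearing together.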
>
> Ideally, I want to achieve the following order.
> 1. If one (or both) of the words is in the site field, the doc must be given the highest score.
> 2. Then come docs with both words in the title.
> 3. Next, docs with both words in the content.
> 4. And finally, docs having either of the words in the title or content.
>
> I tried changing the mm param to <str name="mm">1<-1 5<-2 6<90%</str>
> This achieves 1 and 4 but not 2 and 3.
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Chris Hostetter <[EMAIL PROTECTED]>
> To: solr-user <[EMAIL PROTECTED]>
> Sent: Thu, Nov 17, 2011 2:17 pm
> Subject: Re: two word phrase search using dismax
>
> : After putting the same score for title and content in qf field, docs
> : with both words in content moved to fifth place. The doc in the first,
> : third and fourth places still have only one of the words in content and
> : title. The doc in the second place has one of the words in title and
> : both words in the content but in different places not together.
>
> details matter -- if you send further followup mails the full details of
> your dismax options and the score explanations for debugQuery are
> necessary to be sure people understand what you are describing (a
> snapshot of reality is far more valuable than a vague description of
> reality)
>
> off hand what you are describing sounds correct -- this is what the
> dismax parser is really designed to do.
>
> even if you have given both title and content equal boosts, your title
> field is probably shorter than your content field, so words matching once
> in title are likely to score higher than the same word matching once in
> content due to length normalization -- and unless you set the "tie" param
> to something really high, the score contribution from the highest scoring
> field (in this case title) will be the dominant factor in the score (it's