Solr, mail # user - StopFilterFactory and "qf" containing some fields that use it and some that do not


RE: StopFilterFactory and "qf" containing some fields that use it and some that do not
Dyer, James 2011-01-13, 16:36
I appreciate the reply and blog posting.  For now, I just enabled stopwords for all the fields in "qf".  We have a very short list anyhow, and our legacy search engine didn't even allow field-by-field configuration (stopwords are global on that system).
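For comparison, the other workaround raised in the quoted messages is to filter stopwords out in the client. The Python sketch below shows one way to do that. It assumes the client keeps a copy of the stopword list used by the Title field's StopFilterFactory (the set here is hypothetical), and it only handles plain keyword queries. It does not understand quoted phrases or query operators.

```python
from urllib.parse import urlencode

# Hypothetical stopword list; it must mirror the stopwords.txt used by the
# StopFilterFactory on the stopword-enabled qf fields (an assumption here).
STOPWORDS = {"a", "an", "and", "of", "the"}


def strip_stopwords(q: str) -> str:
    """Drop stopwords from a plain keyword query before it reaches Solr."""
    kept = [token for token in q.split() if token.lower() not in STOPWORDS]
    # A query made only of stopwords is sent unchanged rather than as an
    # empty q; how to treat that case is a policy decision for the client.
    return " ".join(kept) if kept else q


def edismax_params(q: str, qf: list[str]) -> str:
    """Build the query string for an edismax request over the given fields."""
    return urlencode({
        "q": strip_stopwords(q),
        "defType": "edismax",
        "qf": " ".join(qf),
    })


print(edismax_params("the life", ["Title", "Contributor"]))
# q=life&defType=edismax&qf=Title+Contributor
```

The trade-off is that the stopword list now lives in two places. Enabling the same StopFilterFactory on every field in "qf" keeps it in one.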

I do wonder... what if (e)dismax had a flag you could set telling it that if any analyzer removed a term, that term would become optional for the fields in which it remained?  I'm not sure what the development effort would be, but perhaps it would be a nice way to circumvent this problem in a future release...
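The following minimal Python sketch shows the flag proposed above. It uses a toy model in which each query term becomes one clause made up of the field:term alternatives that survived analysis, and every required clause must match (mm = 100%). The analyzers, stopword list, sample document and helper names are all hypothetical. This is not how Solr's (e)dismax parser is implemented, and scoring is ignored.

```python
# Hypothetical schema mirroring the thread: Title runs StopFilterFactory,
# Contributor does not. The stopword list is an assumption.
STOPWORDS = {"a", "an", "and", "of", "the"}
ANALYZERS = {
    "Title": lambda token: None if token.lower() in STOPWORDS else token.lower(),
    "Contributor": lambda token: token.lower(),
}


def build_clauses(q, qf):
    """Return one (alternatives, optional) pair per query term that survives
    analysis in at least one qf field."""
    clauses = []
    for token in q.split():
        analyzed = ((field, ANALYZERS[field](token)) for field in qf)
        alternatives = [(field, term) for field, term in analyzed if term is not None]
        if not alternatives:
            continue  # removed by every field: the term disappears entirely
        # The proposed flag: if any analyzer removed this term, the clause
        # becomes optional instead of being required.
        optional = len(alternatives) < len(qf)
        clauses.append((alternatives, optional))
    return clauses


def matches(doc, clauses):
    """doc maps field -> set of indexed terms. Every non-optional clause
    must match (mm = 100%); optional clauses would only affect scoring."""
    def hit(alternatives):
        return any(term in doc.get(field, set()) for field, term in alternatives)

    required = [alternatives for alternatives, optional in clauses if not optional]
    if required:
        return all(hit(alternatives) for alternatives in required)
    # With no required clauses, at least one optional clause must match.
    return any(hit(alternatives) for alternatives, _ in clauses)


# Hypothetical document: title "The Life of Pi" is indexed without stopwords.
SAMPLE_DOC = {"Title": {"life", "pi"}, "Contributor": {"yann", "martel"}}
print(matches(SAMPLE_DOC, build_clauses("the life", ["Title", "Contributor"])))  # True
```

Here "the" still searches Contributor, but a miss on it no longer rejects the document. Only "life" has to match.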

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Jonathan Rochkind [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 13, 2011 9:54 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: Dyer, James
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

It's a known 'issue' in dismax, (really an inherent part of dismax's
design with no clear way to do anything about it), that qf over fields
with different stop word definitions will produce odd results for a
query with a stopword.

Here's my understanding of what's going on:
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
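The gotcha can be reproduced with a small toy model in Python. This is a sketch under stated assumptions, not Solr's code. Each query term is analyzed separately for every "qf" field and becomes one DisjunctionMaxQuery-like clause of the alternatives that survived. A term removed by every field disappears. mm is applied as a percentage of the clause count, rounded down, and defaults to 100%, which is consistent with the "~2" on the fourth parsed query quoted further down this thread. The analyzers, stopword list, sample document and helper names are hypothetical.

```python
import math

# Hypothetical schema mirroring the thread: Title runs StopFilterFactory,
# Contributor does not. The stopword list is an assumption.
STOPWORDS = {"a", "an", "and", "of", "the"}
ANALYZERS = {
    "Title": lambda token: None if token.lower() in STOPWORDS else token.lower(),
    "Contributor": lambda token: token.lower(),
}


def build_clauses(q, qf):
    """One DisjunctionMaxQuery-like clause per query term: the (field, term)
    alternatives that survived each qf field's analysis."""
    clauses = []
    for token in q.split():
        analyzed = ((field, ANALYZERS[field](token)) for field in qf)
        alternatives = [(field, term) for field, term in analyzed if term is not None]
        if alternatives:  # a term removed by every qf field vanishes entirely
            clauses.append(alternatives)
    return clauses


def matches(doc, clauses, mm_percent=100):
    """doc maps field -> set of indexed terms. mm is a percentage of the
    clause count, rounded down; at least one clause must always match."""
    required = max(math.floor(len(clauses) * mm_percent / 100), 1)
    hits = sum(
        any(term in doc.get(field, set()) for field, term in alternatives)
        for alternatives in clauses
    )
    return hits >= required


# Hypothetical document: title "The Life of Pi" is indexed without stopwords.
SAMPLE_DOC = {"Title": {"life", "pi"}, "Contributor": {"yann", "martel"}}

for q, qf in [("life", ["Title"]), ("the life", ["Title"]),
              ("life", ["Title", "Contributor"]),
              ("the life", ["Title", "Contributor"])]:
    print(repr(q), qf, matches(SAMPLE_DOC, build_clauses(q, qf)))
# The four lines end in True, True, True, False: only the last query needs
# Contributor to contain "the".
```

In the last case, "the" survives only in Contributor. It still counts as one of the two clauses that mm = 100% requires, so a document whose Contributor lacks "the" is rejected. Lowering mm (e.g. `mm_percent=50` in this model) hides the symptom without removing the cause.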

On 1/12/2011 6:48 PM, Markus Jelsma wrote:
> Here's another thread on the subject:
> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
>
> And slightly off topic: you might also want to look at using common grams;
> they are really useful for phrase queries that contain stopwords.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
>
>
>> Here is what debug says each of these queries parse to:
>>
>> 1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
>> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
>> 3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
>> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>
>> 1. +DisjunctionMaxQuery((Title:life))
>> 2. +((DisjunctionMaxQuery((Title:life)))~1)
>> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
>> 4. +((DisjunctionMaxQuery((Contributor:the))
>> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
>>
>> I see what's going on here.  Because "the" is a stop word for Title, it
>> gets removed from the first part of the expression, leaving only
>> Contributor:the.  Since both clauses are required (~2), "Contributor" must
>> contain "the".  dismax does the same thing too.  I guess I should have run
>> debug before asking the mailing list!
>>
>> It looks like the only workarounds I have are to either filter out the
>> stopwords in the client when this happens, or enable stopwords on every
>> field that shares a "qf" with a stopword-enabled field.
>> Unless... someone has a better idea?
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>> -----Original Message-----
>> From: Markus Jelsma [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, January 12, 2011 4:44 PM
>> To: [EMAIL PROTECTED]
>> Cc: Jayendra Patil
>> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
>> and some that do not
>>
>>> I have used edismax and stopword filters as well, but I usually use the fq
>>> parameter, e.g. fq=title:the life, and have never had any issues.
>> That is because filter queries are not affected by the mm parameter, which
>> applies only to the main query.
>>
>>> Can you turn on debugQuery and check what query is formed for each of
>>> the combinations you mentioned?
>>>
>>> Regards,
>>> Jayendra
>>>
>>> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <[EMAIL PROTECTED]> wrote:
>>>> I'm running into a problem with StopFilterFactory in conjunction with
>>>> (e)dismax queries that have a mix of fields, only some of which use
>>>> StopFilterFactory.  It seems that if even 1 field on the "qf" parameter