Solr, mail # user - StopFilterFactory and "qf" containing some fields that use it and some that do not


RE: StopFilterFactory and "qf" containing some fields that use it and some that do not
Dyer, James 2011-01-12, 23:23
Here is what debug says each of these queries parse to:

1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

1. +DisjunctionMaxQuery((Title:life))
2. +((DisjunctionMaxQuery((Title:life)))~1)
3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
4. +((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)

I see what's going on here.  Because "the" is a stop word for Title, it gets removed from the Title side of the first clause, which leaves only Contributor:the there.  Since both clauses must match (the ~2), "Contributor" is required to contain "the".  dismax does the same thing.  I guess I should have run debug before asking the mailing list!
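
To make the parse above easier to follow, here is a toy Python model of the behaviour, not Solr code. It assumes the analyzers reduce to lowercasing plus optional stop word removal, and that every clause must match, as the ~1 and ~2 in the debug output suggest. The names STOP_WORDS, FIELDS, analyze, build_clauses and describe are made up for illustration.

```python
# Toy model of how (e)dismax builds one clause per query term across "qf" fields.
# Assumption: analyzers are simplified to lowercase + optional stop word removal.

STOP_WORDS = {"the"}  # assumed content of stopwords.txt

# Field name -> whether the field's query analyzer removes stop words.
FIELDS = {"Title": True, "Contributor": False}


def analyze(field, term, fields=FIELDS):
    """Return the analyzed token for a field, or None if the analyzer drops it."""
    token = term.lower()
    if fields[field] and token in STOP_WORDS:
        return None
    return token


def build_clauses(query, qf, fields=FIELDS):
    """Build one disjunction per query term; fields that drop the term are omitted."""
    clauses = []
    for term in query.split():
        disjuncts = []
        for field in qf:
            token = analyze(field, term, fields)
            if token is not None:
                disjuncts.append(f"{field}:{token}")
        if disjuncts:  # a term dropped by every field yields no clause at all
            clauses.append(disjuncts)
    return clauses


def describe(query, qf):
    """Render the clauses roughly the way the debug output prints them."""
    clauses = build_clauses(query, qf)
    parts = ["(" + " | ".join(sorted(c)) + ")" for c in clauses]
    return f"{' '.join(parts)} ~{len(clauses)}"


print(describe("the life", ["Title"]))
print(describe("the life", ["Title", "Contributor"]))
```

With qf=Title alone, "the" is dropped by the only field, so no clause is created for it. With Contributor added, "the" survives in that one field, so a second required clause appears that only Contributor can satisfy.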

It looks like the only workarounds I have are to either filter out the stop words in the client when this happens, or enable stop words for all the fields that are used in "qf" alongside stopword-enabled fields.  Unless... someone has a better idea?
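
A minimal sketch of the first workaround (client-side filtering), in Python. It assumes the client keeps its own copy of the stop word list and knows which fields use StopFilterFactory. STOP_WORDS, STOP_FILTERED_FIELDS and strip_stop_words are hypothetical names, and the sketch only handles plain whitespace-separated terms, not phrases or query operators.

```python
# Sketch of the first workaround: strip stop words in the client, but only
# when "qf" mixes stop-word-aware fields with fields that keep every token.

STOP_WORDS = {"the"}               # assumption: mirrors stopwords.txt
STOP_FILTERED_FIELDS = {"Title"}   # assumption: fields whose type uses StopFilterFactory


def strip_stop_words(query, qf):
    """Return the query with stop words removed if qf mixes field kinds."""
    fields = set(qf)
    mixed = bool(fields & STOP_FILTERED_FIELDS) and bool(fields - STOP_FILTERED_FIELDS)
    if not mixed:
        return query  # nothing to do when all fields treat stop words the same way
    kept = [t for t in query.split() if t.lower() not in STOP_WORDS]
    # Keep the original query if every term was a stop word, to avoid an empty q.
    return " ".join(kept) if kept else query


print(strip_stop_words("the life", ["Title", "Contributor"]))
```

The trade-off is that stop words are then never searched in Contributor either, so a contributor whose name consists mostly of stop words becomes harder to find.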

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 12, 2011 4:44 PM
To: [EMAIL PROTECTED]
Cc: Jayendra Patil
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not
> Have used edismax and Stopword filters as well. But usually use the fq
> parameter e.g. fq=title:the life and never had any issues.

That is because the mm parameter applies only to the main query; filter
queries are not affected by it.

>
> Can you turn on debugQuery and check what query is formed for all the
> combinations you mentioned?
>
> Regards,
> Jayendra
>
> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <[EMAIL PROTECTED]> wrote:
> > I'm running into a problem with StopFilterFactory in conjunction with
> > (e)dismax queries that have a mix of fields, only some of which use
> > StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
> > does not use StopFilterFactory, then stop words are not removed when
> > searching any fields.  Here's an example of what I mean:
> >
> > - I have 2 fields indexed:
> >  > Title is "textStemmed", which includes StopFilterFactory (see below).
> >  > Contributor is "textSimple", which does not include StopFilterFactory (see below).
> > - "The" is a stop word in stopwords.txt
> > - q=life&defType=edismax&qf=Title  ... returns 277,635 results
> > - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> > - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
> > - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> >
> > It seems as if the stop words are not being stripped from the query
> > because "qf" contains a field that doesn't use StopFilterFactory.  I
> > tested combining stemmed and non-stemmed fields in "qf", and stemming
> > seems to be applied regardless.  But stop words are not removed.
> >
> > Does anyone have ideas on what is going on?  Is this a feature or
> > possibly a bug?  Any known workarounds?  Any advice is appreciated.
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> > ________________________________
> > <fieldType name="textSimple" class="solr.TextField"
> > positionIncrementGap="100">
> > <analyzer type="index">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > <analyzer type="query">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > </fieldType>
> >
> > <fieldType name="textStemmed" class="solr.TextField"
> > positionIncrementGap="100">
> > <analyzer type="index">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>