Solr, mail # user - edismax doesn't obey 'pf' parameter


Re: edismax doesn't obey 'pf' parameter
Chris Hostetter 2011-12-28, 20:05

: Of course. What I meant to say was there is
: always exactly one token in a non-tokenized
: field and its offset is always exactly 0. There
: will never be tokens at position 1.
:
: So asking to match phrases, which is based on
: term positions is basically a no-op.

That's not always true.

Consider a situation where you have a multivalued "author_exact" field
containing each author's full name as a literal string (using either
StrField or TextField with KeywordTokenizer), copyFielded from an
"author" field which is similar but tokenized.

So if a document contains the following two values in the author field...
"David Smiley"
"Eric Pugh"

then that document should be matched by all three of these queries...

defType=edismax&q=David&qf=author&pf=author_exact
defType=edismax&q=David+Pugh&qf=author&pf=author_exact
defType=edismax&q=David+Smiley&qf=author&pf=author_exact

...but it should score *really* high for that last query because it not
only matches on the author field, but it also gets an exact match on the
entire query string as an implicit phrase in the author_exact field.
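
To make the expectation concrete, here is a rough Python sketch of the matching logic described above. It is a toy model, not Solr code: the function names are made up for illustration, tokenization is plain lowercase whitespace splitting, every query term is assumed to be required, and no real scoring is done.

```python
# Toy model of the example above. All names here are hypothetical;
# this only illustrates which queries match and which get the pf boost.

def tokenize(value):
    """Stand-in for the tokenized "author" field: lowercase, split on whitespace."""
    return value.lower().split()

def matches_qf(query, values):
    """True if every query term is a token in some value of the tokenized field."""
    indexed = {tok for v in values for tok in tokenize(v)}
    return all(term in indexed for term in tokenize(query))

def matches_pf(query, values):
    """True if the whole query string equals one literal "author_exact" value."""
    return query in values

authors = ["David Smiley", "Eric Pugh"]
for q in ("David", "David Pugh", "David Smiley"):
    print(f"{q!r}: qf match={matches_qf(q, authors)}, pf boost={matches_pf(q, authors)}")
```

All three queries match on the tokenized field, but only "David Smiley" equals one of the literal values, which is why only that query should pick up the extra pf boost.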

Dismax does behave this way, as you can see using the 3.5 example configs
& data (note that "cat" is a StrField)...

http://localhost:8983/solr/select/?debugQuery=true&defType=dismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0))
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2)
   DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))

But for some reason EDismax doesn't behave similarly...

http://localhost:8983/solr/select/?debugQuery=true&defType=edismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0))
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2)
   DisjunctionMaxQuery((features:"hard drive"^2.0))

...that definitely seems like a bug to me, but it's not entirely clear
why it's happening (the pf-related code in edismax is kind of hairy).

https://issues.apache.org/jira/browse/SOLR-2988
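
For anyone who wants to check this against their own configs, here is a small Python sketch that compares the phrase-boost clause of the two parsedquery strings shown above. The helper name is hypothetical, and it assumes the last DisjunctionMaxQuery in the string is the pf clause, which holds for these two outputs but would not hold if no pf clause were generated at all.

```python
import re

# parsedquery strings copied from the debugQuery output above,
# collapsed onto one line each.
DISMAX_PARSED = (
    '+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) '
    'DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0)))~2) '
    'DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))'
)
EDISMAX_PARSED = (
    '+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) '
    'DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0)))~2) '
    'DisjunctionMaxQuery((features:"hard drive"^2.0))'
)

def phrase_boost_fields(parsedquery):
    """Return the field names in the last DisjunctionMaxQuery clause.

    Assumption: that last clause is the one built from the pf parameter.
    """
    start = parsedquery.rfind("DisjunctionMaxQuery((")
    if start == -1:
        return []
    return re.findall(r"(\w+):", parsedquery[start:])

dismax_fields = phrase_boost_fields(DISMAX_PARSED)
edismax_fields = phrase_boost_fields(EDISMAX_PARSED)
missing = [f for f in dismax_fields if f not in edismax_fields]
print("dismax pf fields: ", dismax_fields)
print("edismax pf fields:", edismax_fields)
print("dropped by edismax:", missing)
```

With the two strings above, the only pf field that goes missing under edismax is the StrField, "cat".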

-Hoss