Solr, mail # user - autoGeneratePhraseQueries sort of silently set to false


Re: autoGeneratePhraseQueries sort of silently set to false
Erik Hatcher 2012-02-23, 19:52
there's this (for 3.1, but in the 3.x CHANGES.txt):

* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
  autoGeneratePhraseQueries="true" (the default) causes the query parser to
  generate phrase queries if multiple tokens are generated from a single
  non-quoted analysis string.  For example WordDelimiterFilter splitting text:pdp-11
  will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
  Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
  delimited languages. (yonik)

with a ton of useful, though back and forth, commentary here: <https://issues.apache.org/jira/browse/SOLR-2015>
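To make the CHANGES entry concrete, here is a rough sketch of the two query shapes it describes. This is not Solr's query parser code; the class `PhraseQuerySketch` and the method `buildQuery` are hypothetical names made up for illustration, and the token list stands in for whatever the analysis chain (e.g. WordDelimiterFilter) would emit for one non-quoted chunk of query text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of Solr or Lucene.
public class PhraseQuerySketch {

    // Renders a query string for the tokens produced from one non-quoted
    // chunk of query text (e.g. "pdp-11" split into "pdp" and "11").
    static String buildQuery(String field, List<String> tokens,
                             boolean autoGeneratePhraseQueries) {
        if (tokens.size() == 1) {
            // A single token is a plain term query either way.
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhraseQueries) {
            // Multiple tokens from one chunk become a phrase query.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Otherwise each token becomes its own optional clause.
        return tokens.stream()
                .map(t -> field + ":" + t)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("pdp", "11");
        System.out.println(buildQuery("text", tokens, true));
        System.out.println(buildQuery("text", tokens, false));
    }
}
```

With the flag on, the sketch yields the phrase form; with it off, the OR form, which is why hyphenated terms can match far more documents after the default changes.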

Note that the default, as Naomi pointed out so succinctly, depends on the *schema* version setting (look at the version attribute on the <schema> line in your schema.xml).  The code is simply this:

    if (schema.getVersion() > 1.3f) {
      autoGeneratePhraseQueries = false;
    } else {
      autoGeneratePhraseQueries = true;
    }

on TextField.  Specifying autoGeneratePhraseQueries explicitly on a field type overrides whatever the default may be.
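A self-contained sketch of how the version-based default and an explicit per-fieldType setting combine, assuming only what is described above. The class `AutoPhraseDefault` and the method `resolve` are hypothetical names, not Solr API; a `null` argument stands in for "attribute not specified on the fieldType".

```java
// Hypothetical illustration only; not part of Solr.
public class AutoPhraseDefault {

    // explicitSetting == null means the fieldType did not specify
    // autoGeneratePhraseQueries, so the schema version decides.
    static boolean resolve(float schemaVersion, Boolean explicitSetting) {
        if (explicitSetting != null) {
            // An explicit attribute overrides whatever the default may be.
            return explicitSetting;
        }
        // Same comparison as the snippet above: off for versions above 1.3.
        return !(schemaVersion > 1.3f);
    }

    public static void main(String[] args) {
        System.out.println(resolve(1.3f, null));  // true: older schema keeps auto-phrasing
        System.out.println(resolve(1.4f, null));  // false: newer schema turns it off
        System.out.println(resolve(1.4f, true));  // true: explicit setting wins
    }
}
```

So anyone who bumps the schema version while upgrading and wants the old results can set autoGeneratePhraseQueries="true" on the affected field types rather than rely on the default.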

Erik

On Feb 23, 2012, at 14:45 , Burton-West, Tom wrote:

> Seems like a change in default behavior like this should be included in the changes.txt for Solr 3.5.
> Not sure how to do that.
>
> Tom
>
> -----Original Message-----
> From: Naomi Dushay [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, February 23, 2012 1:57 PM
> To: [EMAIL PROTECTED]
> Subject: autoGeneratePhraseQueries sort of silently set to false
>
> Another thing I noticed when upgrading from Solr 1.4 to Solr 3.5 had to do with results when there were hyphenated words:   aaa-bbb.   Erik Hatcher pointed me to the autoGeneratePhraseQueries attribute now available on fieldtype definitions in schema.xml.  This is a great feature, and everything is peachy if you start with Solr 3.4.   But many of us started earlier and are upgrading, and that's a different story.
>
> It was surprising to me that
>
> a.  the default for this new feature caused different search results than Solr 1.4
>
> b.  it wasn't documented clearly, IMO
>
> http://wiki.apache.org/solr/SchemaXml   makes no mention of it
>
>
> In the schema.xml example, there is this at the top:
>
> <!-- attribute "name" is the name of this schema and is only used for display purposes.
>       Applications should change this to reflect the nature of the search collection.
>       version="1.4" is Solr's version number for the schema syntax and semantics.  It should
>       not normally be changed by applications.
>       1.0: multiValued attribute did not exist, all fields are multiValued by nature
>       1.1: multiValued attribute introduced, false by default
>       1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
>       1.3: removed optional field compress feature
>       1.4: default auto-phrase (QueryParser feature) to off
>     -->
>
> And there was this in a couple of field definitions:
>
> <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>
> But that was it.
>