-
How is Number of Boolean Clauses calculated - Minimum Should Match? Em 2011-10-05, 07:42
Hello list,

how does BooleanQuery calculate the number of its clauses? Is this number based on the analyzed query or on the raw query string?

Imagine a StopFilter or a SynonymFilter is applied to a BooleanQuery during analysis: the number of clauses could shrink or grow.

I recall that problems can occur with the MinimumShouldMatch param if you query some fields with a StopFilter applied and some fields without.

I tried to answer a question on the mailing lists and noticed that I am rather unsure how MM is calculated in general, and in Solr in particular (a code review left me somewhat confused).

Thank you!

Regards,
Em
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Ian Lea 2011-10-05, 09:32
It will work on the query, whether produced by a query parser or constructed in code. I don't see that the number of clauses will change if you are applying filters. Filters are not query clauses, although it can get confusing if you start using stuff like FilteredQuery or QueryWrapperFilter.

--
Ian.
-
RE: How is Number of Boolean Clauses calculated - Minimum Should Match? Uwe Schindler 2011-10-05, 09:39
Hi,

The TooManyClausesException is thrown by BooleanQuery.add(Clause). Because of this, it can only count clauses actually added to the BooleanQuery. Terms thrown away by the QueryParser beforehand are not counted, as they will not be in the final query. If a token in the query parser expands to multiple synonyms, multiple clauses are added and count against the limit.

Uwe

-----
Uwe Schindler
http://www.thetaphi.de
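As a rough illustration of the point above, here is a minimal sketch in plain Python. It is a toy model, not Lucene code: `analyze`, `build_clauses`, the `TooManyClauses` class and the synonym "spoke" are all hypothetical names chosen for this example, and 1024 is only used as an illustrative limit. It shows that only the terms surviving analysis become clauses, and that a synonym expansion adds clauses that count against the limit.

```python
class TooManyClauses(Exception):
    """Raised by this toy model when the clause limit is exceeded."""


def analyze(text, stopwords, synonyms):
    """Lowercase, split on whitespace, drop stopwords, expand synonyms."""
    terms = []
    for token in text.lower().split():
        if token in stopwords:
            continue  # dropped before any clause is created
        terms.extend(synonyms.get(token, [token]))
    return terms


def build_clauses(text, stopwords, synonyms, max_clauses=1024):
    """Add one clause per analyzed term; only added clauses are counted."""
    clauses = []
    for term in analyze(text, stopwords, synonyms):
        if len(clauses) >= max_clauses:
            raise TooManyClauses(f"limit of {max_clauses} clauses exceeded")
        clauses.append(term)
    return clauses


query = "To be or not to be said Shakespeare"
stopwords = {"to", "be", "or"}

plain = build_clauses(query, stopwords, synonyms={})
expanded = build_clauses(query, stopwords, synonyms={"said": ["said", "spoke"]})
print(len(plain), plain)
print(len(expanded), expanded)
```

In this model the raw query string has eight words, but only three clauses are created without synonyms and four with the assumed synonym expansion.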
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Em 2011-10-05, 10:22
Hi,

thank you Uwe and Ian!

So if an Analyzer contains a StopFilter and the parser uses this Analyzer, then the following will happen:

Original:
"To be or not to be said Shakespeare"

Stopwords: To, be, or

Resulting BooleanClauses:
- not
- said
- Shakespeare

Is this right?

If the MM was set to 4 (too many), then does this mean all queries have to match?

If so, what is the problem in Solr with stopwords and the Dismax parser?

Regards,
Em
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Ian Lea 2011-10-05, 10:51
Sorry - you did say StopFilter or SynonymFilter but I started talking about oal.search.Filter instead.

> So if an Analyzer contains a StopFilter and the parser uses this
> Analyzer, then the following will happen:
>
> Original:
> "To be or not to be said Shakespeare"
>
> Stopwords: To, be, or
>
> Resulting BooleanClauses:
> - not
> - said
> - Shakespeare
>
> Is this right?

Yes.

> If the MM was set to 4 (too many), then does this mean all queries have to
> match?

Presumably this query would fail, since you've only got three clauses. Easy to verify.

> If so, what is the problem in Solr with stopwords and the Dismax parser?

That sounds like a different question, maybe one for the solr list.

--
Ian.
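The expectation that a minimum of 4 cannot be satisfied by three clauses can be sketched with a small model. This is an assumption-laden illustration in plain Python, not Lucene's scorer: `matches` is a hypothetical helper that treats a document as a set of terms and every clause as an optional single-term clause.

```python
def matches(doc_terms, should_clauses, min_should_match):
    """A document matches if at least min_should_match optional clauses hit."""
    hits = sum(1 for term in should_clauses if term in doc_terms)
    return hits >= min_should_match


# The three clauses left after stopword removal in the example above.
clauses = ["not", "said", "shakespeare"]

# A document that contains every word of the original sentence.
doc = {"to", "be", "or", "not", "said", "shakespeare"}

print(matches(doc, clauses, 3))  # all three clauses hit
print(matches(doc, clauses, 4))  # four hits are impossible with three clauses
```

Under this model even a document containing every query term is rejected once the minimum exceeds the number of clauses, which is why the query would return nothing rather than fall back to "all clauses must match".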
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Em 2011-10-05, 11:03
Hi Ian,

thanks for the fast feedback.

>> If the MM was set to 4 (too many), then does this mean all queries have to
>> match?
>
> Presumably this query would fail, since you've only got three clauses.
> Easy to verify.

That seems to differ from Solr's behaviour. Solr is probably intelligent enough to reduce the parameter to the maximum value if it is too large.

I'll wait a little before reposting my question on the Solr list.

Regards,
Em
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Chris Hostetter 2011-10-05, 18:19
: > Presumably this query would fail, since you've only got three clauses.
: > Easy to verify.
:
: Seems like different behaviour compared to Solr. Probably Solr is
: intelligent enough to reduce the parameter to the maximum value if it is
: too large.

correct, the dismax parser in solr is smart enough not to calculate an illegal value for minNrShouldMatch using the mm param.

: >> If so, what is the problem in Solr with Stopwords and the Dismax-Parser?

the problem people sometimes have understanding the interaction of the dismax parser and stopwords comes from using stopwords in the analyzers for *some* fields they are querying but not others, and then being surprised that the stopwords are still part of their overall query (in the fields where they didn't use them in their analyzer)...

https://wiki.apache.org/solr/DisMax
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

...note in particular the "Where people tend to get tripped up..." para in that blog post

-Hoss
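Both points can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Solr code: `clamp_min_should_match` and `dismax_clauses` are hypothetical helpers, the field names `title` and `author` and their stopword sets are invented, and negative or percentage forms of mm are not modelled.

```python
def clamp_min_should_match(requested, optional_clause_count):
    """Never ask for more matches than there are optional clauses."""
    return max(0, min(requested, optional_clause_count))


def dismax_clauses(query, field_stopwords):
    """Build one top-level clause per query word.

    Each clause lists the (field, term) pairs that survive that field's
    stopword removal; a word is dropped only if every field removes it.
    """
    clauses = []
    for word in query.lower().split():
        per_field = [
            (field, word)
            for field, stops in field_stopwords.items()
            if word not in stops
        ]
        if per_field:
            clauses.append(per_field)
    return clauses


# Assumed setup: only the title field uses a stopword list.
fields = {"title": {"the", "of"}, "author": set()}

clauses = dismax_clauses("the hobbit", fields)
mm = clamp_min_should_match(4, len(clauses))
print(clauses)
print(mm)
```

In this sketch the word "the" is removed for `title` but survives as a clause that can only match in `author`. If the minimum requires every clause to match, a document must then contain "the" in its author field, which is the surprise described above.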
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Em 2011-10-07, 17:37
Hi Hoss,

I read your article.

I still have to review the Solr code, but with the help of your pseudo-code I think I understand what goes on now.

Thank you!

Regards,
Em
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Em 2011-10-07, 17:54
Hoss,

did you have a look at the responses you got? The first one is really interesting. It asks about the behaviour when synonyms come into play.

From my understanding this could also be dangerous for queries that reduce the number of tokens.
Imagine: Search Engine => SE (reduced to SE).
This should have the same impact on the min should match as a stopword, no?

What if I remove a stopword but add another token when synonyms come in?

Just some thoughts :).

Regards,
Em
-
Re: How is Number of Boolean Clauses calculated - Minimum Should Match? Chris Hostetter 2011-10-10, 18:18
: From my understanding this could be also dangerous for queries that
: reduce the number of tokens.
: Imagine: Search Engine => SE (reduced to SE).
: This should have the same impact on the min should match as a stopword, no?

Not really ... assuming you mean *query* based synonyms, then a multiword synonym used in the query string isn't going to be respected unless it's explicitly quoted, because each "chunk" of query parser input is analyzed independently. (remember: the QueryParser parses according to its own meta-characters -- including whitespace -- before passing any parts of the input to the individual analyzers)

Even if it is quoted, and it reduces to one term in fieldA, but remains two terms in fieldB, the number of clauses isn't affected because the end result for each chunk is what's used to create the DisjunctionMaxQuery objects that are used as the clauses in the top level BooleanQuery.

: What if I remove a stopword but add another token when synonyms come in?

try it ... you'll see what i mean.

(when it comes to query parsing, no amount of textual description can substitute for first hand experience and experimentation -- i've written documentation, blogs, emails ... i've even done training classes where i've discussed this specific thing for ~1 hour -- nothing makes it hit home like having people sit down and actually play with the config and see the output)

-Hoss
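For readers who cannot experiment right away, the chunking behaviour described above can be approximated with a short Python sketch. It is only a model under assumptions: `top_level_clauses` and `analyze_chunk` are hypothetical helpers, `shlex.split` stands in for the parser's handling of whitespace and double quotes, and the synonym rule "search engine" => "se" is assumed to be configured for fieldA only.

```python
import shlex

# Assumed multi-word synonym rule, applied only by fields that enable it.
MULTIWORD_SYNONYMS = {"search engine": ["se"]}


def analyze_chunk(chunk, use_synonyms):
    """Analyze one chunk for one field and return its terms."""
    key = chunk.lower()
    if use_synonyms and key in MULTIWORD_SYNONYMS:
        return list(MULTIWORD_SYNONYMS[key])
    return key.split()


def top_level_clauses(query, fields):
    """Split the query into chunks first, then analyze each chunk per field.

    One top-level clause is produced per chunk, however many terms each
    field ends up with for that chunk.
    """
    clauses = []
    for chunk in shlex.split(query):
        clauses.append(
            {field: analyze_chunk(chunk, syn) for field, syn in fields.items()}
        )
    return clauses


fields = {"fieldA": True, "fieldB": False}

unquoted = top_level_clauses("search engine tutorial", fields)
quoted = top_level_clauses('"search engine" tutorial', fields)
print(len(unquoted), unquoted)
print(len(quoted), quoted)
```

In this model the unquoted query yields three chunks and the synonym never fires, because no single chunk equals "search engine". The quoted query yields two chunks: the first reduces to one term in fieldA and stays two terms in fieldB, yet it is still a single top-level clause.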