-
Multi Word searches in Solr
Matthew Hall 2010-11-17, 18:16
Good afternoon,
We are running some queries against a default query field (of type text) that can be expected to contain multiple words. For example, after parsing the query form I'm left with something like this:

    abstract: mouse genome informatics

The strange behavior that I am seeing, however, is that Solr (either via the DisMaxQueryParser... or something unknown at this time) is really just running the first whitespace-delimited word of this query against the indexes.

So, in theory, all of the following queries are functionally equivalent:

    abstract: mouse
    abstract: mouse genome
    abstract: mouse wouiweiopqeuqwiopdjakldjkldjklasdj k23jl12k3jkl123j

Now, based on my previous work with Lucene, this is not how I would have expected a query like this to act. I would have expected Solr to basically OR the words together against that field (by default, unless you change the default operator to AND). Then, when it comes time to score the results, Solr would generally favor the documents that match multiple words in the query over single matches, and so forth.

As a side note, I've taken my query right down to the basics and run it against the Solr admin, and I'm seeing the same type of results. I've also taken the query concept over to a friend's project and run it against one of his default fields, and we are seeing the same behavior.

So I must be missing something in the documentation, or something along those lines, and I figured I'd best post the question out there.

For multi-word queries against the default text type field, what exactly happens in a multi-word, non-phrase query? Furthermore, where in the Solr documentation is this distinction spelled out?

Thank you for your time,
Matt
-
Re: Multi Word searches in Solr
kenf_nc 2010-11-17, 18:38
Multi-word queries are the bread and butter of Solr/Lucene, so I'm not sure I understand the complete issue here. For clarity, is 'abstract' the name of your default text field, or is your query

    q=abstract: mouse genome

If the latter, my thought is that the query may be getting converted into

    q=abstract:mouse genome

where mouse is looked for in the field abstract, and genome is compared to the default text field. This is a stab in the dark; I don't know what your data looks like.

You say it doesn't work the way you expect, but you don't really say what you do see. Are you getting zero results, or fewer than you expected, or only results that match all fields (the AND proposition)?

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-Word-searches-in-Solr-tp1918802p1918915.html
Sent from the Solr - User mailing list archive at Nabble.com.
-
Re: Multi Word searches in Solr
Matthew Hall 2010-11-17, 18:44
I'm getting the result set that matches what it would be if I just searched for the first word in the query.

So I'm getting the results for mouse.

And yes, abstract: is the name of the field.

So a search for

    abstract: mouse

would yield 69103 results.

    abstract: mouse anythingelseIputhere

yields 69103 results, even if the second token doesn't exist anywhere in the indexes.

    abstract: mouse verycommonterm

yields 69103 results, even if the second token would normally result in, say, 150000 matches when searched on its own.

I think what you are describing as being possible is what is happening: the query parser is taking what's being passed to it

    abstract: foo bar

and really converting it into

    abstract: foo OR default:bar

Matt
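The rewrite described in this message (the field prefix binding only to the term that follows it) can be illustrated with a toy model. The Python sketch below is not the real Lucene/Solr query parser; the function name `bind_fields` and the default field name `text` are assumptions made up for illustration. It only shows which field each whitespace-separated term would end up being matched against.

```python
import re

# Toy model only: NOT the real Lucene/Solr query parser.
# Alternatives, in order: field:(a b c) | field:a | bare term
TOKEN = re.compile(r"(\w+):\s*\(([^)]*)\)|(\w+):\s*(\S+)|(\S+)")


def bind_fields(query, default_field="text"):
    """Return (field, term) pairs showing which field each term is bound to."""
    clauses = []
    for match in TOKEN.finditer(query):
        group_field, group_terms, field, term, bare = match.groups()
        if group_field:
            # field:(a b c): every term inside the parentheses uses the field
            clauses += [(group_field, t) for t in group_terms.split()]
        elif field:
            # field:a: only the single term after the colon uses the field
            clauses.append((field, term))
        else:
            # bare term: falls back to the (hypothetical) default field
            clauses.append((default_field, bare))
    return clauses


if __name__ == "__main__":
    print(bind_fields("abstract: mouse genome informatics"))
    print(bind_fields("abstract: (mouse genome informatics)"))
```

In this model only `mouse` is bound to `abstract` in the first query, while the parenthesized form binds all three terms to it.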
-
Re: Multi Word searches in Solr
Matthew Hall 2010-11-17, 18:48
Oh, and to clarify what I expect to see:

I expect the terms in a multi-word query to be put together with ORs (unless I've set the default to AND in the solrconfig.xml).

I'm guessing that what I'm going to need to do here is place all of my queries in parentheses for the fields where I expect multi-word queries. This isn't what I was expecting, having come from Lucene before this.

So instead of my query looking like this in the Solr admin

    abstract: mouse genome informatics

I really need it looking like

    abstract: (mouse genome informatics)

so that the query parser understands that these are multiple terms for a single field.

Or am I misunderstanding here?

Matt
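The parenthesized form can be generated on the client side before the query is sent. The Python sketch below is one possible way to do that, not an official Solr client API: `fielded_query` and `debug_url` are hypothetical helper names, the escaped character list is an assumption based on the Lucene query syntax, and the base URL `http://localhost:8983/solr/select` is an assumed local install. `debugQuery=on` asks Solr to include the parsed query in the response, which is a convenient way to check how the terms were bound.

```python
import re
from urllib.parse import urlencode

# Characters treated as special by the Lucene query syntax (assumed list);
# each one is backslash-escaped so user input is matched literally.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def fielded_query(field, user_input):
    """Bind every whitespace-separated term of user_input to one field."""
    terms = [_SPECIAL.sub(r"\\\1", t) for t in user_input.split()]
    if not terms:
        raise ValueError("empty query")
    if len(terms) == 1:
        return f"{field}:{terms[0]}"
    # Parentheses make the field prefix apply to the whole group of terms.
    return f"{field}:({' '.join(terms)})"


def debug_url(query, base="http://localhost:8983/solr/select"):
    """Build a request URL (assumed endpoint) that also asks for debug output."""
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})


if __name__ == "__main__":
    q = fielded_query("abstract", "mouse genome informatics")
    print(q)
    print(debug_url(q))
```

Whether the grouped terms are combined with OR or AND still depends on the default operator configured for the parser.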
-
Re: Multi Word searches in Solr
Erick Erickson 2010-11-17, 20:56
Nope, you've got it right. Parentheses are what's necessary. This is actually similar to the Lucene world if you consider <defaultSearchField> in your config to be equivalent to specifying a default field when you instantiate a parser. But that's a stretch...

That said, it is surprising that you are getting the exact same number of results, unless your default field isn't one that's actually in your documents...

Best
Erick
-
Re: Multi Word searches in Solr
Matthew Hall 2010-11-17, 20:59
Yeah, I actually don't use the default field at all.

Well, I learned something new and good today ^^

I just need to recheck my assumptions on how Solr works versus how core Lucene worked, and I think I'll be fine. The way Solr is doing it makes sense too, in a way, so I just need to readjust my query parsing on my end accordingly.

Thanks folks,

Matt

--
Matthew Hall
Software Engineer
Mouse Genome Informatics
[EMAIL PROTECTED]
(207) 288-6012