-
Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 13:35
I've posted a self-contained test case to github of a mystery.

git://github.com/bimargulies/lucene-4-update-case.git

The code can be seen at https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.

I write a doc to an index, close the index, then reopen and do a delete/add on the doc to add a field. If I iterate the docs in the index, all looks well, but when I try to query for the doc, it isn't found.

To be a bit more specific, the doc has a field "field1" which is a StringField.TYPE_STORED, and it is a query on that field which comes up empty.

I expect to learn that I've missed something obvious, and I offer thanks and apologies in advance.
-
Re: Problem with updating a document or TermQuery with current trunk
Robert Muir 2012-03-06, 14:20
I think the issue is that your analyzer is StandardAnalyzer, yet the field text value is "value-1".

So StandardAnalyzer will tokenize this into two terms: "value" and "1".

But later, you proceed to do TermQueries on "value-1". This term won't exist... TermQuery etc. that take a Term don't analyze any text.

Instead, usually higher-level things like QueryParsers analyze text into Terms.

-- lucidimagination.com
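To make the mechanism described above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API: `TermLookupSketch`, `tokenize`, `indexTerms` and `termQueryMatches` are hypothetical names, and the splitting rule (lower-case, break on non-alphanumerics) is only an assumption standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT Lucene's API. It imitates how an analyzed
// field and an un-analyzed (keyword) field put different terms in an index.
class TermLookupSketch {

    // Rough stand-in for a word-splitting analyzer: lower-case the text and
    // break it on any run of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Build the set of indexed terms for one field value.
    static Set<String> indexTerms(String value, boolean tokenized) {
        Set<String> terms = new HashSet<>();
        if (tokenized) {
            terms.addAll(tokenize(value));
        } else {
            terms.add(value); // the whole value is kept as a single term
        }
        return terms;
    }

    // Exact-term lookup: the query text is compared as-is, never analyzed.
    static boolean termQueryMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> analyzed = indexTerms("value-1", true);
        Set<String> keyword = indexTerms("value-1", false);
        System.out.println(termQueryMatches(analyzed, "value-1")); // false
        System.out.println(termQueryMatches(keyword, "value-1"));  // true
    }
}
```

In this model the exact term "value-1" only exists when the field is indexed without tokenization, which is why an exact-term query and a tokenized field do not meet.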
-
Re: Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 14:23
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
> I think the issue is that your analyzer is standardanalyzer, yet field
> text value is "value-1"

Robert,

Why is this field analyzed at all? It's built with StringField.TYPE_STORED.

I'll push another copy that shows that it works fine when the doc is first added, and gets bad after the 'update', when the field acquires the 'tokenized' boolean mysteriously.

--benson
-
Re: Problem with updating a document or TermQuery with current trunk
Robert Muir 2012-03-06, 14:33
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.

Thanks Benson, you are right!
-
Re: Problem with updating a document or TermQuery with current trunk
Michael McCandless 2012-03-06, 14:41
Hmm, something is up here... I'll dig. Seems like we are somehow analyzing StringField when we shouldn't...

Mike McCandless

http://blog.mikemccandless.com
-
RE: Problem with updating a document or TermQuery with current trunk
Uwe Schindler 2012-03-06, 14:47
String field is analyzed, but with KeywordTokenizer, so all should be fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
-
Re: Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 14:58
On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler <[EMAIL PROTECTED]> wrote:
> String field is analyzed, but with KeywordTokenizer, so all should be fine.

I filed LUCENE-3854.
-
Re: Problem with updating a document or TermQuery with current trunk
Robert Muir 2012-03-06, 15:04
Thanks Benson: it looks like the problem revolves around indexing the Document/Fields you get back from IR.document... this has always been 'lossy', but I think this is a real API trap.

Please keep testing :)
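The 'lossy' round trip can be pictured with a small sketch in plain Java. It is a toy model, not Lucene's API: `LossyRoundTripSketch`, `Field`, `store` and `load` are hypothetical names. It assumes that only the field name and value are stored, and that a loaded field falls back to a tokenized type, which matches the observation earlier in the thread that the field acquires the 'tokenized' boolean after the update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only, NOT Lucene's API: a field's index-time settings are not
// part of what gets stored, so a loaded field cannot know how it was indexed.
class LossyRoundTripSketch {

    // Hypothetical field: a name, a value and one index-time setting.
    static final class Field {
        final String name;
        final String value;
        final boolean tokenized;

        Field(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // "Storing" keeps only name -> value; the tokenized flag is dropped.
    static Map<String, String> store(Field... fields) {
        Map<String, String> stored = new LinkedHashMap<>();
        for (Field f : fields) {
            stored.put(f.name, f.value);
        }
        return stored;
    }

    // "Loading" has to pick some type. Assumption of this sketch: it falls
    // back to a tokenized field because the original setting is gone.
    static Field load(Map<String, String> stored, String name) {
        return new Field(name, stored.get(name), true);
    }

    public static void main(String[] args) {
        Field original = new Field("field1", "value-1", false);
        Field loaded = load(store(original), "field1");
        // prints: false -> true
        System.out.println(original.tokenized + " -> " + loaded.tokenized);
    }
}
```

The value survives the round trip, but the setting that decides which terms are indexed does not, so re-adding the loaded field indexes it differently from the original.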
-
Re: Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 15:06
On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
> Thanks Benson: look like the problem revolves around indexing
> Document/Fields you get back from IR.document... this has always been
> 'lossy', but I think this is a real API trap.
>
> Please keep testing :)

Got a suggestion for sneaking around this in the mean time?
-
Re: Problem with updating a document or TermQuery with current trunk
Michael McCandless 2012-03-06, 15:07
On Tue, Mar 6, 2012 at 10:06 AM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> Got a suggestion for sneaking around this in the mean time?

I just put a comment on the issue: you have to build a new Document rather than re-index a Document loaded from IR.document.

Mike McCandless

http://blog.mikemccandless.com
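One way to picture the workaround above, as a rough sketch in plain Java rather than Lucene's API: copy the stored values into freshly built fields and take each field's index-time setting from the application's own knowledge of its schema. `RebuildDocumentSketch`, `Field`, `rebuild` and the set of untokenized field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only, NOT Lucene's API: rebuild a document from stored values
// instead of re-indexing the loaded object itself.
class RebuildDocumentSketch {

    // Hypothetical field: a name, a value and one index-time setting.
    static final class Field {
        final String name;
        final String value;
        final boolean tokenized;

        Field(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Rebuild every field from its stored value, taking the index-time
    // setting from the application's schema, not from the loaded document.
    static List<Field> rebuild(Map<String, String> storedValues,
                               Set<String> untokenizedNames) {
        List<Field> fresh = new ArrayList<>();
        for (Map.Entry<String, String> e : storedValues.entrySet()) {
            boolean tokenized = !untokenizedNames.contains(e.getKey());
            fresh.add(new Field(e.getKey(), e.getValue(), tokenized));
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Values as they would come back from the stored document.
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("field1", "value-1");

        // The application knows field1 must stay a single exact term.
        List<Field> doc = rebuild(stored, Collections.singleton("field1"));

        // The update itself: add the extra field, then delete/add as before.
        doc.add(new Field("field2", "some new text", true));

        for (Field f : doc) {
            System.out.println(f.name + " tokenized=" + f.tokenized);
        }
    }
}
```

The point of the design is that the schema lives in application code, so nothing depends on what the index can or cannot hand back about how a field was originally indexed.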
-
Re: Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 14:42
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
> thanks Benson, you are right!

So, should I attach this to a JIRA?
-
Re: Problem with updating a document or TermQuery with current trunk
Benson Margulies 2012-03-06, 14:24
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> I'll push another copy that shows that it works fine when the doc is
> first added, and gets bad after the 'update', when the field acquires
> the 'tokenized' boolean mysteriously.

I pushed a new copy that runs the query successfully before the 'delete/add' sequence, and then fails afterwards.