-
update/re-add an existing document with numeric fields
Tim Eck 2012-05-08, 20:08

Note: I'm bound to lucene 3.0.3 for the context of this question, but
I would be interested to know if newer versions would help me here.

I have an existing document in my directory that has one regular
String field and one numeric field. I naively thought I could update
that document to change the String field with code like this:

  FSDirectory dir = FSDirectory.open(...);
  IndexWriter writer = new IndexWriter(dir,
      new StandardAnalyzer(Version.LUCENE_30), MaxFieldLength.UNLIMITED);

  // doc has 2 fields, one String and the other numeric
  Document doc = new Document();
  doc.add(new Field("string", "value", Store.YES,
      Index.ANALYZED_NO_NORMS));
  NumericField nf = new NumericField("numeric", Field.Store.YES, true);
  nf.setIntValue(42);
  doc.add(nf);
  writer.addDocument(doc);
  writer.commit();

  // make sure we can query on the numeric field
  IndexSearcher searcher = new IndexSearcher(dir);
  TopDocs docs = searcher.search(new TermQuery(new Term("numeric",
      NumericUtils.intToPrefixCoded(42))), 1);
  if (docs.totalHits != 1) {
      throw new AssertionError();
  }
  doc = searcher.doc(docs.scoreDocs[0].doc);
  searcher.close();

  // update document with new value for string field
  doc.removeField("string");
  doc.add(new Field("string", "value2", Store.YES,
      Index.ANALYZED_NO_NORMS));
  writer.updateDocument(new Term("string", "value"), doc);
  writer.commit();

  // search again
  searcher = new IndexSearcher(dir);
  docs = searcher.search(new TermQuery(new Term("numeric",
      NumericUtils.intToPrefixCoded(42))), 1);
  if (docs.totalHits != 1) {
      throw new AssertionError(docs.totalHits);
  }

That doesn't seem to work however. It seems I need to get the
NumericField rematerialized in the document passed to
updateDocument(). I was hoping to avoid that if possible so
I'm looking for any suggestions someone might offer.
-
Re: update/re-add an existing document with numeric fields
Ian Lea 2012-05-10, 08:20

You can't selectively update fields in docs read from an index, in old
or current versions of lucene. I think there are some ideas floating
around but nothing usable today as far as I know. You'll need to
rebuild the whole doc before passing it to writer.updateDocument().

--
Ian.
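
The advice to rebuild the whole doc follows from how an update works:
IndexWriter.updateDocument() first deletes the documents containing the
given term and then adds the new document, so whatever is missing from
the replacement is gone afterwards. The sketch below is a toy model of
that behaviour in plain Java. It does not use Lucene; the class
WholeDocUpdateDemo and all of its methods are made up for illustration,
and documents are reduced to simple field/value maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of whole-document replacement; these are not Lucene classes. */
final class WholeDocUpdateDemo {

    /** Each document is a flat map of field name to value. */
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Deletes every document whose field equals value, then adds the replacement as given. */
    void update(String field, String value, Map<String, String> replacement) {
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
            }
        }
        add(replacement);
    }

    /** Counts documents whose field equals value. */
    int count(String field, String value) {
        int hits = 0;
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {
                hits++;
            }
        }
        return hits;
    }

    /** The two-field document from the question, reduced to strings. */
    static Map<String, String> original() {
        Map<String, String> doc = new HashMap<>();
        doc.put("string", "value");
        doc.put("numeric", "42");
        return doc;
    }

    /** Replacement carries only the changed field, so "numeric" disappears. */
    static int hitsAfterPartialUpdate() {
        WholeDocUpdateDemo index = new WholeDocUpdateDemo();
        index.add(original());
        Map<String, String> partial = new HashMap<>();
        partial.put("string", "value2");
        index.update("string", "value", partial);
        return index.count("numeric", "42");
    }

    /** Replacement is rebuilt in full, so "numeric" is still searchable. */
    static int hitsAfterFullRebuild() {
        WholeDocUpdateDemo index = new WholeDocUpdateDemo();
        index.add(original());
        Map<String, String> rebuilt = new HashMap<>(original());
        rebuilt.put("string", "value2");
        index.update("string", "value", rebuilt);
        return index.count("numeric", "42");
    }

    public static void main(String[] args) {
        System.out.println("partial update hits: " + hitsAfterPartialUpdate());
        System.out.println("full rebuild hits:   " + hitsAfterFullRebuild());
    }
}
```

In a real application the full replacement would normally be rebuilt
from the original source data (database row, file, and so on) rather
than from a document read back out of the index.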
-
RE: update/re-add an existing document with numeric fields
Tim Eck 2012-05-10, 08:26

Thanks for the response (and sorry for the excessive duplicate posting to
the list, that obviously wasn't on purpose).

I should have explicitly asked this, but one specific thing I wondered was
whether LUCENE-3065 would make any difference in my example program (I
haven't had time to test it yet).

In the meantime I have gone through the motions to rebuild my doc from
whole cloth and I'm reasonably sure it is working for me :-)

Thanks!
-
Re: update/re-add an existing document with numeric fields
Ian Lea 2012-05-10, 08:43

My guess is that LUCENE-3065 won't help. Couldn't do the read/update
stuff before NumericFields came along and still can't, with or without
that patch.

--
Ian.
-
Re: update/re-add an existing document with numeric fields
Michael McCandless 2012-05-10, 09:58

This is actually due to a bug:

    https://issues.apache.org/jira/browse/LUCENE-3065

which was fixed in 3.2. The bug is that, prior to Lucene 3.2, if you
stored a NumericField, then when you later load that document the field
is converted to an ordinary Field (no longer numeric), so when you
index that retrieved document you lose its numeric-ness.

That said, retrieving a doc and reindexing it is dangerous because in
general Lucene does not ensure all details are preserved. For example,
boost is never returned correctly, and neither whether a field was
indexed nor whether term vectors were indexed is preserved. So in
general you shouldn't assume you can just load a document, modify it a
bit, re-index it, and not lose something...

Mike McCandless

http://blog.mikemccandless.com
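
The loss of numeric-ness described above can be pictured with a small
toy model in plain Java. It is only a sketch under stated assumptions:
it does not use Lucene, the names NumericRoundTripDemo, ToyField,
encodeNumeric, loadStored and rebuildNumeric are hypothetical, and the
"int#..." term format is invented and is not Lucene's prefix coding.
The one idea it borrows from the thread is that a numeric field is
indexed under an encoded term, while a field loaded from stored data
(pre-3.2) comes back as ordinary text.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the stored-field round trip; none of these types are Lucene classes. */
final class NumericRoundTripDemo {

    /** A field value plus the flag that decides how it becomes an index term. */
    static final class ToyField {
        final String name;
        final String value;
        final boolean numeric;

        ToyField(String name, String value, boolean numeric) {
            this.name = name;
            this.value = value;
            this.numeric = numeric;
        }
    }

    /** Made-up stand-in for a numeric term encoding; not Lucene's real format. */
    static String encodeNumeric(int value) {
        return "int#" + Integer.toHexString(value);
    }

    /** The term a field is searchable under once indexed. */
    static String indexedTerm(ToyField field) {
        return field.numeric ? encodeNumeric(Integer.parseInt(field.value)) : field.value;
    }

    /** Models the pre-3.2 load described above: every field comes back as plain text. */
    static List<ToyField> loadStored(List<ToyField> doc) {
        List<ToyField> loaded = new ArrayList<>();
        for (ToyField f : doc) {
            loaded.add(new ToyField(f.name, f.value, false));
        }
        return loaded;
    }

    /** Re-creates the named field as numeric from its stored text. */
    static List<ToyField> rebuildNumeric(List<ToyField> loaded, String numericName) {
        List<ToyField> rebuilt = new ArrayList<>();
        for (ToyField f : loaded) {
            rebuilt.add(new ToyField(f.name, f.value, f.name.equals(numericName)));
        }
        return rebuilt;
    }

    /** True if indexing doc would make it match term in the named field. */
    static boolean matches(List<ToyField> doc, String name, String term) {
        for (ToyField f : doc) {
            if (f.name.equals(name) && indexedTerm(f).equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** The two-field document from the question. */
    static List<ToyField> sampleDoc() {
        List<ToyField> doc = new ArrayList<>();
        doc.add(new ToyField("string", "value", false));
        doc.add(new ToyField("numeric", "42", true));
        return doc;
    }

    public static void main(String[] args) {
        String query = encodeNumeric(42);
        List<ToyField> loaded = loadStored(sampleDoc());
        System.out.println("original matches: " + matches(sampleDoc(), "numeric", query));
        System.out.println("reloaded matches: " + matches(loaded, "numeric", query));
        System.out.println("rebuilt matches:  "
                + matches(rebuildNumeric(loaded, "numeric"), "numeric", query));
    }
}
```

The reloaded document no longer matches the encoded query term because
its "numeric" field would be indexed as the plain text "42"; rebuilding
the field as numeric restores the match. The model covers only the
numeric flag, not the other details (boost, indexed flag, term vectors)
that the message says are also lost.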