-
Multi field search with values
Deb Lucene 2012-03-14, 15:32
Hi Group,
I am working on a Lucene search solution for multiple fields. So far, when the fields are of string type, I have no difficulty retrieving documents with MultiFieldQueryParser. For example, my indexing and searching logic looks like this:
Indexing: I am indexing a corpus on the content of the documents and on some keywords of the documents.
**********************************************
String doc = getText(id);
List<String> keywords = getKeywords(doc);
document.add(new Field("content", doc, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES));
for (String keyword : keywords) {
    document.add(new Field("keyword", keyword, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES));
}
**********************************************
Searching: I search over the index using some query text and a predefined keyword.
**********************************************
String queryText = getQuery();
String keyword = getKeyword();
BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD};
// The field names must match the ones used at indexing time ("content" and "keyword").
Query query = MultiFieldQueryParser.parse(Version.LUCENE_33,
        new String[] {queryText, keyword},
        new String[] {"content", "keyword"},
        flags, stAnalyzer); // stAnalyzer is a StandardAnalyzer
TopDocs hits = isearcher.search(query, 20);
**********************************************
This code is working fine. But now suppose I add one more field, "threshold", which is numeric and set by some prior calculation:
NumericField field = new NumericField("threshold");
document.add(field.setDoubleValue(threshold));
Can I now search across the string fields (content and keywords) together with the double field (threshold)? In particular, I am looking for a query such as: "some content" and "some keywords" and threshold > 0.5.
I surmise I need to use the "numeric field search" technique, but I am not sure how to add that functionality to MultiFieldQueryParser.
Thanks in advance, --d
-
Re: Multi field search with values
Ian Lea 2012-03-14, 15:52
If the keywords are already in the document body (field "content"), I don't see what you gain by indexing them separately and using MFQP. But that isn't what you are asking. To add a threshold to the query, do something like this:
BooleanQuery bq = new BooleanQuery();
Query qm = build existing query as now;
bq.add(qm, ....);
Query qthresh = NumericRangeQuery.whatever(whatever...);
bq.add(qthresh, ...);

and use bq in the search call.

--
Ian.
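Ian's outline deliberately leaves the clause details open. One plausible concrete reading, assuming the Lucene 3.x API used in this thread, is to add the parsed MultiFieldQueryParser query with BooleanClause.Occur.MUST and a NumericRangeQuery.newDoubleRange("threshold", 0.5, null, false, true) clause (exclusive lower bound, no upper bound), also with MUST. The sketch below does not call Lucene at all; it is only a plain-Java model of the matching logic, with a hypothetical Doc class, a crude lowercase/split tokenizer standing in for StandardAnalyzer, and the query text treated as a single term.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model (no Lucene) of the combined query:
//   outer query = textQuery (MUST) + threshold range (MUST)
//   textQuery   = content:term (SHOULD) + keyword:keyword (SHOULD)
class ThresholdQueryModel {

    // Hypothetical stand-in for one indexed document.
    static final class Doc {
        final String id;
        final Set<String> contentTerms = new HashSet<>();
        final Set<String> keywords = new HashSet<>();
        final double threshold;

        Doc(String id, String content, List<String> keywordList, double threshold) {
            this.id = id;
            // Crude analyzer stand-in: lowercase, split on anything not a letter or digit.
            for (String t : content.toLowerCase().split("[^a-z0-9]+")) {
                if (!t.isEmpty()) {
                    contentTerms.add(t);
                }
            }
            for (String k : keywordList) {
                keywords.add(k.toLowerCase());
            }
            this.threshold = threshold;
        }
    }

    // Two SHOULD clauses and no MUST clause: at least one of them has to match.
    static boolean textMatches(Doc d, String term, String keyword) {
        return d.contentTerms.contains(term.toLowerCase())
                || d.keywords.contains(keyword.toLowerCase());
    }

    // Open-ended range with an exclusive lower bound, i.e. threshold > min.
    static boolean thresholdMatches(Doc d, double min) {
        return d.threshold > min;
    }

    // Both outer clauses are MUST, so both conditions are required.
    static List<String> search(List<Doc> docs, String term, String keyword, double min) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (textMatches(d, term, keyword) && thresholdMatches(d, min)) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

In this model a document whose threshold is exactly 0.5 is excluded, matching the strict "> 0.5" in the question; passing true for minInclusive in newDoubleRange would correspond to ">=" instead.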
-
Re: Multi field search with values
Deb Lucene 2012-03-14, 17:03
Hi Ian,
Thanks a lot for your idea. Yes, it is working now. Thanks again.
--d
-
Re: Multi field search with values
Deb Lucene 2012-03-20, 20:05
Hi group,
Is there any way to index a document on key/value pairs (key = text, value = double)? For example, we have a situation like this:
document 1: IBM - 0.5, Google - 0.9, Apple - 0.3
document 2: IBM - 0.6, Google - 0.1, Apple - 0.4
Now we need to search on two fields, the name (e.g. "IBM", "Apple") and the score (> 0.5, etc.). A typical search query would be: name == "IBM" && value > 0.5. Previously we experimented with MFQP and numeric field queries, but here the two fields need to be linked, so that the score condition applies to the score paired with that particular name.
Thanks in advance. --d
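One common way to keep a name and its score together, offered here as a suggestion rather than something from the thread, is to fold the name into the field name: in Lucene 3.x, for example, add new NumericField("score_IBM").setDoubleValue(0.5) for each pair and then query with NumericRangeQuery.newDoubleRange("score_IBM", 0.5, null, false, true). The "score_" prefix is a hypothetical naming convention. The sketch below does not use Lucene; it models, in plain Java with a hypothetical Doc class, why separate multi-valued name and score fields lose the pairing while one field per name keeps it, using the two documents from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Lucene) of two ways to index name/score pairs.
class LinkedScoreModel {

    static final class Doc {
        final String id;
        // Unlinked layout: a multi-valued "name" field and a multi-valued "score" field.
        final List<String> names = new ArrayList<>();
        final List<Double> flatScores = new ArrayList<>();
        // Linked layout: one numeric field per name, e.g. "score_IBM" -> 0.5 (hypothetical naming).
        final Map<String, Double> linkedFields = new HashMap<>();

        Doc(String id) {
            this.id = id;
        }

        // Index one (name, score) pair into both layouts; returns this for chaining.
        Doc put(String name, double score) {
            names.add(name);
            flatScores.add(score);
            linkedFields.put("score_" + name, score);
            return this;
        }

        // Name clause and score clause are checked independently, so any
        // score > min satisfies the query, whichever name it belongs to.
        boolean matchesUnlinked(String name, double min) {
            if (!names.contains(name)) {
                return false;
            }
            for (double s : flatScores) {
                if (s > min) {
                    return true;
                }
            }
            return false;
        }

        // A single range check on the name-specific field keeps the pair together.
        boolean matchesLinked(String name, double min) {
            Double v = linkedFields.get("score_" + name);
            return v != null && v > min;
        }
    }

    static List<String> search(List<Doc> docs, String name, double min, boolean linked) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean ok = linked ? d.matchesLinked(name, min) : d.matchesUnlinked(name, min);
            if (ok) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

On the sample data, name == "IBM" && score > 0.5 matches only document 2 in the linked layout but both documents in the unlinked one, because document 1's Google score of 0.9 satisfies the score clause. The field-per-name approach may become unwieldy if the set of names is very large; an alternative is to index one Lucene document per (document, name, score) triple and group the hits by the original document id.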