-
search for alphabetic version of numbers
Alireza Salimi 2012-06-11, 18:41
Hi everybody,
I have a requirement to support searching for numbers either by their digits or by their alphabetic (spelled-out) form. For example, if a document has a field value of '200' and we search for "two hundred", that document should match.
I haven't found anything like this yet. Do we have any option other than defining the most common numbers and their string versions as synonyms?
Thanks -- Alireza Salimi Java EE Developer
-
Re: search for alphabetic version of numbers
Jack Krupansky 2012-06-11, 19:41
You can certainly do a modest number of special cases as replacement synonyms, but if you are serious about arbitrary number support, it might be best to go with a custom update processor and query preprocessor that map text numbers to simple numeric form.
How about cases like 2,300 or 2,300.00 (embedded commas or even decimal point) - two thousand three hundred or 23 hundred or twenty three hundred?
Or 200 million vs 200,000,000 vs. 200000000?
In any case, synonyms get really messy really quickly, but with preprocessors you can do whatever you want.
-- Jack Krupansky
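As a rough illustration of the preprocessor idea above, here is a minimal sketch of the normalization step on its own: it maps a spelled-out or mixed number to a plain numeric value. The class name NumberWords and its parse method are hypothetical and not part of Solr or Lucene; wiring this into an update processor or a query preprocessor is left out. It assumes English, whole numbers only, and it rejects decimals such as 2,300.00.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Solr/Lucene class): maps English number text to a long.
final class NumberWords {
    private static final Map<String, Long> UNITS = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] small = {"zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < small.length; i++) {
            UNITS.put(small[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
                "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            UNITS.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    // Accepts words ("two hundred"), digits with commas ("2,300"), or a mix ("200 million").
    // Throws IllegalArgumentException for anything it does not recognize.
    static long parse(String text) {
        long total = 0;    // sum of completed thousand/million/billion groups
        long current = 0;  // value of the group being built
        String cleaned = text.toLowerCase(Locale.ROOT).replace('-', ' ').trim();
        for (String word : cleaned.split("\\s+")) {
            if (word.equals("and")) {
                continue;
            } else if (word.matches("[0-9][0-9,]*")) {
                current += Long.parseLong(word.replace(",", ""));
            } else if (UNITS.containsKey(word)) {
                current += UNITS.get(word);
            } else if (word.equals("hundred")) {
                current *= 100;
            } else if (SCALES.containsKey(word)) {
                total += current * SCALES.get(word);
                current = 0;
            } else {
                throw new IllegalArgumentException("Not a number word: " + word);
            }
        }
        return total + current;
    }
}
```

With this sketch, "two thousand three hundred", "twenty three hundred" and "2,300" all normalize to 2300, and "200 million" normalizes to 200000000, so the same value can be indexed and queried regardless of how the number was written.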
-
Re: search for alphabetic version of numbers
Chris Hostetter 2012-06-19, 15:50
: I have the requirement to support searching for numbers with their
: alphabetic or by digits.
: For example, if we have a document with a field's value of '200',
: if we search for "two hundred", that document should match.
:
: I haven't found anything like this yet. Do we have other option than
: define the most common numbers and their string versions as
: synonyms?
The Lucene test-framework actually contains a class named org.apache.lucene.util.English which can convert int->String in English text.
You could try wrapping that up in a TokenFilter?
-Hoss
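To make the int->String direction concrete without depending on the test-framework jar, here is a self-contained sketch of the kind of conversion a TokenFilter could apply to each numeric token at index time. The class IntToWords is hypothetical and is not the Lucene English class; its exact wording (for example, no "and" and no hyphens) is an assumption and may differ from what Lucene produces. Negative numbers are not handled.

```java
// Hypothetical stand-in for an int -> English text converter (not the Lucene class).
final class IntToWords {
    private static final String[] SMALL = {"zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {"", "", "twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety"};

    // Converts a non-negative int to lowercase English words.
    static String toWords(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers are not handled in this sketch");
        }
        if (n < 20) {
            return SMALL[n];
        }
        if (n < 100) {
            return TENS[n / 10] + (n % 10 == 0 ? "" : " " + SMALL[n % 10]);
        }
        if (n < 1_000) {
            return SMALL[n / 100] + " hundred" + rest(n % 100);
        }
        if (n < 1_000_000) {
            return toWords(n / 1_000) + " thousand" + rest(n % 1_000);
        }
        if (n < 1_000_000_000) {
            return toWords(n / 1_000_000) + " million" + rest(n % 1_000_000);
        }
        return toWords(n / 1_000_000_000) + " billion" + rest(n % 1_000_000_000);
    }

    // Appends the remainder only when it is non-zero, so 200 is not "two hundred zero".
    private static String rest(int remainder) {
        return remainder == 0 ? "" : " " + toWords(remainder);
    }
}
```

A filter built on something like this would still have to decide how to emit multi-word output such as "two hundred" (one token or several, and at which positions), which affects whether phrase queries match.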