Solr, mail # user - Providing token variants at index time


Providing token variants at index time
Paul Dlug 2010-07-22, 18:27
Is there a tokenizer that supports providing variants of the tokens at
index time? I'm looking for something that could take a syntax like:

International|I Business|B Machines|M

It would index each pipe-delimited variant at the same position as the
token it accompanies, so that phrase queries work properly. With the
above, phrase queries for "International Business Machines" as well as
"I B M", or any mix of the variants, would match. The point is that the
variants would be generated externally as part of the indexing process,
so they may not be as simple as the above.

Any ideas or do I have to write a custom tokenizer to do this?
Thanks,
Paul
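
For illustration, the position handling requested in the message above can be sketched in plain Python. This is not Lucene or Solr code: `Token`, `parse_variants`, and `positions` are hypothetical names, and the sketch assumes whitespace separates positions and `|` separates variants. It mirrors the way Lucene stacks tokens, where a position increment of 0 places a token at the same position as the previous one.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    # 1 = starts a new position, 0 = stacked on the previous token's position
    position_increment: int


def parse_variants(text: str, delimiter: str = "|") -> List[Token]:
    """Split text on whitespace, then split each chunk into variants.

    The first non-empty variant of a chunk advances the position by one;
    every further variant gets increment 0, so it shares that position.
    """
    tokens: List[Token] = []
    for chunk in text.split():
        first = True
        for variant in chunk.split(delimiter):
            if not variant:
                continue  # skip empty pieces such as in "A||B"
            tokens.append(Token(variant, 1 if first else 0))
            first = False
    return tokens


def positions(tokens: List[Token]) -> List[List[str]]:
    """Group terms by absolute position to show what a phrase query sees."""
    grouped: List[List[str]] = []
    for tok in tokens:
        if tok.position_increment > 0 or not grouped:
            grouped.append([])
        grouped[-1].append(tok.term)
    return grouped


if __name__ == "__main__":
    toks = parse_variants("International|I Business|B Machines|M")
    for slot, terms in enumerate(positions(toks)):
        print(slot, terms)
```

Because every variant shares a position with its primary token, a phrase match only needs one term from each consecutive position, which is why both the full name and the initials would match in this model. A real implementation would have to emit the same increments from a custom tokenizer or token filter.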
Jonathan Rochkind 2010-07-22, 20:01
Paul Dlug 2010-07-22, 20:22
Jonathan Rochkind 2010-07-22, 21:08