Re: Analyzer thread safety; Stop words
Yonik Seeley 2006-11-24, 15:27
On 11/24/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:
> Two points about Analyzers:
>
> Does anyone have any experience with thread safety of Analyzer implementations?
> Apart from PerFieldAnalyzerWrapper, the analyzers seem to be thread safe, but
> is there a requirement that analyzers should be thread safe?

Yes, and they normally are thread safe as they create new Tokenizers and
TokenFilters for each field value analyzed.

> Secondly, has anyone thought that it would be a good idea to extend the Analyzer
> interface (Abstract class) to allow a standard way to set stop words? There
> seem to be two 'families' of stop word configuration via constructors.

That belongs at the TokenFilter level (where it currently is).

> The Set, File and String[] in Analyzers such as StandardAnalyzer and StopAnalyzer,
> and then the Russian/Greek variants that do not have the same
> constructor signature to configure stopwords.
>
> It makes it messy to make analyzers pluggable in a generic way so that stopwords
> can be configurable for any plugged analyzer.

Things currently are pluggable: one makes new Analyzers by plugging together
a Tokenizer followed by several TokenFilters.

If you are talking about some sort of external configuration, take a look at Solr.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
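
To illustrate the two points in the reply (a fresh tokenizer/filter chain per value keeps the analyzer thread safe, and stop words live at the filter level), here is a minimal sketch in plain Java. The types `TokenSource`, `WhitespaceSource`, `LowerCaseStage`, `StopStage` and `SimpleAnalyzer` are hypothetical stand-ins invented for this example; they are not Lucene's actual classes or signatures, only the same shape of design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {
    /** Hypothetical stand-in for a token stream; next() returns null when exhausted. */
    interface TokenSource { String next(); }

    /** Plays the role of a Tokenizer: splits the input on whitespace. */
    static final class WhitespaceSource implements TokenSource {
        private final Iterator<String> it;
        WhitespaceSource(String text) {
            String trimmed = text.trim();
            List<String> parts = trimmed.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(trimmed.split("\\s+"));
            this.it = parts.iterator();
        }
        public String next() { return it.hasNext() ? it.next() : null; }
    }

    /** Plays the role of a TokenFilter: lower-cases each token. */
    static final class LowerCaseStage implements TokenSource {
        private final TokenSource in;
        LowerCaseStage(TokenSource in) { this.in = in; }
        public String next() {
            String t = in.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        }
    }

    /** Plays the role of a stop-word TokenFilter: the stop set is configured here. */
    static final class StopStage implements TokenSource {
        private final TokenSource in;
        private final Set<String> stopWords;
        StopStage(TokenSource in, Set<String> stopWords) {
            this.in = in;
            this.stopWords = stopWords;
        }
        public String next() {
            String t;
            while ((t = in.next()) != null) {
                if (!stopWords.contains(t)) return t;  // skip stop words
            }
            return null;
        }
    }

    /** Holds only immutable state; every call builds a new chain, so sharing is safe. */
    static final class SimpleAnalyzer {
        private final Set<String> stopWords;
        SimpleAnalyzer(Set<String> stopWords) {
            this.stopWords = Collections.unmodifiableSet(new HashSet<String>(stopWords));
        }
        TokenSource tokenStream(String text) {
            return new StopStage(new LowerCaseStage(new WhitespaceSource(text)), stopWords);
        }
    }

    /** Drains a freshly built chain into a list. */
    static List<String> analyze(SimpleAnalyzer analyzer, String text) {
        List<String> out = new ArrayList<String>();
        TokenSource ts = analyzer.tokenStream(text);
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        SimpleAnalyzer analyzer =
                new SimpleAnalyzer(new HashSet<String>(Arrays.asList("the", "a")));
        System.out.println(analyze(analyzer, "The quick brown fox"));  // prints [quick, brown, fox]
    }
}
```

The per-call mutable state (the iterator position) lives in the chain objects, not in the analyzer, which is why one analyzer instance can be shared across threads in this sketch. Making stop words configurable for any plugged analyzer then amounts to choosing which set to hand to the stop stage.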