Lucene, mail # dev - Analyzer thread safety; Stop words


Re: Analyzer thread safety; Stop words
Yonik Seeley 2006-11-24, 15:27
On 11/24/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:
> Two points about Analyzers:
>
> Does anyone have any experience with thread safety of Analyzer implementations?
> Apart from PerFieldAnalyzerWrapper, the analyzers seem to be thread safe, but
> is there a requirement that analyzers should be thread safe?

Yes, and they normally are thread safe as they create new Tokenizers
and TokenFilters for each field value analyzed.
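
As a rough illustration of why that pattern is thread safe, here is a minimal sketch. It deliberately uses hypothetical stand-in classes (WhitespaceCursor, SharedAnalyzer, ThreadSafetyDemo) rather than Lucene's own Analyzer and Tokenizer, so it runs without Lucene on the classpath. The point it tries to show: the shared object has no mutable fields, and all per-text state lives in an object created fresh for each value analyzed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a tokenizer: it carries mutable per-text state
// (the cursor position), so one instance must never be shared between threads.
class WhitespaceCursor {
    private final String text;
    private int pos = 0;

    WhitespaceCursor(String text) {
        this.text = text;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }
}

// Hypothetical stand-in for an analyzer: no mutable fields, so a single
// instance can be shared. All state lives in the cursor created per call.
class SharedAnalyzer {
    List<String> analyze(String text) {
        WhitespaceCursor cursor = new WhitespaceCursor(text); // fresh per value
        List<String> tokens = new ArrayList<>();
        for (String t = cursor.next(); t != null; t = cursor.next()) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}

class ThreadSafetyDemo {
    // Calls one shared analyzer instance from several threads and returns
    // the token lists in the same order as the input texts.
    static List<List<String>> analyzeConcurrently(List<String> texts) {
        SharedAnalyzer analyzer = new SharedAnalyzer();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : texts) {
                futures.add(pool.submit(() -> analyzer.analyze(text)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

An analyzer that instead cached a single cursor in a field and reused it across calls would lose this property, which is the kind of thing to check for in a custom implementation.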

> Secondly, has anyone thought that it would be a good idea to extend the Analyzer
> interface (Abstract class) to allow a standard way to set stop words?  There
> seem to be two 'families' of stop word configuration via constructors.

That belongs at the TokenFilter level (where it currently is).

> There are the Set, File and String[] constructors in Analyzers such as
> StandardAnalyzer and StopAnalyzer, and then the Russian/Greek variants, which
> do not have the same constructor signature to configure stopwords.
>
> It makes it messy to make analyzers pluggable in a generic way so that stopwords
> can be configurable for any plugged analyzer.

Things currently are pluggable: one makes new Analyzers by plugging
together a Tokenizer followed by several TokenFilters.
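
To make the "plugging together" idea concrete, here is a small sketch of a chain in which stop words are configured at the filter stage rather than on the analyzer interface. All names below (TokenSource, SimpleTokenizer, LowerCaseStage, StopStage, ConfigurableAnalyzer) are hypothetical stand-ins written for this example so that it compiles on its own. In Lucene itself the corresponding pieces would be a Tokenizer wrapped by TokenFilters such as LowerCaseFilter and StopFilter inside Analyzer.tokenStream(String fieldName, Reader reader).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a token stream: next() returns null when exhausted.
interface TokenSource {
    String next();
}

// Stand-in tokenizer: splits the input on runs of whitespace.
class SimpleTokenizer implements TokenSource {
    private final Iterator<String> it;

    SimpleTokenizer(String text) {
        String trimmed = text.trim();
        List<String> parts = trimmed.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(trimmed.split("\\s+"));
        this.it = parts.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Stand-in filter: lower-cases each token from the wrapped source.
class LowerCaseStage implements TokenSource {
    private final TokenSource input;

    LowerCaseStage(TokenSource input) {
        this.input = input;
    }

    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Stand-in filter: drops tokens found in the stop set. The stop words are
// configured here, at the filter level, not on the analyzer interface.
class StopStage implements TokenSource {
    private final TokenSource input;
    private final Set<String> stopWords;

    StopStage(TokenSource input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    public String next() {
        for (String t = input.next(); t != null; t = input.next()) {
            if (!stopWords.contains(t)) {
                return t;
            }
        }
        return null;
    }
}

// A new "analyzer" is just a particular chain: tokenizer, then filters.
class ConfigurableAnalyzer {
    private final Set<String> stopWords;

    ConfigurableAnalyzer(Set<String> stopWords) {
        // Defensive copy so later changes by the caller cannot affect analysis.
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    // Builds a fresh chain for each text, sharing only the read-only stop set.
    TokenSource tokenStream(String text) {
        return new StopStage(new LowerCaseStage(new SimpleTokenizer(text)), stopWords);
    }

    List<String> analyze(String text) {
        TokenSource stream = tokenStream(text);
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }
}
```

With this shape, supporting a different stop list or a different language means building a different chain, which is why a common stop-word setter on the analyzer base class is not strictly needed.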

If you are talking about some sort of external configuration, take a
look at Solr.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server