Re: which German stemmer to use?
Paul Libbrecht 2011-03-24, 07:38
In our ActiveMath project, we have had positive feedback in Lucene with the
which is probably one of the two below.
Note that you may want to use one field with exact matching (e.g. whitespace analyzer and lowercase filter) and one field with stemmed matches. That's two fields in the index and a query-expansion mechanism such as dismax to
(add the phonetic...)
One of the biggest issues our testers raised is that compound words should be split. I believe this issue is also very present in technology texts. Thus far only the compound-words analyzer can do such a split, and it needs the component words to be entered manually. Maybe that's doable?
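
To make the compound-splitting idea concrete, here is a minimal sketch of a dictionary-based split in Python. It is not the code of Lucene's compound-words analyzer, only an illustration of the principle under stated assumptions: the word list KNOWN_WORDS, the function name split_compound and the min_part parameter are all hypothetical, and the only linking element handled is a single "s" between parts.

```python
# Hypothetical, manually maintained list of component words (lowercase).
KNOWN_WORDS = {"daten", "bank", "system", "arbeit", "speicher"}


def split_compound(word, dictionary, min_part=3):
    """Split `word` into dictionary words, or return None if no full split exists.

    A word that is itself in the dictionary is returned unsplit.
    Parts shorter than `min_part` characters are never tried.
    """
    word = word.lower()
    if word in dictionary:
        return [word]
    # Try every split point that leaves at least min_part characters on each side.
    for i in range(min_part, len(word) - min_part + 1):
        head, rest = word[:i], word[i:]
        if head not in dictionary:
            continue
        tail = split_compound(rest, dictionary, min_part)
        if tail is not None:
            return [head] + tail
        # Assumption: allow one linking "s" between parts (as in "Arbeitsspeicher").
        if rest.startswith("s") and len(rest) - 1 >= min_part:
            tail = split_compound(rest[1:], dictionary, min_part)
            if tail is not None:
                return [head] + tail
    return None


if __name__ == "__main__":
    print(split_compound("Datenbanksystem", KNOWN_WORDS))  # ['daten', 'bank', 'system']
    print(split_compound("Arbeitsspeicher", KNOWN_WORDS))
    print(split_compound("Unbekannt", KNOWN_WORDS))
```

In an index you would typically keep the original token and add the parts next to it, so that a query for "Speicher" can still match "Arbeitsspeicher". The quality of the split depends entirely on the word list, which is why the manual input matters.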
On 24 March 2011 at 00:14, Christopher Bottaro wrote:
> The wiki lists 5 available, but doesn't do a good job at explaining or
> recommending one:
> SnowballPorterFilterFactory (German)
> SnowballPorterFilterFactory (German2)
> Which is the best one to use in general? Which is the best to use when the
> content being indexed is German technology articles?
> Thanks for the help.