Mahout, mail # user - Tags generation?


Re: Tags generation?
Dawid Weiss 2012-08-03, 19:05
> Unstemming is pretty simple.  Just build an unstemming dictionary based on
> seeing what word forms have led to a stemmed form.  Include frequencies.

This can lead to very funny (or not, depending on how you look at it)
mistakes when different lemmas stem to the same token. How frequent
and important this phenomenon is varies from language to language (and
can be calculated a priori).
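
To make the collision concrete, here is a minimal sketch of the
frequency-based unstemming dictionary and of the a priori check. It is
an illustration only: toy_stem is a deliberately crude, hypothetical
stemmer (not Mahout's or Lucene's), and the token list and the lemma_of
table are made-up sample data.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    # Hypothetical, deliberately crude stemmer used only for illustration.
    for suffix in ("ity", "ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_unstem_dict(tokens, stem):
    # Map each stem to a frequency count of the surface forms that produced it.
    forms = defaultdict(Counter)
    for tok in tokens:
        forms[stem(tok)][tok] += 1
    return forms


def unstem(stem_token, forms):
    # Return the most frequent surface form seen for this stem, if any.
    counts = forms.get(stem_token)
    return counts.most_common(1)[0][0] if counts else stem_token


def ambiguous_stems(lemma_of, stem):
    # A priori check: which stems are shared by more than one lemma?
    # lemma_of is an assumed word -> lemma table (e.g. from a dictionary).
    lemmas_by_stem = defaultdict(set)
    for word, lemma in lemma_of.items():
        lemmas_by_stem[stem(word)].add(lemma)
    return {s: ls for s, ls in lemmas_by_stem.items() if len(ls) > 1}


# Made-up sample corpus: "university" happens to dominate.
tokens = ["universe", "universes", "university", "university", "university"]
forms = build_unstem_dict(tokens, toy_stem)
print(unstem("univers", forms))  # prints "university"

# Made-up lemma table for the same words.
lemma_of = {
    "universe": "universe",
    "universes": "universe",
    "university": "university",
}
print(ambiguous_stems(lemma_of, toy_stem))
```

With this sample, a text about the universe would be tagged
"university", because both lemmas collapse to the stem "univers" and
the dictionary simply picks the most frequent form. Counting the
entries that ambiguous_stems returns over a full lemma dictionary is
one way to estimate how often this happens in a given language.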

Dawid