Solr, mail # user - Stemmer Question


Stemmer Question
Jamie Johnson 2012-03-08, 13:18
I was previously using the PorterStemmer to do stemming and ran into
an issue where it was overly aggressive with some words and
abbreviations that I needed it to leave alone.  I have recently switched
to KStem and I believe the issue is less severe, but I am still wondering
whether there is a way to set a list of words, similar to a stop word
list, for which stemming should not occur, or a way to tell the stemmer
to store the unstemmed version as well.  For instance, if a query came
in for "Ahmed", the PorterStemmer would turn that into "ahm", while in
this case Ahmed is a name and I want to search for it unstemmed.  If
there were such a word list, I could attempt to compile the words I
don't want stemmed.  Alternatively, if there were a way to also create a
token for the unstemmed word, what went into the index for Ahmed would
be "ahmed" and "ahm", so we'd cover both cases.  What are the drawbacks
of providing both?
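
To make the two options in the question concrete, here is a minimal sketch in plain Python. It does not use Solr or Lucene: `toy_stem` is a hypothetical stand-in for a real stemmer (it only strips a trailing "ed"), and the function names and the `protected` parameter are assumptions made for illustration.

```python
VOWELS = set("aeiou")

def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ed'
    when the remaining stem still contains a vowel."""
    if token.endswith("ed") and any(c in VOWELS for c in token[:-2]):
        return token[:-2]
    return token

def analyze_protected(text, protected):
    """Option 1: leave tokens on a protected-word list unstemmed."""
    tokens = []
    for tok in text.lower().split():
        tokens.append(tok if tok in protected else toy_stem(tok))
    return tokens

def analyze_keep_both(text):
    """Option 2: emit the unstemmed token and, if it differs, its stem."""
    tokens = []
    for tok in text.lower().split():
        tokens.append(tok)
        stem = toy_stem(tok)
        if stem != tok:
            tokens.append(stem)
    return tokens

print(analyze_protected("Ahmed walked", {"ahmed"}))
print(analyze_keep_both("Ahmed"))
```

With the first option only the listed words escape stemming, so the list has to be maintained by hand. With the second, every stemmable word contributes two terms to the index instead of one.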