Stemmer Question
Jamie Johnson 2012-03-08, 13:18
I was previously using the PorterStemmer for stemming and ran into an issue where it was overly aggressive with some words and abbreviations, which I needed to stop. I recently switched to KStem and I believe the problem is less severe, but I am still wondering whether there is a way to specify a list of words for which stemming should not occur, or a way to tell the stemmer to store the unstemmed version as well.

For instance, if a query came in for "Ahmed", the PorterStemmer would turn it into "Ahm", but in this case Ahmed is a name and I want to search for it unstemmed. If there were such a word list, I could try to compile the words I don't want stemmed. Alternatively, if there were a way to also create a token for the unstemmed word, what went into the index for Ahmed would be "ahmed" and "ahm", so both cases would be covered. What are the drawbacks of providing both?
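To make the two options in the question concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: `toy_stem`, `analyze`, `protected`, and `keep_original` are hypothetical names, and the stemmer is a crude suffix stripper standing in for Porter or KStem. It only illustrates the token streams the question describes: a protected word list that bypasses stemming, and emitting the unstemmed token at the same position as its stem.

```python
def toy_stem(token):
    """Hypothetical stand-in stemmer: strips a few suffixes. Not Porter or KStem."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, protected=frozenset(), keep_original=False):
    """Return (position, token) pairs for whitespace-separated, lowercased text.

    protected     -- words that are passed through without stemming
    keep_original -- if True, emit the unstemmed token alongside its stem,
                     both at the same position
    """
    tokens = []
    for position, raw in enumerate(text.lower().split()):
        if raw in protected:
            tokens.append((position, raw))      # option 1: skip stemming
            continue
        stem = toy_stem(raw)
        if keep_original and stem != raw:
            tokens.append((position, raw))      # option 2: keep both forms
        tokens.append((position, stem))
    return tokens


print(analyze("Ahmed searched"))
print(analyze("Ahmed searched", protected={"ahmed"}))
print(analyze("Ahmed searched", keep_original=True))
```

In this sketch the protected list keeps the index the same size but has to be maintained by hand, while keeping both forms needs no list but stores up to two tokens per word. Keeping both also leaves the stem "ahm" in the index, so a query term that stems to "ahm" would still match the document; the same analysis has to be applied consistently at index and query time for either option to behave as intended.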
Ahmet Arslan 2012-03-08, 13:36
Jamie Johnson 2012-03-08, 15:40
Ahmet Arslan 2012-03-08, 16:16
Jamie Johnson 2012-03-09, 03:58
Ahmet Arslan 2012-03-09, 14:53
Jamie Johnson 2012-03-09, 19:27
Jamie Johnson 2012-03-09, 19:53
Jamie Johnson 2012-03-09, 21:04
Jamie Johnson 2012-03-11, 02:36