Solr, mail # user - Spell checking: Is there a way to exclude words known to be wrong?


Re: Spell checking: Is there a way to exclude words known to be wrong?
Erik Hatcher 2009-07-14, 13:07
Use the stopwords feature with a custom mispeled_words.txt and a  
StopFilterFactory on the spell check field ;)

Erik
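Below is a minimal sketch of Erik's suggestion. It assumes that the `spell` field uses the `textSpell` field type named as `queryAnalyzerFieldType` in Jay's config. The tokenizer, the filters other than `StopFilterFactory`, and the `title` copyField source are hypothetical. The stop filter goes in the index-time analyzer only, so listed words never become terms of the `spell` field and never reach the spelling index built from it.

```xml
<!-- schema.xml (sketch): fieldType goes in <types>, field/copyField in <fields>/after it -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop known misspellings before they are indexed into the "spell" field -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="mispeled_words.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- no stop filter here: the misspelled query word must survive to get a suggestion -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<!-- hypothetical source field feeding the spell field -->
<copyField source="title" dest="spell"/>
```

The word file sits in the core's conf directory next to stopwords.txt. It lists one word per line, for example `beginnning`, and lines starting with # are treated as comments. The query analyzer has no stop filter because `textSpell` also analyzes spellcheck queries. A query-time stop filter would drop the misspelled word before the spellchecker could suggest a correction. Because the filtering happens at index time, a word added to the file later presumably takes effect only after the core is reloaded and the affected documents are reindexed. The next commit then rebuilds the spelling index without it.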
On Jul 13, 2009, at 8:27 PM, Jay Hill wrote:

> We're building a spell index from a field in our main index with the
> following configuration:
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <str name="queryAnalyzerFieldType">textSpell</str>
>   <lst name="spellchecker">
>     <str name="name">default</str>
>     <str name="field">spell</str>
>     <str name="spellcheckIndexDir">./spellchecker</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> This works great and rebuilds the spelling index on commits as expected.
> However, we know there are misspellings in the "spell" field of our main
> index. We could remove these from the spelling index using Luke, but they
> would be added again on the next commit. What we need is something similar
> to how the protwords.txt file is used, so that when we notice misspelled
> words such as "beginnning" being pulled from our main index, we can add
> them to an exclusion file and keep them out of the spelling index.
>
> Any tricks to make this possible?
>
> -Jay