Solr, mail # user - can solr automatically search for different punctuation of a word


Re: can solr automatically search for different punctuation of a word
alxsss@... 2012-01-31, 03:38

 Hi Chantal,

The README file at solr/contrib/analysis-extras/README.txt says to add the ICU library (in lib/).

Do I also need to add a <dependency>... entry, and if so, where?

Thanks.
Alex.

 

 

-----Original Message-----
From: Chantal Ackermann <[EMAIL PROTECTED]>
To: solr-user <[EMAIL PROTECTED]>
Sent: Fri, Jan 13, 2012 1:52 am
Subject: Re: can solr automatically search for different punctuation of a word
Hi Alex,

For me, ICUFoldingFilterFactory works very well. It lowercases and
removes diacritics (that is the term for umlauts and accents on
letters; punctuation means commas, periods, etc.). It works for any
language, not only German, and it also handles apostrophes as in
"C'est bien".

ICU requires additional libraries on the classpath. For a built-in Solr
solution, have a look at ASCIIFoldingFilterFactory.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Example configuration:

<fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

And dependencies (example for Maven) in addition to solr-core:

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-icu</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analysis-extras</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>
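
For the built-in alternative mentioned earlier, no extra libraries or Maven dependencies should be needed. What follows is only a rough sketch, not a tested configuration: the field type name "text_sort_ascii" is made up for illustration, and the LowerCaseFilterFactory is added on the assumption that you want the same case-insensitive behaviour, since ASCIIFoldingFilterFactory on its own does not lowercase.

```xml
<!-- Sketch only: "text_sort_ascii" is a hypothetical name, not from the original thread. -->
<fieldType name="text_sort_ascii" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Keep the whole field value as a single token, as in the ICU example. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- ASCII folding does not lowercase, so lowercase explicitly. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold accented Latin characters to ASCII equivalents where one exists. -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With no separate index and query analyzers declared, the same chain applies on both sides, so a query for Uber and an indexed Über should end up as the same token. ASCII folding only covers characters that have an ASCII equivalent, so it is narrower than the ICU filter.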

Cheers,
Chantal

On Fri, 2012-01-13 at 00:09 +0100, [EMAIL PROTECTED] wrote:

> Hello,
>
> I would like to know if Solr has functionality to automatically search for
> different punctuation of a word.
> For example, if a user searches for the word Uber and the stemmer is set to
> German, then Solr looks for both Uber and Über, as with synonyms.
>
> Is it possible to give Solr a file with a list of possible letter
> substitutions and have it search for all possible punctuations?
>
> Thanks.
> Alex.