Solr, mail # user - Ngram autocompleter and term frequency boosting


Re: Ngram autocompleter and term frequency boosting
Otis Gospodnetic 2012-01-20, 04:45
Cuong,

If you know at indexing time that "Java Developer" appears twice among your AC suggestions, why not give it an appropriate index-time boost?  Wouldn't that work for you?
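
A rough sketch of that idea, assuming duplicate titles are collapsed into one document per distinct suggestion before indexing and that the boost is simply the occurrence count (both are assumptions, not something stated in the thread). It uses the optional boost attribute of the Solr XML update format. The boost is stored in the field norm, so the field must keep omitNorms="false", and because norms are stored with low precision and are combined with length normalization, close boost values may not produce distinct scores:

```xml
<!-- Illustrative update message: one document per distinct suggestion.
     Boost values are assumed to be the raw occurrence counts;
     add whatever unique key field your schema requires. -->
<add>
  <doc boost="2.0">
    <field name="title">Java Developer</field>
  </doc>
  <doc boost="1.0">
    <field name="title">Java Programmer</field>
  </doc>
</add>
```
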
Otis

----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

----- Original Message -----
> From: Cuong Hoang <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Thursday, January 19, 2012 12:01 AM
> Subject: Ngram autocompleter and term frequency boosting
>
> Hi guys,
>
> I'm trying to build an Ngram-based autocompleter that takes term frequency
> into account.
>
> Let's say I have the following documents:
>
> D1: title => "Java Developer"
> D2: title => "Java Programmer"
> D3: title => "Java Developer"
>
> When the user types in "Java", I want to display:
>
> 1. "Java Developer"
> 2. "Java Programmer"
>
> Basically, "Java Developer" should rank first because it appears twice in the
> index, while "Java Programmer" appears only once. Is this possible?
>
> I'm using the following config for "title" field:
>
>     <fieldType name="text_pre" class="solr.TextField" omitNorms="false">
>       <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Thanks
>