Solr, mail # user - Skip first word


Re: Skip first word
Finotti Simone 2012-07-27, 09:46
Brilliant!
Thank you very much :)

________________________________________
From: Chantal Ackermann [[EMAIL PROTECTED]]
Sent: Friday, 27 July 2012 11:20
To: [EMAIL PROTECTED]
Subject: Re: Skip first word

Hi Simone,

No, I meant that you populate both fields with the same input, which is best done via a copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of size 3 and longer (you might want to set a decent maxsize there).
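To make the two sets of ngrams concrete, here is a minimal Python sketch that imitates the two analysis chains outside of Solr. It is only an approximation under stated assumptions: the function names are hypothetical, `str.split()` stands in for the whitespace tokenizer, and the gram sizes (1 to 2 and 3 to 10) are taken from the field types in this message.

```python
def edge_ngrams(token, min_size, max_size):
    """Front edge n-grams of a single token (prefixes of length min_size..max_size).

    Approximates an edge n-gram filter with side="front": tokens shorter
    than min_size produce no grams at all.
    """
    upper = min(max_size, len(token))
    return [token[:n] for n in range(min_size, upper + 1)]


def short_field_grams(text, min_size=1, max_size=2):
    """First field: the whole input is one token (keyword tokenizer)."""
    return edge_ngrams(text, min_size, max_size)


def long_field_grams(text, min_size=3, max_size=10):
    """Second field: split on whitespace, then take edge n-grams per token."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, min_size, max_size))
    return grams


if __name__ == "__main__":
    print(short_field_grams("Dolce & Gabbana"))
    print(long_field_grams("Dolce & Gabbana"))
```

Note that the "&" token is shorter than three characters, so in this sketch it contributes nothing to the second field.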

The query for the autocomplete list uses the first field when the input typed by the user is one or two characters long. Your examples were "D" or "G", and then "Do" or "Ga". Such a query searches only the single-token field, which for the input "Dolce & Gabbana" contains only the ngrams "D" and "Do". So only the input "D" or "Do" would result in a hit on "Dolce & Gabbana".
Once the user has typed the third letter ("Dol" or "Gab"), you query the second, more tokenized field, which for "Dolce & Gabbana" would contain the ngrams "Dol", "Dolc", "Dolce", "Gab", "Gabb", "Gabba", etc.
Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".
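The field selection described above can be sketched on the client side roughly as follows. This is a simulation in plain Python, not Solr code: the helper names and the sample brand list are made up for illustration, and it assumes the user's input is matched as-is (case-sensitively) against the indexed grams, i.e. the ngram filter is applied only at index time. The field names `short_prefix` and `long_prefix` are the ones used in the declarations in this message.

```python
def edge_ngrams(token, min_size, max_size):
    """Prefixes of token with length min_size..max_size (none if token is too short)."""
    upper = min(max_size, len(token))
    return [token[:n] for n in range(min_size, upper + 1)]


def index_document(text):
    """Simulate what each field would hold for one brand name."""
    return {
        # Whole input as a single token, grams of size 1 and 2.
        "short_prefix": set(edge_ngrams(text, 1, 2)),
        # Whitespace-separated tokens, grams of size 3 to 10.
        "long_prefix": {
            gram for token in text.split() for gram in edge_ngrams(token, 3, 10)
        },
    }


def choose_field(user_input):
    """One or two characters go to the short field, anything longer to the long field."""
    return "short_prefix" if len(user_input) < 3 else "long_prefix"


def suggest(user_input, brands):
    """Return the brands whose chosen field contains the input as an indexed gram."""
    field = choose_field(user_input)
    return [b for b in brands if user_input in index_document(b)[field]]


if __name__ == "__main__":
    # Hypothetical sample data, only for illustration.
    brands = ["Dolce & Gabbana", "Gucci", "Diesel"]
    for typed in ["D", "G", "Ga", "Gab", "Dol"]:
        print(typed, "->", suggest(typed, brands))
```

In a real setup the only client-side logic needed is the equivalent of `choose_field`; the matching itself is done by Solr against whichever field the query names.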

1. First field type:

<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>

2. Second field type:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- maybe add WordDelimiter etc. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>

3. Field declarations:

<field name="short_prefix" type="short_ngram" … />
<field name="long_prefix" type="long_ngram" … />

<copyField source="short_prefix" dest="long_prefix" />
Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different fields according to their length. Since I'm not aware of any logical condition you can apply to the copyField directive, it means that this logic has to be implemented by the process that populates the Solr core. Is this assumption correct?
>
> That's kind of bad, because I'd like to keep this kind of "rule" in the Solr configuration. Of course, if that's the only way... :)
>
> Thank you
>
> ________________________________________
> From: Chantal Ackermann [[EMAIL PROTECTED]]
> Sent: Thursday, 26 July 2012 18:32
> To: [EMAIL PROTECTED]
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all longer inputs
>
>
> Cheers,
> Chantal
>
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term and with minGramSize=3 on the latter terms.
>>
>> We are developing a search suggestion mechanism. The idea is that if the user types "D", the engine should suggest "Dolce & Gabbana", but if they type "G", it should suggest other brands. Only if the user types "Gab" should it suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>> ________________________________________
>> From: Ahmet Arslan [[EMAIL PROTECTED]]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: [EMAIL PROTECTED]
>> Subject: Re: Skip first word
>>
>>> Is there a tokenizer and/or a combination of filters to
>>> remove the first term from a field?
>>>
>>> For example:
>>> The quick brown fox
>>>
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>>
>> There is no such filter that I know of. However, you can implement one by modifying the source code of LengthFilterFactory or StopFilterFactory; they both remove tokens. Out of curiosity, what is the use case for this?