Solr, mail # user - Highlighter - multiple instances of term being combined


Re: Highlighter - multiple instances of term being combined
Lance Norskog 2010-11-10, 03:11
Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
off the main Solr admin page. It shows how text is broken up during
both indexing and querying, so you can see how these words are split
into tokens and which positions they are assigned. Trying different
analyzers and options there might get you to the output you want.
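
To make the idea of token positions concrete, here is a toy sketch in Python. It does NOT reproduce StandardTokenizer or any of the filters in your field type; the regex split and the token_positions helper are stand-ins I made up for illustration. The Analysis page is the authority on what your real chain does.

```python
import re

def token_positions(text):
    """Return (position, token) pairs from a naive word regex, lowercased.

    Assumption: this is only a stand-in for a real analysis chain; it
    ignores synonyms, stop words and stemming entirely.
    """
    return [(i, m.group().lower())
            for i, m in enumerate(re.finditer(r"\w+", text))]

# Sample text taken from the quoted message.
text = 'What does "low-residue" mean? Like low-residue diet?'
tokens = token_positions(text)

# Positions at which the query term occurs under this naive split.
hits = [pos for pos, tok in tokens if tok == "residue"]

if __name__ == "__main__":
    print(tokens)
    print(hits)
```

With this naive split the two hits are separated by several other tokens. If the Analysis page shows something different for your field, for example overlapping or adjacent positions, that would be a place to start looking.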

But to be frank, highlighting is a tough problem and has always had a
lot of edge cases.
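
If you end up trying many option combinations, a small script can save some retyping. This is only a rough sketch: the base URL and parameter names are copied verbatim from your request below, and build_highlight_url is a hypothetical helper name, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Copied from the quoted request; adjust host, port and path to your setup.
BASE_URL = "http://localhost:8983/solr/select/"

# Parameter names and values exactly as they appear in the quoted request.
BASE_PARAMS = {
    "hl": "true",
    "hl.snippets": "1",
    "q": "residue",
    "hl.fragsize": "0",
    "mergeContiguous": "false",
    "indent": "on",
    "hl.usePhraseHighlighter": "false",
    "debugQuery": "on",
    "hl.fragmenter": "gap",
    "hl.highlightMultiTerm": "false",
}

def build_highlight_url(overrides=None):
    """Return the request URL with any overrides applied to the base params."""
    params = dict(BASE_PARAMS)
    params.update(overrides or {})
    return BASE_URL + "?" + urlencode(params)

if __name__ == "__main__":
    # Print one URL per setting so the responses can be compared side by side.
    for flag in ("true", "false"):
        print(build_highlight_url({"hl.usePhraseHighlighter": flag}))
```

One thing worth checking while you are at it: as far as I know the documented name of the merge option is hl.mergeContiguous, with the hl. prefix like the other highlighting parameters, so the unprefixed form in the request may simply be ignored.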

On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri <[EMAIL PROTECTED]> wrote:
> I'm finding that if a keyword appears in a field multiple times very close
> together, it will get highlighted as a phrase even though there are other
> terms between the two instances. So this search:
>
> http://localhost:8983/solr/select/?
>
> hl=true&
> hl.snippets=1&
> q=residue&
> hl.fragsize=0&
> mergeContiguous=false&
> indent=on&
> hl.usePhraseHighlighter=false&
> debugQuery=on&
> hl.fragmenter=gap&
> hl.highlightMultiTerm=false
>
> Highlights as:
> What does "low-residue" mean? Like low-residue diet?
>
> Trying to get it to highlight as:
> What does "low-residue" mean? Like low-residue diet?
>
> I've tried playing with various combinations of mergeContiguous,
> highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
> output.
>
> For reference, the field type uses StandardTokenizerFactory with
> SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
> SnowballFilterFactory. I've confirmed that the intermediate words don't
> appear in either the synonym or the stop words list. I can post the full
> definition if helpful.
>
> Any pointers as to how to debug this would be greatly appreciated!
> sasank
>

--
Lance Norskog
[EMAIL PROTECTED]