-
HTML tags and Lucene highlighting
okayndc 2012-04-05, 17:34
Hello,
I currently use Lucene version 3.0 and probably need to upgrade to a more current version soon. The problem is that when I test a search for an HTML tag (e.g. <strong>), Lucene returns the highlighted HTML tag, which is what I DO NOT want. Is there a way to "filter" HTML tags? I have read up on HTMLStripCharFilter (packaged with Solr) and wondered if this is the way to go?
Any help will be greatly appreciated. Thanks
-
RE: HTML tags and Lucene highlighting
Steven A Rowe 2012-04-05, 19:24
Hi okayndc,
What *do* you want?
Steve
-
Re: HTML tags and Lucene highlighting
okayndc 2012-04-05, 19:36
Hello,
I want to ignore HTML tags within a search. I should not be able to search for an HTML tag (e.g. <strong>) and get back the highlighted HTML tag (e.g. <span class="highlighted"><strong></span>) in a result set.
Thanks
-
RE: HTML tags and Lucene highlighting
Steven A Rowe 2012-04-05, 19:44
okayndc,
A field configured to use HTMLStripCharFilter as part of its index-time analyzer will strip out HTML tags before index terms are created by the tokenizer, so HTML tags will not be put into the index. As a result, queries for HTML tags cannot match the original documents' HTML tags (in the field configured to use HTMLStripCharFilter, anyway).
So HTMLStripCharFilter should do what you want.
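The effect described above can be illustrated with a minimal plain-Java sketch. It is not the Lucene or Solr API: the class `StripBeforeTokenize`, its methods, and the crude tag regex are hypothetical stand-ins for a char filter followed by a tokenizer, used only to show that once tags are stripped before tokenization, no term such as "strong" reaches the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT Lucene code: mimics "char filter, then tokenizer".
class StripBeforeTokenize {
    // Crude assumption: anything between '<' and '>' is a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    /** Replaces each tag with a space so the words on either side stay separate. */
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    /** Returns lower-cased word terms, roughly what a simple tokenizer would emit. */
    public static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "<p>Lucene is <strong>fast</strong></p>";

        // Without stripping, tag names become index terms.
        System.out.println(terms(original));            // [p, lucene, is, strong, fast, strong, p]

        // With stripping first, only the visible text is indexed.
        System.out.println(terms(stripTags(original))); // [lucene, is, fast]
    }
}
```

Because "strong" never becomes a term in the second case, a query for it has nothing to match in that field, so there is nothing for the highlighter to mark up.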
Steve
-
Re: HTML tags and Lucene highlighting
okayndc 2012-04-05, 20:34
I want to retain the formatted HTML in a result, but ignore (or filter out) HTML tags in a search, if that makes sense.
-
Re: HTML tags and Lucene highlighting
Koji Sekiguchi 2012-04-05, 21:52
There is a way to encode HTML tags:
https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-highlighter/org/apache/lucene/search/highlight/SimpleHTMLEncoder.html
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
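To show what encoding changes in the highlighted output, here is a minimal plain-Java sketch. It does not call the Lucene highlighter: the class `EncodeFragment` and its `htmlEncode` and `highlight` methods are hypothetical stand-ins, and the `<span class="highlighted">` wrapper is taken from the example earlier in the thread. The assumption is that an encoder escapes markup characters in the fragment text, so a matched tag is displayed as literal text instead of being emitted as live markup.

```java
// Hypothetical stand-in, NOT the Lucene highlighter API.
class EncodeFragment {
    /** Escapes the characters that HTML treats as markup. */
    public static String htmlEncode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps the first occurrence of term in a highlight span and encodes all fragment text. */
    public static String highlight(String text, String term) {
        int start = text.indexOf(term);
        if (start < 0) {
            return htmlEncode(text); // no match: just encode the fragment
        }
        int end = start + term.length();
        return htmlEncode(text.substring(0, start))
                + "<span class=\"highlighted\">"
                + htmlEncode(term)
                + "</span>"
                + htmlEncode(text.substring(end));
    }

    public static void main(String[] args) {
        // Prints: use <span class="highlighted">&lt;strong&gt;</span> for emphasis
        System.out.println(highlight("use <strong> for emphasis", "<strong>"));
    }
}
```

Note that encoding only affects how a fragment is rendered. It does not stop a query from matching tags that were indexed, so it complements rather than replaces stripping tags at index time.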