-
org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-19, 08:17
Hey guys,

I found a Highlighter package in the CVS directory and, while investigating it, hit a compile-time error. Could somebody please tell me what is wrong?

The code:

    private IndexReader reader = null;
    private Highlighter highlighter = null;

    public SearchFiles() { }

    public void searchIndex0(String srchkey, String pathfile) throws Exception {
        IndexSearcher searcher = new IndexSearcher(pathfile);
        Query query = QueryParser.parse(srchkey, "bookid", analyzer);
        query = query.rewrite(reader); // required to expand search terms
        Hits hits = searcher.search(query);
        highlighter = new Highlighter(this, new QueryScorer(query));
        for (int i = 0; i < hits.length(); i++) {
            String text = hits.doc(i).get(bookid);
            TokenStream tokenStream = analyzer.tokenStream(bookid, new StringReader(text));
            // Get 3 best fragments and separate with a "..."
            String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
            System.out.println(result);
        }
    }

The error:

    src\org\apache\lucene\search\higlight\SearchFiles.java:46: cannot resolve symbol
    symbol  : constructor Highlighter (com.controlnet.higlight.SearchFiles,com.controlnet.higlight.QueryScorer)
    location: class org.apache.lucene.search.highlight.Highlighter
    highlighter =new Highlighter(this,new QueryScorer(query));

Also, the URL referred to in the lucene-dev archives for proper documentation is not available:
http://home.clara.net/markharwood/lucene/highlight.htm

WITH WARM REGARDS HAVE A NICE DAY [ N.S.KARTHIK]
-
Re: org.apache.lucene.search.highlight.Highlighter
markharw00d@... 2004-05-19, 20:37
>> Was Investigating,found some Compile time error..

I see the code you have is taken from the example in the javadocs. Unfortunately that example wasn't complete because the class didn't include the method defined in the Formatter interface. I have updated the Javadocs to correct this oversight.
To correct your problem either make your class implement the Formatter interface to perform your choice of custom formatting or remove the "this" parameter from your call to create a new Highlighter with the default Formatter implementation.
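The reason for the compile error can be illustrated without Lucene on the classpath. The sketch below is only a model: TermFormatter and MiniHighlighter are hypothetical stand-ins for the sandbox Formatter interface and Highlighter class (the real method signatures are not shown in this thread), and the bold tags used as the default are an assumption. It shows both options: passing "this" works only when the calling class implements the interface, and leaving the argument out falls back to a default formatter.

```java
// Hypothetical stand-in for the sandbox Formatter interface; the real
// method signature is not shown in this thread.
interface TermFormatter {
    String highlightTerm(String term);
}

// Hypothetical stand-in for Highlighter with two constructor shapes:
// one using a default formatter, one taking a caller-supplied formatter.
class MiniHighlighter {
    private final TermFormatter formatter;

    MiniHighlighter() {
        this(term -> "<B>" + term + "</B>"); // assumed default: bold tags
    }

    MiniHighlighter(TermFormatter formatter) {
        this.formatter = formatter;
    }

    // Replaces every literal occurrence of term with its formatted version.
    String highlight(String text, String term) {
        return text.replace(term, formatter.highlightTerm(term));
    }
}

// Passing "this" compiles only because this class implements TermFormatter.
class SearchFilesSketch implements TermFormatter {
    private final MiniHighlighter custom = new MiniHighlighter(this);
    private final MiniHighlighter standard = new MiniHighlighter();

    @Override
    public String highlightTerm(String term) {
        return "[" + term + "]"; // custom formatting chosen by this class
    }

    String withCustomFormatter(String text, String term) {
        return custom.highlight(text, term);
    }

    String withDefaultFormatter(String text, String term) {
        return standard.highlight(text, term);
    }

    public static void main(String[] args) {
        SearchFilesSketch s = new SearchFilesSketch();
        System.out.println(s.withCustomFormatter("lucene in action", "lucene"));
        System.out.println(s.withDefaultFormatter("lucene in action", "lucene"));
    }
}
```

If "implements TermFormatter" is removed from SearchFilesSketch, the call new MiniHighlighter(this) no longer matches any constructor, which is the same kind of failure as the "cannot resolve symbol" error reported above.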
Thanks for "highlighting" the problem with the Javadocs...
Cheers
Mark
-
RE: org.apache.lucene.search.highlight.Highlighter
Bruce Ritchie 2004-05-19, 21:18
> Thanks for "highlighting" the problem with the Javadocs...
Groan. :) Regards,
Bruce Ritchie
-
RE: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-21, 09:10
Hi
Please can somebody give me a simple example of org.apache.lucene.search.highlight.Highlighter?
I am trying to use it but have been unsuccessful.
Karthik
-
org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-21, 11:29
Hi
Please can somebody give me a simple example of
org.apache.lucene.search.highlight.Highlighter?
I am trying to use it but have been unsuccessful.
Karthik
WITH WARM REGARDS HAVE A NICE DAY [ N.S.KARTHIK]
-
Re: org.apache.lucene.search.highlight.Highlighter
Claude Devarenne 2004-05-21, 16:22
Hi,
Here is the documentation Mark Harwood included in the original package. I followed his directions and it worked for me. Let me know if this doesn't do it for you.
Claude
-
Re: org.apache.lucene.search.highlight.Highlighter
Claude Devarenne 2004-05-21, 16:26
Arrgh, the attachment didn't make it. Here it goes, sorry:

    // perform a standard lucene query
    searcher = new IndexSearcher(ramDir);
    Analyzer analyzer = new StandardAnalyzer();
    Query query = QueryParser.parse("Kenne*", FIELD_NAME, analyzer);
    query = query.rewrite(reader); // necessary to expand search terms
    Hits hits = searcher.search(query);

    // create an instance of the highlighter with the tags used to surround highlighted text
    QueryHighlightExtractor highlighter =
        new QueryHighlightExtractor(query, new StandardAnalyzer(), "<b>", "</b>");
    for (int i = 0; i < hits.length(); i++) {
        String text = hits.doc(i).get(FIELD_NAME);
        // call to highlight text with chosen tags
        String highlightedText = highlighter.highlightText(text);
        System.out.println(highlightedText);
    }

If your documents are large you can select only the best fragments from each document like this:

    // ...as above example
    int highlightFragmentSizeInBytes = 80;
    int maxNumFragmentsRequired = 4;
    String fragmentSeparator = "...";
    for (int i = 0; i < hits.length(); i++) {
        String text = hits.doc(i).get(FIELD_NAME);
        String highlightedText = highlighter.getBestFragments(text,
            highlightFragmentSizeInBytes, maxNumFragmentsRequired, fragmentSeparator);
        System.out.println(highlightedText);
    }
-
Re: org.apache.lucene.search.highlight.Highlighter
markharw00d@... 2004-05-21, 19:03
Hi Claude, that example code you provided is out of date.

For all concerned - the highlighter code was refactored about a month ago and then moved into the Sandbox.

Want the latest version? - get the latest code from the sandbox CVS.
Want the latest docs? - Run javadoc on the above.

There is a basic example of highlighter use in the package-level javadocs and more extensive examples in the JUnit test that accompanies the source code. Hope this helps clarify things.

Mark

ps Bruce, I know you were interested in providing an alternative Fragmenter implementation for the highlighter that detects sentence boundaries. You may want to look at LingPipe which has "a heuristic sentence boundary detector". ( http://threattracker.com:8080/lingpipe-demo/demo.html ) I took a quick look at it but it has its own tokenizer that would be difficult to make work with the tokenstream used to identify query terms. At least the code gives some examples of the heuristics involved in detecting sentence boundaries. For my own apps I find the standard Fragmenter implementation suffices.
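For readers who want a feel for what sentence-boundary detection involves, here is a minimal sketch using the JDK's own java.text.BreakIterator. This is a swapped-in technique for illustration only: it is not LingPipe's detector and it is not wired into the highlighter's Fragmenter interface. The class name SentenceSplitSketch and the sample text are made up for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitSketch {
    // Splits text into sentences using the JDK's rule-based sentence iterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. The highlighter marks query terms.";
        for (String sentence : sentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

A fragmenter built on this idea would still have to line its boundaries up with the token offsets produced by the analyzer, which is the difficulty Mark describes above.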
-
org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-24, 05:24
Hi Lucene Developers,

I am using the org.apache.lucene.search.highlight.Highlighter source for search. The package.html shows something like this:

    String text = hits.doc(i).get(FIELD_NAME);
    TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));

When I use this source, my code raises a NullPointerException [the text from hits.doc(i) is what triggers the exception].

I have a piece of code, a CustomAnalyzer (reference from oreilly.com), which I have been using instead of org.apache.lucene.analysis.standard.StandardAnalyzer().

1) In the first case [CustomAnalyzer()], the text comes back as null and Hits returns 707.
2) In the second case [StandardAnalyzer()], no hits are found; Hits returns 0.
3) But the normal SearchFiles from the demo (org.apache.lucene.demo) reveals all the correct 707 hits.

Please, somebody look into this...

Karthik
-
RE: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-24, 09:11
Hey Lucene-Developers,
I was browsing through CVS and found the source for IndexWriter2.java, written by Ivaylo Zlatev in Feb 2002.
My concern is: does this piece of code really work?
If so, please give an example [for the present Lucene-final 1.3 version]; or has it been discarded from the [present Lucene-final 1.3 version]?
Using the RAMDirectory technique, my queries have become really fast, so I plan to use it during the indexing process as well.
karthik
-
RE: org.apache.lucene.search.highlight.Highlighter
Otis Gospodnetic 2004-05-24, 16:33
That version of IndexWriter was never included in Lucene. Use various IndexWriter parameters (instance variables) to tune indexing. One of my articles describes how to use them, if Javadocs are too terse.
Otis
-
Re: org.apache.lucene.search.highlight.Highlighter
Erik Hatcher 2004-05-24, 17:10
On May 24, 2004, at 5:11 AM, Karthik N S wrote:
> I was broswing thru CVS and found the SRC for "IndexWriter2.java written
> by Ivaylo Zlatev on feb 2002,

Where do you see this? It is not in the current CVS that I can tell.

> The Tecnique of using RAMDirectory, my Query has really become faster access,
> So hence plan to use it during Indexing process also.

I'm confused by what you're after. With the current codebase you can index into a RAMDirectory, no problem, and then persist it to a FSDirectory when you are done.

Erik
-
RE: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-25, 03:52
Hi,

Sorry, my apologies. The source [IndexWriter2.java] I mentioned is in the mail archive and not in CVS:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00735.html

With regards,
Karthik
-
FW: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-25, 03:58
Hi Lucene Developers,

I am using the org.apache.lucene.search.highlight.Highlighter source for search. The package.html shows something like this:

    String text = hits.doc(i).get(FIELD_NAME);
    TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));

When I use this source, my code raises a NullPointerException [the text from hits.doc(i) is what triggers the exception].

Why am I getting null for the text? Is it linked to the type of field used during the indexing process, or is it due to a piece of code, a CustomAnalyzer (reference from oreilly.com), which I have been using instead of org.apache.lucene.analysis.standard.StandardAnalyzer()?

1) In the first case [using CustomAnalyzer()], the text comes back as null and Hits returns 707.
2) In the second case [using StandardAnalyzer()], no hits are found; Hits returns 0.
3) But a normal SearchFiles with StandardAnalyzer() from the demo (org.apache.lucene.demo) reveals all the correct 707 hits.

Please, somebody look into this...

Karthik
-
RE: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-05-25, 08:52
Hey Lucene-Developers,

I finally found the problem with the Highlighter source.

Search code using search.highlight.Highlighter depends on how the HTML content (FIELD_NAME) was stored while indexing.

If the content is stored as

    FileInputStream is = new FileInputStream(File);
    reader = new BufferedReader(new InputStreamReader(is));
    doc.add(Field.Text("contents", reader));

then search.highlight.Highlighter raises a NullPointerException on the FIELD_NAME "Content":

    java.lang.NullPointerException
        at search.highlight.Highlighter.getBestDocFragments(Highlighter.java:141)
        at search.highlight.Highlighter.getBestFragments(Highlighter.java:80)
        at search.highlight.Highlighter.getBestFragments(Highlighter.java:328)
        at org.apache.lucene.demo.Search.searchIndex1(Search.java:84)
        at org.apache.lucene.demo.Search.main(Search.java:107)

But if you use

    Field ff = new Field("contents", proceStr, true, true, true);

(where proceStr = contents of the HTML)

then search.highlight.Highlighter returns a correct search + highlighter (bold) rendering of the indexed segment.

Now please, somebody who is mature more enough to improve this code, please do.

Peace at last ............. :)
Karthik
-
RE: org.apache.lucene.search.highlight.Highlighter
markharw00d@... 2004-05-25, 18:39
>> If the Content is Stored as...
>> doc.add(Field.Text("contents", reader));

That's just it. It's not stored: see the javadocs for Field.Text(String, Reader): "Constructs a Reader-valued Field that is tokenized and indexed, but is not stored in the index"
As opposed to : Field.Text(String name, String value) which says: "Constructs a String-valued Field that is tokenized and indexed, and is stored in the index, for return with hits."
So, you're getting nulls because you're not storing the field for subsequent retrieval.
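One way to act on this, sketched below under some assumptions, is to read the file contents into a String before indexing so that the String-valued Field.Text overload quoted above can be used. The helper is plain JDK code; the class name StoredFieldSketch and the method names readAll and orEmpty are made up for illustration, and the Lucene calls appear only in comments because Lucene is not assumed to be on the classpath here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class StoredFieldSketch {
    // Drains a Reader into a String so the value can be stored with the document.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Guards against a field that was indexed but not stored (its value comes back as null).
    static String orEmpty(String storedValue) {
        return storedValue == null ? "" : storedValue;
    }

    public static void main(String[] args) {
        String contents = readAll(new StringReader("<html>Digital Cameras</html>"));
        // At index time, with Lucene available, the String overload stores the value:
        //   doc.add(Field.Text("contents", contents));
        // At search time, a null check avoids the NullPointerException:
        //   String text = orEmpty(hits.doc(i).get("contents"));
        System.out.println(contents.length());
        System.out.println(orEmpty(null).isEmpty());
    }
}
```

Storing the full text makes the index larger, so for big documents it may be preferable to keep the original text elsewhere and fetch it by an id stored in the index.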
>> Now Please some body who is
>> mature more enough to improve this code please do.

Are you deliberately trying to be obnoxious or is it just a natural gift? You'll find people here more helpful if you don't resort to insulting them. :-)
-
RE: org.apache.lucene.search.highlight.Highlighter
Karthik N S 2004-06-15, 04:48
Hey guys,

Somebody please tell me why "[pad]" is being displayed between words when searching the indexed file.

Content : Digital Cameras[pad][Digital Cameras][pad][EZ Dual Cam USB Digital Video/Still Camera][pad... Cam USB Digital

How do I correct this problem?

[Note: I am using the Highlighter package from the Sandbox to highlight the required word in an HTML document.]

With regards,
Karthik
-
Re: org.apache.lucene.search.highlight.Highlighter
Erik Hatcher 2004-06-15, 08:38
On Jun 15, 2004, at 12:48 AM, Karthik N S wrote:
> Sombody please tell me why the "[pad]" is been displayed between words on
> searching the indexed file
>
> Content : Digital Cameras[pad][Digital Cameras][pad][EZ Dual Cam USB
> Digital Video/Still Camera][pad... Cam USB Digital
>
> How do I correct this problem
>
> [Note: - I am using the Highlighter package from the Sandbox for
> highlighting the required word from an html ]

Please show us the relevant (succinct) piece of code that created the text output above.

Erik