Possible Unhandled Exception in org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser
jake dsouza 2012-04-16, 17:00
Hi all,

I am trying to index the TREC GOV2 data set and I am getting a few exceptions from this class. Please see the stack trace below:

java.lang.NullPointerException
Apr 16, 2012 5:32:55 AM at org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser.parse(DemoHTMLParser.java:55)
Apr 16, 2012 5:32:55 AM at org.apache.lucene.benchmark.byTask.feeds.TrecGov2Parser.parse(TrecGov2Parser.java:56)
Apr 16, 2012 5:32:55 AM at org.apache.lucene.benchmark.byTask.feeds.TrecParserByPath.parse(TrecParserByPath.java:30)
Apr 16, 2012 5:32:55 AM at org.apache.lucene.benchmark.byTask.feeds.TrecContentSource.getNextDocData(TrecContentSource.java:292)
Apr 16, 2012 5:32:55 AM at com.Gov2Reader.indexDocs(Gov2Reader.java:117)

From what I noticed, line 56 of DemoHTMLParser has

    date = dateFormat.parse(props.getProperty("date").trim());

but in this case dateFormat is null, which is why the exception is thrown. The parse method in TrecGov2Parser passes null to the DemoHTMLParser.parse method. Because of this exception, some documents are skipped and never indexed.

Regards,
Jake
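
For illustration, here is a minimal sketch of a null-safe version of the date lookup described above. The class SafeDateParse and the method parseDateOrNull are hypothetical names made up for this example and are not part of Lucene. The sketch assumes that a missing DateFormat should simply leave the document without a date rather than abort parsing; the actual fix in Lucene may well look different.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, NOT part of Lucene: shows one way to guard the
// "date" meta-tag lookup against a null DateFormat.
public class SafeDateParse {

    /** Returns the parsed "date" property, or null if it cannot be parsed. */
    public static Date parseDateOrNull(Properties props, DateFormat dateFormat) {
        if (props == null || dateFormat == null) {
            // The situation reported above: the caller passed no DateFormat.
            return null;
        }
        String raw = props.getProperty("date");
        if (raw == null) {
            return null; // the document has no "date" meta tag
        }
        try {
            return dateFormat.parse(raw.trim());
        } catch (ParseException e) {
            return null; // unparseable date: treat the document as undated
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("date", " 2012-04-16 ");
        DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);

        // A null format no longer throws; it just yields no date.
        System.out.println(parseDateOrNull(props, null) == null); // true
        // With a format supplied, the date is parsed as before.
        System.out.println(parseDateOrNull(props, fmt) != null);  // true
    }
}
```

With a guard like this the document would still be indexed, only without a date, instead of being dropped when the NullPointerException propagates.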