Lucene, mail # user - Possible Unhandled Exception in org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser


jake dsouza 2012-04-16, 17:00
Hi All,

I am trying to index the TREC GOV2 data set and I am getting a few
exceptions from this class. Please see the stack trace below.

Apr 16, 2012 5:32:55 AM
java.lang.NullPointerException
    at org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser.parse(DemoHTMLParser.java:55)
    at org.apache.lucene.benchmark.byTask.feeds.TrecGov2Parser.parse(TrecGov2Parser.java:56)
    at org.apache.lucene.benchmark.byTask.feeds.TrecParserByPath.parse(TrecParserByPath.java:30)
    at org.apache.lucene.benchmark.byTask.feeds.TrecContentSource.getNextDocData(TrecContentSource.java:292)
    at com.Gov2Reader.indexDocs(Gov2Reader.java:117)

From what I noticed, line 56 of DemoHTMLParser has
date = dateFormat.parse(props.getProperty("date").trim());
but in this case dateFormat is null, which is why the exception is thrown. The parse
method in TrecGov2Parser passes null as the dateFormat argument to the DemoHTMLParser.parse method.
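
To illustrate the kind of guard I have in mind, here is a minimal sketch. It is not the actual Lucene code: the class NullSafeDateParse and the helper parseDateOrNull are hypothetical names I made up, and the "yyyy-MM-dd" pattern in main is only an assumption for the demo. The idea is simply to skip date parsing when either the properties or the DateFormat is missing, rather than dereferencing a null.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;

public class NullSafeDateParse {

    // Hypothetical helper (not part of Lucene): returns the parsed "date"
    // property, or null when it cannot be parsed for any reason.
    public static Date parseDateOrNull(Properties props, DateFormat dateFormat) {
        // Guard against the case described above: a null DateFormat
        // (or null properties) must not cause a NullPointerException.
        if (props == null || dateFormat == null) {
            return null;
        }
        String raw = props.getProperty("date");
        if (raw == null) {
            return null;
        }
        try {
            return dateFormat.parse(raw.trim());
        } catch (ParseException e) {
            // An unparseable date is treated the same as a missing date.
            return null;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("date", " 2012-04-16 ");

        // With a null DateFormat the helper returns null instead of throwing.
        System.out.println(parseDateOrNull(props, null));

        // With a real DateFormat (pattern assumed for this demo) the date is parsed.
        DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        System.out.println(parseDateOrNull(props, fmt) != null);
    }
}
```

With a check like this the document would still be returned without a date, so it would not be dropped from the index.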

Because of this exception, some documents are skipped and never get indexed.

Regards
Jake