Re: parse.ParserFactory
Tolga 2012-05-29, 06:31
Hi,
I know this issue should have been closed, but I thought I'd continue here rather than start a new thread. Anyway, I'm getting this:

parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to contentType application/pdf via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml

and I have tika in my nutch-default.xml:

<value>protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip|xml)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

What's the point of seeing this warning if I already have tika? IMHO it should be removed.

Regards,

On 5/23/12 12:27 AM, Lewis John Mcgibbney wrote:
> Unless you're using Nutch <= 1.2, you should not be using
> msexcel|mspowerpoint|msword|oo|pdf| within your plugin.includes. All
> of these document formats are (and have been for some time)
> implemented as Apache Tika parsers.
>
> hth
>
> On Tue, May 22, 2012 at 9:20 PM, Tolga <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I crawl / index PDF files just fine, but I get the following warning.
>>
>> parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to contentType
>> application/pdf via parse-plugins.xml, but not enabled via plugin.includes
>> in nutch-default.xml.
>>
>> I've got the value
>> protocol-http|urlfilter-regex|parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)
>> for plugin.includes property in nutch-default.xml. What am I missing?
>>
>> Regards,
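
For anyone reading this thread later: the warning names parse-pdf, not parse-tika, so one quick check is whether the plugin.includes value from the latest message matches each plugin id at all. Below is a minimal sketch in Python, assuming the value is treated as a regular expression that must match the whole plugin id. The helper name enabled_by_includes is made up for this illustration and is not part of Nutch.

```python
import re

# plugin.includes value quoted in the latest message of this thread.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip|xml)|"
    "index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def enabled_by_includes(plugin_id: str, plugin_includes: str) -> bool:
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: the value is a regex applied to the full plugin id.
    """
    return re.fullmatch(plugin_includes, plugin_id) is not None


for plugin_id in ("parse-tika", "parse-pdf"):
    print(plugin_id, enabled_by_includes(plugin_id, PLUGIN_INCLUDES))
```

Under that assumption, parse-tika matches and parse-pdf does not, which is consistent with the wording of the warning: parse-plugins.xml still maps application/pdf to parse-pdf, and that id is no longer covered by plugin.includes.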