Re: Problem with overriding built-in parser
Stephan Mühlstrasser 2012-02-16, 16:07
Thanks for your reply.
On 16.02.12 16:51, Nick Burch wrote:
> On Tue, 14 Feb 2012, Stephan Mühlstrasser wrote:
>> The problem is that using the proposed method does not work for me.
>> Any use of the configuration file apparently sends Tika into an
>> endless recursion, even without overriding a built-in parser in the
>> configuration file.
> Are you able to produce a unit test that shows the problem?
That's what I was trying to provide with the example in my previous message:
>> If I understand it correctly, the following configuration file should
>> have the same effect as the built-in configuration:
>>> $ cat tika-config.xml
>>> <parser class="org.apache.tika.parser.DefaultParser"/>
The error occurs whenever the Tika CLI application is started with this
configuration file. To reproduce it, just run "java
-Dtika.config=tika-config.xml -jar tika-app-1.0.jar --list-parsers".
> Ah, I'm not sure that's correct. I think you also need to give a
> mimetypes and a detector. Looking at lines 145 to 172 of TikaConfig, it
> seems that you either get the defaults with no config, or specify them
> all with your own config
OK, I now see in the source what you mean. In that case the example in
TIKA-527 is incomplete, because it specifies neither mimetypes nor a
detector.
Since yesterday I have got my override working by packaging a
META-INF/services/org.apache.tika.parser.Parser file into the JAR file
together with my parser, so I no longer need the configuration file
approach. Still, I think it could be considered a bug that an incorrect
or incomplete configuration file sends Tika into endless recursion
instead of producing a meaningful error message.
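For reference, the META-INF/services approach amounts to shipping a plain-text provider-configuration file whose name is the service interface (org.apache.tika.parser.Parser) and whose lines each name one implementation class, the format java.util.ServiceLoader expects. The following is only a minimal sketch, not the exact steps used above: the class name com.example.MyPdfParser, the build/ directory holding the compiled classes, and the JAR name my-parser.jar are hypothetical placeholders.

```shell
#!/bin/sh
set -e

# Hypothetical layout: compiled parser classes live under build/,
# e.g. build/com/example/MyPdfParser.class (placeholder name).
SERVICES_DIR=build/META-INF/services
mkdir -p "$SERVICES_DIR"

# File name = fully qualified name of the service interface.
# Each line = fully qualified name of one implementation class.
printf '%s\n' 'com.example.MyPdfParser' \
  > "$SERVICES_DIR/org.apache.tika.parser.Parser"

# Package the classes together with the services file into one JAR,
# but only if the JDK's jar tool is available on this machine.
if command -v jar >/dev/null 2>&1; then
  jar cf my-parser.jar -C build .
fi
```

The resulting JAR then needs to be on the classpath alongside Tika so that the service lookup can find the listed parser class.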
Stephan Mühlstrasser [EMAIL PROTECTED] www.pdflib.com
PDFlib GmbH, Franziska-Bilek-Weg 9, 80339 München, Germany
Court of registry/Amtsgericht München HRB 129497
Managing Directors/Geschäftsführer: Thomas Merz, Petra Porst
PDFlib: powerful toolkits for PDF developers since 1997
_______ See www.pdflib.com/products for product details________