Tika >> mail # user >> Problem with overriding built-in parser


Re: Problem with overriding built-in parser
Hi Nick,

thanks for your reply.

On 16.02.12 16:51, Nick Burch wrote:
> On Tue, 14 Feb 2012, Stephan Mühlstrasser wrote:
>> https://issues.apache.org/jira/browse/TIKA-527
>...
>
>> The problem is that using the proposed method does not work for me.
>> Any use of the configuration file apparently sends Tika into an
>> endless recursion, even without overriding a built-in parser in the
>> configuration file.
>
> Are you able to produce a unit test that shows the problem?

That's what I was trying to provide with the example in my previous message:

>
>> If I understand it correctly, the following configuration file should
>> have the same effect as the built-in configuration:
>>
>>> $ cat tika-config.xml
>>> <properties>
>>> <parsers>
>>> <parser class="org.apache.tika.parser.DefaultParser"/>
>>> </parsers>
>>> </properties>

If you invoke the Tika CLI application with this configuration file, the
error occurs. To reproduce it, start the application like this: "java
-Dtika.config=tika-config.xml -jar tika-app-1.0.jar --list-parsers".

> Ah, I'm not sure that's correct. I think you also need to give a
> mimetypes and a detector. Looking at lines 145 to 172 of TikaConfig, it
> seems that you either get the defaults with no config, or specify them
> all with your own config
>

Ok, I see now in the source what you mean. Then the example in TIKA-527
is not complete, as it does not have mimetypes and a detector.

Since yesterday I have got my override working by packaging a
META-INF/services/org.apache.tika.parser.Parser file into the JAR
together with my parser. So I no longer need the configuration file
approach. But I think it could still be considered a bug if an incorrect
or insufficient configuration file sends Tika into an endless recursion
instead of producing a meaningful error message.
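
For readers unfamiliar with the META-INF/services approach, here is a minimal sketch of the general Java service-provider convention it relies on. It does not use Tika at all: DemoParser and MyPdfParser are hypothetical stand-ins for org.apache.tika.parser.Parser and a custom parser, and a temporary directory plays the role of the JAR. How Tika itself orders or prioritises discovered parsers is not shown and is not assumed here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceOverrideSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    public interface DemoParser {
        String describe();
    }

    /** Hypothetical custom parser that would ship in the same JAR as the services file. */
    public static class MyPdfParser implements DemoParser {
        @Override
        public String describe() {
            return "custom PDF parser";
        }
    }

    /** Writes a services file into a temp directory and discovers the provider through it. */
    public static List<String> discover() throws IOException {
        // The temp directory stands in for the root of the JAR.
        Path root = Files.createTempDirectory("services-demo");
        Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));

        // File name = binary name of the service interface;
        // file content = binary name of the implementation, one per line.
        Files.write(services.resolve(DemoParser.class.getName()),
                MyPdfParser.class.getName().getBytes(StandardCharsets.UTF_8));

        URL[] classpath = { root.toUri().toURL() };
        try (URLClassLoader loader =
                new URLClassLoader(classpath, ServiceOverrideSketch.class.getClassLoader())) {
            List<String> found = new ArrayList<>();
            // ServiceLoader reads META-INF/services/<interface name> and instantiates each entry.
            for (DemoParser parser : ServiceLoader.load(DemoParser.class, loader)) {
                found.add(parser.describe());
            }
            return found;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(discover());
    }
}
```

In a real JAR the services file would be a static resource named after the actual interface (here, org.apache.tika.parser.Parser) and would contain the fully qualified class name of the custom parser.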

Thanks
Stephan

--
_______________________________________________________________
Stephan Mühlstrasser   [EMAIL PROTECTED]            www.pdflib.com
   PDFlib GmbH, Franziska-Bilek-Weg 9, 80339 München,  Germany
        Court of registry/Amtsgericht München HRB 129497
  Managing Directors/Geschäftsführer: Thomas Merz, Petra Porst
---------------------------------------------------------------
     PDFlib: powerful toolkits for PDF developers since 1997
_______ See www.pdflib.com/products for product details________