Tika, mail # dev - Convert file before Tika processes it?


Re: Convert file before Tika processes it?
Nick Burch 2012-06-21, 17:07
On Wed, 20 Jun 2012, 122jxgcn wrote:
> Hi, I'm currently working on getting Tika to properly process a custom
> file type (*.hwp files). I have a binary executable that converts an hwp
> file into an XML file. I'm not sure how I can include this binary so that
> when Tika encounters an hwp file, it automatically converts it to an XML
> file using the binary and passes the document to XMLParser. Any
> suggestions?

I'd suggest you write a custom parser for your file format, which first
calls out to your converter program, then feeds the resulting XML directly
to Tika's XMLParser.
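
As a rough illustration of the "call out to your custom program" step, here is a minimal sketch using only the JDK. It assumes the converter reads the document on stdin and writes XML to stdout, which may not match how your binary works. The class name ExternalXmlConverter and the method convertToXml are made up for this example and are not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper (not part of Tika): runs an external converter and
// returns the XML it produces.
final class ExternalXmlConverter {

    private ExternalXmlConverter() {
    }

    /**
     * Runs the given command with the document on stdin and returns whatever
     * the command wrote to stdout. Assumption: the converter works as a
     * stdin-to-stdout filter; adjust the command if yours takes file paths.
     */
    static byte[] convertToXml(List<String> command, InputStream document)
            throws IOException, InterruptedException {
        // Temp files avoid pipe-buffer deadlocks with large documents.
        Path input = Files.createTempFile("convert-in", ".bin");
        Path output = Files.createTempFile("convert-out", ".xml");
        try {
            Files.copy(document, input, StandardCopyOption.REPLACE_EXISTING);
            Process process = new ProcessBuilder(command)
                    .redirectInput(input.toFile())
                    .redirectOutput(output.toFile())
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Converter exited with code " + exitCode);
            }
            return Files.readAllBytes(output);
        } finally {
            // Always clean up, even if the converter failed.
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }
}
```

Inside your custom parser's parse(stream, handler, metadata, context) method, you would then wrap the returned bytes in a ByteArrayInputStream and pass that to XMLParser's own parse method, along with the same handler, metadata and context.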

The website has a good guide on writing your own custom parsers:
    http://tika.apache.org/1.1/parser_guide.html

Nick