Tika, mail # user - content detection problem using tika-app


Re: content detection problem using tika-app
Nick Burch 2011-11-20, 19:31
On Sun, 20 Nov 2011, John M wrote:
> I have a .ppt file that I've renamed to be a .doc file (by only changing
> its extension).  If I use the Tika GUI, or the command line, to extract
> the file metadata, then Tika correctly identifies the content type as a
> Powerpoint file.  However, if I use the command line -d option to detect
> its content type, the application returns "application/msword", which is
> of course only superficially correct.

Which version of Tika are you using? If it isn't 1.0, I'd suggest you
upgrade and re-test. (We fairly recently made detectors pluggable, like
parsers, which changed how the container-aware detectors are made
available and used.)
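
For illustration, here is a rough sketch of why detection that only looks at
the leading bytes and the file name can report "application/msword" for a
renamed .ppt. Legacy .doc, .ppt and .xls files are all OLE2 compound
documents and start with the same eight-byte signature, so without a
container-aware detector that opens the file and inspects its internal
streams, the extension ends up deciding the type. This is not Tika's actual
code: the class and method names below are hypothetical, and the generic
fallback type is only assumed to match the name Tika uses for an unspecified
OLE2 file.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch only; NOT Tika's implementation. All names are hypothetical.
class DetectionSketch {
    // OLE2 compound-document signature shared by legacy .doc, .ppt and .xls files.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the header starts with the OLE2 signature.
    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Name-based hint: the only thing that separates the three formats
    // when the container contents are not inspected.
    static String typeFromExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) return "application/msword";
        if (lower.endsWith(".ppt")) return "application/vnd.ms-powerpoint";
        if (lower.endsWith(".xls")) return "application/vnd.ms-excel";
        return null;
    }

    // Magic bytes narrow the file to "some OLE2 document"; the extension
    // then picks the specific type. Nothing here reads the container itself.
    static String detect(byte[] header, String fileName) {
        if (!isOle2(header)) return "application/octet-stream";
        String hint = typeFromExtension(fileName);
        // Assumed generic type for an OLE2 file of unknown kind.
        return hint != null ? hint : "application/x-tika-msoffice";
    }

    public static void main(String[] args) {
        byte[] header = Arrays.copyOf(OLE2_MAGIC, 16); // stand-in for a real file's first bytes
        // A PowerPoint file renamed to .doc is reported as Word:
        System.out.println(detect(header, "slides.doc"));
    }
}
```

A container-aware detector avoids this by looking inside the OLE2 file rather
than trusting the name, which is why the full parse identifies the file
correctly even with the wrong extension.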

Nick