Tika, mail # user - XML charset verification ignoring xml tag


XML charset verification ignoring xml tag
Ramon Rosa da Silva 2012-08-20, 14:05
Hello,
I'm using CharsetDetector to determine the charset of files of any type.
But for XML files that contain a declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>, the Tika API uses that declaration to determine the charset.
This behavior does not work in my scenario, because my customers sometimes send files whose XML declaration names one encoding while the content is actually in another charset.

Can I disable the check against the encoding declared in the XML tag?
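
To make the scenario concrete, here is a minimal sketch of comparing the declared encoding with what the bytes actually are. It uses only the Java standard library, not Tika's CharsetDetector, and the class and method names (XmlCharsetCheck, declaredEncoding, decodesCleanly, guessIgnoringDeclaration) are hypothetical. It assumes the only candidates are UTF-8 and ISO-8859-1 and that the file is ASCII-compatible, so it would not handle UTF-16 input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
class XmlCharsetCheck {
    // Matches encoding="..." or encoding='...' inside an XML declaration.
    private static final Pattern DECL = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    static String declaredEncoding(byte[] data) {
        int n = Math.min(data.length, 200);
        // ISO-8859-1 maps every byte to exactly one char, so this decode cannot fail.
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** True if the whole byte array is a legal byte sequence in the given charset. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Guesses from the bytes alone, ignoring the declaration.
     * Assumption: anything that is not valid UTF-8 is treated as ISO-8859-1.
     */
    static String guessIgnoringDeclaration(byte[] data) {
        return decodesCleanly(data, StandardCharsets.UTF_8) ? "UTF-8" : "ISO-8859-1";
    }

    public static void main(String[] args) {
        // Declaration says ISO-8859-1, but the bytes are written as UTF-8.
        byte[] file = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e9</a>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("declared: " + declaredEncoding(file));
        System.out.println("guessed:  " + guessIgnoringDeclaration(file));
    }
}
```

A mismatch between the two results is the case I need to catch. Pure ASCII content decodes cleanly as UTF-8, so the sketch reports UTF-8 for it even though ISO-8859-1 would be equally valid.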

Regards,


Ramon Rosa da Silva

Developer | Architecture and Frameworks
[EMAIL PROTECTED]
Skype: ramon.silva.neogrid.com