Tika, mail # user - XML charset verification ignoring xml tag


XML charset verification ignoring xml tag
Ramon Rosa da Silva 2012-08-21, 12:13
Hello,

I'm using CharsetDetector to determine the charset of files of any type.
However, when the file is XML and contains a declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>, the Tika API uses that declared encoding to determine the charset.
This behavior does not work in my scenario, because my customers sometimes send files whose XML declaration names one encoding while the content is actually stored in a different charset.
Can I disable the check against the encoding declared in the XML tag?
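
To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the Tika API; the class XmlEncodingCheck and its two methods are hypothetical names made up for this illustration. It assumes an ASCII-compatible encoding with no byte order mark, reads the encoding named in the XML declaration, and then checks whether the raw bytes really decode in a given charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika.
final class XmlEncodingCheck {
    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the data.
    private static final Pattern DECL =
        Pattern.compile("^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    static String declaredEncoding(byte[] data) {
        // ISO-8859-1 maps every byte to a char, so reading the prolog this way never fails.
        int len = Math.min(data.length, 200);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Returns true if every byte sequence in data is valid in the given charset. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For example, a file that declares encoding="UTF-8" but was saved as ISO-8859-1 with accented characters returns "UTF-8" from declaredEncoding, while decodesCleanly(data, StandardCharsets.UTF_8) returns false. The reverse case is harder to catch this way, because any byte sequence is valid ISO-8859-1.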
Regards, 
Ramon Rosa da Silva
Developer | Architecture and Frameworks
[EMAIL PROTECTED]
Skype: ramon.silva.neogrid.com