Parsing large xlsx file takes much longer (and usually crashes) with Tika than directly with POI
nutch.buddy@...) 2012-04-11, 11:36
I'm trying to use tika-parsers to parse a 100 MB xlsx file.
I end up waiting a long time (maybe an hour or two), and the file is rarely
parsed successfully.
Usually I get a "GC overhead limit exceeded" exception.
When I parse the same file with a few lines of code using the POI library, the
file is parsed successfully and relatively fast.
Any input on this?
I use tika-core-0.10 and tika-parsers-0.10 when I use Tika, and poi-3.8-beta3
when I use POI.
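
For anyone trying to reproduce this: a rough way to check that the file itself can be read with little memory is to stream it with nothing but the JDK. An xlsx file is a ZIP archive of XML parts, so the sketch below walks the worksheet parts with StAX and counts <row> elements without building a workbook model. This is neither the Tika code path nor the POI code mentioned above (which is not shown in this message). The class name XlsxRowCounter and the assumption that sheets live under xl/worksheets/ as deflated entries are illustrative only.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Tika or POI.
public class XlsxRowCounter {

    /** Counts <row> elements across all worksheet parts of an xlsx stream. */
    public static long countRows(InputStream xlsx) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Worksheet XML needs no DTDs or external entities, so disable both.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        long rows = 0;
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Assumption: sheet parts are named xl/worksheets/<something>.xml
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                // Shield the ZIP stream so the XML parser cannot close it
                // when it reaches the end of this entry.
                InputStream part = new FilterInputStream(zip) {
                    @Override
                    public void close() {
                        // intentionally empty: keep the ZIP stream open
                    }
                };
                XMLStreamReader reader = factory.createXMLStreamReader(part);
                try {
                    // Pull events one at a time; nothing is retained per row.
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(reader.getLocalName())) {
                            rows++;
                        }
                    }
                } finally {
                    reader.close();
                }
            }
        }
        return rows;
    }
}
```

Calling XlsxRowCounter.countRows(new java.io.FileInputStream(path)) on the problem file, with path pointing at it, gives a baseline for time and heap use to compare against the Tika and POI runs.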
View this message in context: http://lucene.472066.n3.nabble.com/Parsing-large-xlsx-file-takes-much-longer-and-usually-crashes-with-tika-than-directly-with-POI-tp3902267p3902267.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.