Tika, mail # user - RE: Excel Parser - Blank Cell


Gangwal, Adish 2012-01-26, 21:21
RE: Excel Parser - Blank Cell
Nick Burch 2012-01-27, 12:03
On Thu, 26 Jan 2012, Gangwal, Adish (IS Consultant) wrote:
> When I parse an Excel file that has an empty cell, it doesn't create an
> extra tab character.
>
> If there are three cells and the middle one is empty, it skips the
> middle cell and only outputs the 1st and 3rd cells, separated by a tab.

Tika itself doesn't generate tab characters; it generates XHTML table
elements. It's the text content handler that turns those into tabs.
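
As a rough illustration of that split, here is a minimal Java sketch of a
SAX handler that turns table events into tab-separated text. It is not
Tika's own text content handler: the class name TabbedTableHandler and the
toText helper are made up for this example, and the sample input would be
hand-written XHTML rather than real Tika output. The point is only that a
tab can appear solely for a cell element the handler actually receives.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only (not part of Tika).
public class TabbedTableHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean firstCellInRow = true;
    private boolean inCell = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("tr".equals(qName)) {
            firstCellInRow = true;
        } else if ("td".equals(qName)) {
            // One tab between consecutive cells, including empty ones.
            if (!firstCellInRow) {
                out.append('\t');
            }
            firstCellInRow = false;
            inCell = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("td".equals(qName)) {
            inCell = false;
        } else if ("tr".equals(qName)) {
            out.append('\n');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Ignore whitespace and text outside of table cells.
        if (inCell) {
            out.append(ch, start, length);
        }
    }

    public String text() {
        return out.toString();
    }

    // Convenience wrapper: parse an XHTML string and return the tabbed text.
    public static String toText(String xhtml) {
        try {
            TabbedTableHandler handler = new TabbedTableHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }
}
```

With a row that contains an empty td between two filled cells, this handler
emits two tabs in a row; with a row where the middle td is absent, it emits
only one. So if the events for a blank cell never reach the handler, no
handler can add the missing tab back.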

In general, though, Tika only outputs the text that is actually present,
so an empty cell contributes nothing.

If you're trying to generate a CSV or similar, and want full control over
what shows up (missing cells, etc.), then I'd suggest you look at using
Apache POI directly.
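
A minimal sketch of what that could look like, assuming a reasonably recent
Apache POI on the classpath (one where Row.MissingCellPolicy is an enum).
The class name SheetToTsv and the toTsv method are made up for this
example. It walks every column index up to the last cell in each row, so a
missing or blank cell still gets its own tab-separated slot.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

// Hypothetical helper, for illustration only.
public class SheetToTsv {
    public static String toTsv(Sheet sheet) {
        DataFormatter formatter = new DataFormatter();
        StringBuilder out = new StringBuilder();
        // Iterating a Sheet visits only rows that physically exist;
        // entirely missing rows are not handled in this sketch.
        for (Row row : sheet) {
            // getLastCellNum() is the last cell index plus one,
            // or -1 if the row contains no cells.
            int lastCell = row.getLastCellNum();
            for (int col = 0; col < lastCell; col++) {
                if (col > 0) {
                    out.append('\t');
                }
                // Missing and blank cells both come back as null here,
                // leaving an empty field between the tabs.
                Cell cell = row.getCell(col, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
                if (cell != null) {
                    out.append(formatter.formatCellValue(cell));
                }
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A Sheet can be obtained with WorkbookFactory.create(...) and
getSheetAt(0). Quoting and escaping for real CSV output are left out here.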

Nick
Gangwal, Adish 2012-01-27, 22:14
Nick Burch 2012-01-30, 12:40