Tika, mail # user - RE: Excel Parser - Blank Cell


RE: Excel Parser - Blank Cell
Nick Burch 2012-01-27, 12:03
On Thu, 26 Jan 2012, Gangwal, Adish (IS Consultant) wrote:
> When I parse the excel which has an empty cell, it doesn't create a
> extra tab character.
>
> If there are three cells of which middle one is empty, it skips the
> middle cell and only outputs 1st and 3rd cell with a tab

Tika itself doesn't generate tab characters; it generates XHTML table
elements. It's the text content handler that turns those into tabs.

In general though, Tika only outputs the text that is actually present,
so an empty cell produces nothing.

If you're trying to generate a CSV or similar, and want full control over
what shows up, missing cells etc, then I'd suggest you look at using
Apache POI directly.
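
As a rough sketch of the "full control" part: once you have the cells of a
row keyed by column index, you can emit an empty field for every column
that has no cell. The code below is plain Java and does not call POI; the
class name TsvRowWriter, the method toTsvLine and the map-based row are
made up for illustration. With POI you would fill the map from the cells
the row actually contains.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tika or POI.
final class TsvRowWriter {

    /**
     * Joins a sparse row into one tab-separated line.
     * Columns with no entry in the map become empty fields,
     * so the column positions are preserved.
     */
    static String toTsvLine(Map<Integer, String> cellsByColumn, int columnCount) {
        StringBuilder line = new StringBuilder();
        for (int col = 0; col < columnCount; col++) {
            if (col > 0) {
                line.append('\t'); // separator before every field except the first
            }
            String value = cellsByColumn.get(col);
            if (value != null) {
                line.append(value); // missing cell: append nothing, keep the tab
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Three columns, middle one empty (the case from the question).
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "first");
        row.put(2, "third");
        System.out.println(toTsvLine(row, 3));
    }
}
```

For the three-cell row in the question this gives two tabs between the
first and third values, rather than the single tab you see in the Tika
text output.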

Nick