Re: Parse metadata only
Nick Burch 2012-05-29, 13:50
On Tue, 29 May 2012, Thinus Prinsloo wrote:
> I would like to parse the meta-data of a massive amount of PDF files
> only. I do not want to extract the text, not yet anyway, only get
> meta-data information such as "Creation-Date", etc. Is it possible for
> Tika to provide the meta-data without doing a parse on the whole
> document (with a content handler, etc.)?

At the moment, that's not possible. Most file formats don't have all their metadata in entirely separate places, so you end up having to process almost all of the file anyway. (There has been talk about implementing this in the past, but this problem has largely meant it hasn't been tackled.)

If you don't want the text, you can just pass in a content handler that ignores everything.

Nick
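
For illustration, here is a rough sketch of what "a content handler that ignores everything" can look like. Tika parsers report text as SAX events to an `org.xml.sax.ContentHandler`, and the JDK's `org.xml.sax.helpers.DefaultHandler` implements every callback as a no-op, so it discards whatever it is given. The sketch uses only the JDK so it runs without Tika on the classpath. The class name, the `feed` helper, the counting handler and the sample XML are all made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoreEverythingDemo {

    // Hypothetical handler for contrast: tallies the text it is handed,
    // to show what a handler that does care about content would receive.
    public static class CountingHandler extends DefaultHandler {
        public int chars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            chars += length;
        }
    }

    // Hypothetical helper: runs the JDK SAX parser over a string and
    // sends every event to the supplied handler.
    public static void feed(String xml, DefaultHandler handler) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new ByteArrayInputStream(bytes), handler);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>body text nobody wants yet</p></doc>";

        // A handler that looks at the content sees all of the text.
        CountingHandler counting = new CountingHandler();
        feed(xml, counting);
        System.out.println("counting handler saw " + counting.chars + " chars");

        // A plain DefaultHandler receives the same events and keeps nothing.
        feed(xml, new DefaultHandler());
        System.out.println("DefaultHandler finished, nothing retained");
    }
}
```

With Tika, the same `new DefaultHandler()` would go in as the handler argument of `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)`, and you would read the `Metadata` object afterwards. The file is still parsed in full, as described above, but no extracted text is kept.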