Re: Tika API and field postprocessing - Tika - [mail # user]
...On Sun, 27 May 2012, Raphaël wrote: I believe you'll need to ask on the SOLR list about this, as it's likely to be specific to ExtractingRequestHandler which is maintained by S...
   Author: Nick Burch, 2012-05-27, 21:28
RE: A plan to improve the metadata property definitions - Tika - [mail # dev]
...On Tue, 22 May 2012, Joerg Ehrlich wrote:  The only thing the current setup won't support is Structured Properties.  (That hasn't changed). That will need more work, but hopefully ...
   Author: Nick Burch, 2012-05-23, 15:23
Re: Unable to read default mimetypes error message - Tika - [mail # user]
...On Fri, 18 May 2012, Karthik Deivasigamani wrote: Are you sure you haven't got any other Tika jars on your classpath? And have you done something bizarre with the XML parser that...
   Author: Nick Burch, 2012-05-22, 00:04
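A quick way to check for stray Tika jars is to ask the class loader for every copy of a resource that tika-core ships. The sketch below uses only the JDK. The resource path `org/apache/tika/mime/tika-mimetypes.xml` is assumed to be where tika-core keeps its default MIME types file; adjust it if your version differs. More than one URL in the output suggests more than one tika-core jar on the classpath.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class DuplicateResourceCheck {

    // Returns every classpath location that supplies the named resource.
    // More than one hit usually means more than one copy of a jar is present.
    static List<URL> findCopies(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the default MIME types file inside tika-core.
        String name = "org/apache/tika/mime/tika-mimetypes.xml";
        List<URL> copies = findCopies(name);
        System.out.println(copies.size() + " copies of " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```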
Re: Tika fails to extract text from very large files - Tika - [mail # user]
...On Thu, 17 May 2012, Alec Swan wrote: In that kind of situation, you should be looking at using something like the fork parser or the Tika server. That looks like a PDFBox bug, y...
   Author: Nick Burch, 2012-05-17, 16:12
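The point of the fork parser suggestion is to keep a memory-hungry or crashing parse out of the calling JVM. As a rough stand-in (a different technique from Tika's ForkParser class), the sketch below builds a command that runs the standalone tika-app jar in a separate JVM with its own heap limit. The jar path, heap size and input file name are hypothetical placeholders; only the `--text` option is taken from tika-app's command line.

```java
import java.util.ArrayList;
import java.util.List;

class ForkedExtractionCommand {

    // Builds a command line that runs the tika-app jar in its own JVM,
    // so an OutOfMemoryError ends that process rather than the caller.
    // The jar path and heap size are caller-supplied assumptions.
    static List<String> build(String tikaAppJar, int heapMegabytes, String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + heapMegabytes + "m");
        cmd.add("-jar");
        cmd.add(tikaAppJar);
        cmd.add("--text");   // tika-app option: write plain text to stdout
        cmd.add(inputFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("tika-app.jar", 1024, "big-report.pdf");
        System.out.println(String.join(" ", cmd));
        // To run it for real (not done here):
        //   Process p = new ProcessBuilder(cmd)
        //       .redirectError(ProcessBuilder.Redirect.INHERIT).start();
        // then read p.getInputStream() and check that p.waitFor() == 0.
    }
}
```

Running the extraction as a child process trades some start-up cost for isolation: a failed parse shows up as a non-zero exit code instead of taking down the host application.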
Re: A plan to improve the metadata property definitions - Tika - [mail # dev]
...On Thu, 17 May 2012, Mattmann, Chris A (388J) wrote:  We've tried to keep all the issues and commits nice and small, so they're  easy to review, but we did end up on an epic 10 hou...
   Author: Nick Burch, 2012-05-17, 02:57
Re: Tika fails to extract text from very large files - Tika - [mail # user]
...On Wed, 16 May 2012, Alec Swan wrote: Not all file formats support stream-based parsing; many can only be sensibly parsed in a DOM-like way. For those, the whole file needs to be ...
   Author: Nick Burch, 2012-05-16, 23:07
Re: Tika fails to extract text from very large files - Tika - [mail # user]
...On Wed, 16 May 2012, Alec Swan wrote: There is absolutely no way that you're going to be able to parse a PDF, DOC/DOCX or PPT/PPTX of more than about 20 MB in size on a 128 MB heap...
   Author: Nick Burch, 2012-05-16, 22:08
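The 20 MB versus 128 MB figures imply that these formats can need several times the file size in heap. A hypothetical pre-check could compare the input size against `Runtime.maxMemory()` before parsing in-process. The multiplier of 10 below is an assumption chosen only for illustration (any factor above 128 / 20 = 6.4 reproduces the case above); real requirements depend on the format and the document.

```java
class HeapBudget {

    // Hypothetical rule of thumb: assume a DOM-style parse of PDF, DOC/DOCX or
    // PPT/PPTX may need around this many bytes of heap per byte of input.
    // The factor is an assumption; measure with your own documents.
    static final long ASSUMED_HEAP_PER_INPUT_BYTE = 10;

    static boolean likelyFitsInHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes * ASSUMED_HEAP_PER_INPUT_BYTE <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + (maxHeap / mb) + " MB");
        // The case from the thread: a 20 MB file against a 128 MB heap.
        System.out.println(likelyFitsInHeap(20 * mb, 128 * mb)); // false
    }
}
```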
Re: Tika fails to extract text from very large files - Tika - [mail # user]
...On Wed, 16 May 2012, Alec Swan wrote:  Are you running out of memory? PPT/PPTX, DOC/DOCX and PDF are all formats  which can only be parsed by building a DOM-like structure in memor...
   Author: Nick Burch, 2012-05-16, 21:45
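The streaming versus DOM-like distinction can be seen with the JDK's own XML parsers. This is an analogy only: Tika's PDF and Office parsers are separate code, but the memory trade-off is the same. The SAX version keeps just a counter while text streams past; the DOM version holds the whole tree in memory before any text is read.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class StreamVersusTree {

    // Streaming: the handler sees text as it arrives and keeps only a counter,
    // so memory use does not grow with document size.
    static long countTextStreaming(InputStream in) throws Exception {
        final long[] chars = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chars[0] += length;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return chars[0];
    }

    // Tree-building: the whole document is held in memory before any text is read.
    static long countTextTree(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        return doc.getDocumentElement().getTextContent().length();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<doc><p>Hello</p><p>world</p></doc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countTextStreaming(new ByteArrayInputStream(xml))); // 10
        System.out.println(countTextTree(new ByteArrayInputStream(xml)));      // 10
    }
}
```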
A plan to improve the metadata property definitions - Tika - [mail # dev]
...Hi all, I've just been brainstorming with Ray Gauss, and we think we've come up with a way to move towards cleaner and clearer metadata property definitions (prefixes, prope...
   Author: Nick Burch, 2012-05-16, 15:50
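To make the prefix idea concrete, here is a hypothetical sketch of a property definition that carries a namespace prefix, a local name and a value type, and produces keys such as `dc:title`. It is not Tika's `Property` class and does not reflect the design that came out of the thread; the class, enum and field names are all illustrative.

```java
import java.util.Objects;

// Hypothetical illustration only: not Tika's actual Property class.
final class PrefixedProperty {
    enum ValueType { TEXT, DATE, INTEGER }

    final String prefix;
    final String localName;
    final ValueType type;
    final boolean multiValued;

    PrefixedProperty(String prefix, String localName, ValueType type, boolean multiValued) {
        this.prefix = Objects.requireNonNull(prefix);
        this.localName = Objects.requireNonNull(localName);
        this.type = Objects.requireNonNull(type);
        this.multiValued = multiValued;
    }

    // Full key used when storing values, e.g. "dc:title".
    String key() {
        return prefix + ":" + localName;
    }

    public static void main(String[] args) {
        PrefixedProperty title = new PrefixedProperty("dc", "title", ValueType.TEXT, false);
        PrefixedProperty creator = new PrefixedProperty("dc", "creator", ValueType.TEXT, true);
        System.out.println(title.key());   // dc:title
        System.out.println(creator.key()); // dc:creator
    }
}
```

Keeping the prefix as its own field, rather than baking it into a single string, makes it easy to group or validate properties by namespace.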
[TIKA-917] Parser for executables (metadata) - Tika - [issue]
...Based on the investigations for TIKA-913, it should be fairly easy to implement a parser to extract metadata from executables (PE and ELF). This could give us a similar level of information ...
http://issues.apache.org/jira/browse/TIKA-917    Author: Nick Burch, 2012-05-13, 19:48
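As a starting point for such a parser, both formats can be recognised from their first bytes. ELF files begin with `0x7F 'E' 'L' 'F'`, followed by class and byte-order bytes. PE files begin with an `MZ` DOS header whose 4-byte little-endian field at offset `0x3C` points to a `PE\0\0` signature, which is followed by the 2-byte COFF machine type. The sketch below is JDK-only and hypothetical: it is not the TIKA-917 implementation, and it reads only those few header fields.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ExecutableSniffer {

    // Returns a short description, or null if the bytes look like neither format.
    static String describe(byte[] head) {
        // ELF: 0x7F 'E' 'L' 'F', then EI_CLASS (1 = 32-bit, 2 = 64-bit)
        // and EI_DATA (1 = little-endian, 2 = big-endian).
        if (head.length >= 6 && head[0] == 0x7F && head[1] == 'E'
                && head[2] == 'L' && head[3] == 'F') {
            String bits = head[4] == 1 ? "32-bit" : head[4] == 2 ? "64-bit" : "unknown-class";
            String order = head[5] == 1 ? "little-endian" : head[5] == 2 ? "big-endian" : "unknown-order";
            return "ELF " + bits + " " + order;
        }
        // PE: DOS header starts with "MZ"; the 4-byte little-endian value at 0x3C
        // is the offset of the "PE\0\0" signature, followed by the machine type.
        if (head.length >= 0x40 && head[0] == 'M' && head[1] == 'Z') {
            int peOffset = ByteBuffer.wrap(head, 0x3C, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            if (peOffset >= 0 && peOffset <= head.length - 6
                    && head[peOffset] == 'P' && head[peOffset + 1] == 'E'
                    && head[peOffset + 2] == 0 && head[peOffset + 3] == 0) {
                int machine = ByteBuffer.wrap(head, peOffset + 4, 2)
                        .order(ByteOrder.LITTLE_ENDIAN).getShort() & 0xFFFF;
                return String.format("PE machine 0x%04x", machine);
            }
            return "MZ (DOS) header without a PE signature in the bytes supplied";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] elf = {0x7F, 'E', 'L', 'F', 2, 1};
        System.out.println(describe(elf)); // ELF 64-bit little-endian
    }
}
```

A real parser would read the header through a bounded stream and go on to section or resource tables; this only shows how little is needed to tell the two formats apart.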