Results 131 to 140 of 161.
jempbox missing from Apache Maven repo? - Tika - [mail # dev]
...Has anybody else noticed that Maven never finds jempbox when resolving dependencies? If I look in the Apache snapshots repo: https://repository.apache.org/content/groups/s...
   Author: Ken Krugler, 2010-02-23, 18:45
Re: Character encodings on the web - Tika - [mail # dev]
...On Jan 29, 2010, at 4:15am, Jukka Zitting wrote: Yes (they are infrequent). In fact I've never seen a page encoded as any of the other possible transformations for Unicode. ...
   Author: Ken Krugler, 2010-01-29, 13:58
Re: Timeout support with parsers - Tika - [mail # dev]
...Hi Jukka, Could you provide more details about the problems here? E.g. correctly interrupting the read requests when a timeout happens? Or something else. Thanks, ...
   Author: Ken Krugler, 2010-01-26, 21:56
Timeout support with parsers - Tika - [mail # dev]
...I've run into a number of documents that cause Tika to hang. This is especially true when I'm only fetching the first 8K bytes, so the parsers are often dealing with an incompl...
   Author: Ken Krugler, 2010-01-25, 21:57
[TIKA-357] Increase buffer size for meta tag sniffing - Tika - [issue]
...Some web pages (such as makler.su, see attached) have lots of script data before the body of the HTML. When this happens, the sniffing code fails to find the charset info in the meta tag, bec...
http://issues.apache.org/jira/browse/TIKA-357    Author: Ken Krugler, 2010-01-20, 05:50
Re: Extracting dublin core metadata in HtmlParser? - Tika - [mail # dev]
...Hi Nick, On Jan 19, 2010, at 5:41am, Nick Burch wrote: Only location & encoding are explicitly looked for, but all meta tag values get put into the metadata map. Se...
   Author: Ken Krugler, 2010-01-19, 14:01
Re: Tika command line performance - Tika - [mail # dev]
...On Jan 15, 2010, at 11:27am, Doug Carter wrote: In that case, another cheesy solution is to have the Java process watch a specific directory. Whenever a new file (with the appr...
   Author: Ken Krugler, 2010-01-15, 19:37
Re: Tika command line performance - Tika - [mail # dev]
...On Jan 15, 2010, at 11:07am, Doug Carter wrote: If you have a set of documents, easiest would be to pass in a directory to tika-app (extend it a bit) so that one invocation of ...
   Author: Ken Krugler, 2010-01-15, 19:19
Re: PDF parser exception - Tika - [mail # dev]
...Hi Doug, Acrobat 9 was a known problem for PDFBox, which is the PDF parser that Tika wraps. But according to http://issues.apache.org/jira/browse/PDFBOX-361, this ...
   Author: Ken Krugler, 2010-01-13, 00:12
Re: PDF parser exception - Tika - [mail # dev]
...Hi Doug, On Jan 12, 2010, at 11:37am, Doug Carter wrote: Is this the case with any and all PDF files? Based on the stack trace below, it sure looks like a busted file, but...
   Author: Ken Krugler, 2010-01-12, 22:18
project
Tika (161)
Solr (160)
Nutch (90)
Mahout (56)
Lucene (52)
Droids (4)
type
mail # dev (110)
issue (51)
date
last 7 days (0)
last 30 days (0)
last 90 days (1)
last 6 months (7)
last 9 months (161)
author
Jukka Zitting (530)
Nick Burch (410)
Mattmann, Chris A (324)
Michael McCandless (176)
Ken Krugler (161)
buildbot@...)
Oleg Tikhonov (58)
Markus Jelsma (56)
Mark Kerzner (53)
Dave Meikle (49)
Maxim Valyanskiy (46)
Keith R. Bennett (45)
Ray Gauss II (40)
Antoni Mylka (37)
Benson Margulies (37)