Results 131 to 140 of 161 (3.33s).
jempbox missing from Apache Maven repo? - Tika - [mail # dev]
...Has anybody else noticed that Maven never finds jempbox when resolving dependencies? If I look in the Apache snapshots repo: https://repository.apache.org/content/groups/s...
Author: Ken Krugler, 2010-02-23, 18:45

Re: Character encodings on the web - Tika - [mail # dev]
...On Jan 29, 2010, at 4:15am, Jukka Zitting wrote: Yes (they are infrequent). In fact I've never seen a page encoded as any of the other possible transformations for Unicode. ...
Author: Ken Krugler, 2010-01-29, 13:58

Re: Timeout support with parsers - Tika - [mail # dev]
...Hi Jukka, Could you provide more details about the problems here? E.g. correctly interrupting the read requests when a timeout happens? Or something else. Thanks, ...
Author: Ken Krugler, 2010-01-26, 21:56

Timeout support with parsers - Tika - [mail # dev]
...I've run into a number of documents that cause Tika to hang. This is especially true when I'm only fetching the first 8K bytes, so the parsers are often dealing with an incompl...
Author: Ken Krugler, 2010-01-25, 21:57

[TIKA-357] Increase buffer size for meta tag sniffing - Tika - [issue]
...Some web pages (such as makler.su, see attached) have lots of script data before the body of the HTML. When this happens, the sniffing code fails to find the charset info in the meta tag, bec...
http://issues.apache.org/jira/browse/TIKA-357
Author: Ken Krugler, 2010-01-20, 05:50

Re: Extracting dublin core metadata in HtmlParser? - Tika - [mail # dev]
...Hi Nick, On Jan 19, 2010, at 5:41am, Nick Burch wrote: Only location & encoding are explicitly looked for, but all meta tag values get put into the metadata map. Se...
Author: Ken Krugler, 2010-01-19, 14:01

Re: Tika command line performance - Tika - [mail # dev]
...On Jan 15, 2010, at 11:27am, Doug Carter wrote: In that case, another cheesy solution is to have the Java process watch a specific directory. Whenever a new file (with the appr...
Author: Ken Krugler, 2010-01-15, 19:37

Re: Tika command line performance - Tika - [mail # dev]
...On Jan 15, 2010, at 11:07am, Doug Carter wrote: If you have a set of documents, easiest would be to pass in a directory to tika-app (extend it a bit) so that one invocation of ...
Author: Ken Krugler, 2010-01-15, 19:19

Re: PDF parser exception - Tika - [mail # dev]
...Hi Doug, Acrobat 9 was a known problem for PDFBox, which is the PDF parser that Tika wraps. But according to http://issues.apache.org/jira/browse/PDFBOX-361, this ...
Author: Ken Krugler, 2010-01-13, 00:12

Re: PDF parser exception - Tika - [mail # dev]
...Hi Doug, On Jan 12, 2010, at 11:37am, Doug Carter wrote: Is this the case with any and all PDF files? Based on the stack trace below, it sure looks like a busted file, but...
Author: Ken Krugler, 2010-01-12, 22:18