Search criteria: tika 0.8. Results 91 to 100 of 136.
[TIKA-239] System.err prints from XmlRootExtractor - Tika - [issue]
... messages to System.err, as shown below: $ java -jar tika-app-0.4-SNAPSHOT.jar --text lucene-2.2.0-src.zip > /dev/null java.io.EOFException at com.sun.org.apache.xerces...
....internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) at javax.xml.parsers.SAXParser.parse(SAXParser.java:198) at org.apache.tika.detect.Xml...
http://issues.apache.org/jira/browse/TIKA-239    Author: Jukka Zitting, 2010-01-27, 16:02
Re: [DISCUSS] Release Candidate for 1.3? - Tika - [mail # dev]
...Hi,  On Tue, Jan 8, 2013 at 11:56 PM, Dave Meikle  wrote:   ones (TIKA-962, TIKA-963) fixed on trunk, so I was wondering if it was time  +1 It's high time for us to release again!  Re...
...: binary compatibility; Before cutting the release it would be a good idea to update the clirr plugin configuration to use Tika 1.2 instead of 1.0 when checking for binary compatibility.  Also...
   Author: Jukka Zitting, 2013-01-09, 11:14
Re: Not Parsing HTML Elements with a class - Tika - [mail # user]
...Hi,  On Mon, Apr 8, 2013 at 9:32 PM, Jason Tesser  wrote:  I see two options:  1) Use the IdentityHtmlMapper strategy to have Tika pass you all HTML elements as-is. Then you can explicitly...
...="donotparse" strategy you describe. This approach requires changes in Tika, so you might want to consider submitting a patch of your (ideally backwards-compatible) changes.  BR,  Jukka Zitting ...
   Author: Jukka Zitting, 2013-04-09, 04:49
Re: Fails to detect language for UTF-8 file, but it works for ISO-latin - Tika - [mail # user]
...Hi,  On Sat, Aug 21, 2010 at 5:55 PM, Jan Høydahl / Cominvent  wrote:  The tika-app jar doesn't do language detection by default. The language metadata you're seeing is a result...
   Author: Jukka Zitting, 2010-08-24, 15:00
Re: Problem detecting Microsoft Office formats from InputStream - Tika - [mail # user]
...Hi,  On Sun, Sep 23, 2012 at 8:07 PM, naskoo  wrote:  It doesn't add extra metadata (unless explicitly requested). Instead the TikaInputStream class allows Tika parsers and detectors to use...
... random access for reading the underlying file.  The MS Office detectors (and a few other features in Tika) rely on that functionality, and thus won't give as accurate results when given just...
   Author: Jukka Zitting, 2012-09-23, 19:33
Towards 1.0 - Tika - [mail # dev]
...Hi,  It's a few months since 0.9 and our Tika in Action book is soon ready for print, so I think it's good time to start planning for the 1.0 release.  There are a few odds and ends that I...
... release about Tika reaching 1.0 status.  BR,  Jukka Zitting ...
   Author: Jukka Zitting, 2011-05-20, 16:01
Re: Which mime type in ParseUtils.getStringContent() ? - Tika - [mail # user]
....apache.org/0.9/api/org/apache/tika/Tika.html  BR,  Jukka Zitting ...
...Hi,  On Thu, Apr 7, 2011 at 9:52 PM, Mark  wrote:  Please use the org.apache.tika.Tika facade class instead of the old ParseUtils class.  The code to parse an unknown file or an input...
   Author: Jukka Zitting, 2011-04-07, 20:25
Re: Problem detecting XML - Tika - [mail # user]
...Hi,  On Tue, Apr 17, 2012 at 6:06 PM, Taylor, Wade  wrote:  That's the UTF-8 byte order mark. I guess Tika should be able to deal with that, but AFAICT it currently doesn't. Would you mind...
... filing a bug report about this?   Hmm, can you verify that the returned input stream actually contains what you expect it to?  Also, you can check the difference of how Tika detects full files...
   Author: Jukka Zitting, 2012-04-17, 16:33
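For readers hitting the same XML detection problem: the UTF-8 byte order mark mentioned in the reply above is the three-byte sequence EF BB BF at the start of a file. The following is a minimal sketch of one possible workaround, assuming you want to strip the mark yourself before handing the stream to a detector. The class `BomSkipper` and the method `skipUtf8Bom` are hypothetical names made up for illustration; they are not part of Tika, and this is not how Tika itself handles the mark.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika.
public class BomSkipper {
    // U+FEFF encoded as UTF-8: the byte order mark.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int read = 0;
        // Read up to three bytes, tolerating short reads and early end of stream.
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean hasBom = read == UTF8_BOM.length;
        for (int i = 0; hasBom && i < UTF8_BOM.length; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom))) {
            // The first byte after the skipped mark is the '<' of the XML declaration.
            System.out.println((char) in.read());
        }
    }
}
```

The wrapper only consumes the first three bytes when they match the mark exactly, so streams without a BOM pass through untouched.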
Re: 1.0 release, and graduation - ManifoldCF - [mail # dev]
... is roughly similar to what we experienced during the incubation of Apache Tika. In the last year before graduation (2008) I was responsible for about 87% of all commits, which raised similar concerns...
... wrong.  Since then Lucene has shed out most subprojects to avoid being too large to manage, and by the time Tika in 2010 became a TLP by itself my share of all commits had shrunk to a still high...
   Author: Jukka Zitting, 2011-09-21, 09:41
Re: Towards 1.0 - Tika - [mail # dev]
... releases after that, so I wouldn't put any single issue as a blocker. On the other hand this will probably be the first Tika release that many new users will encounter, so we should strive to make...
... that are already using Tika. Any takers? I can probably get Adobe on board.  BR,  Jukka Zitting ...
   Author: Jukka Zitting, 2011-05-23, 13:45