Search criteria: tika 0.8. Results 91 to 100 of 136 (4.787s).
[TIKA-239] System.err prints from XmlRootExtractor - Tika - [issue]
... messages to System.err, as shown below:
$ java -jar tika-app-0.4-SNAPSHOT.jar --text lucene-2.2.0-src.zip > /dev/null
java.io.EOFException
at com.sun.org.apache.xerces...
....internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.apache.tika.detect.Xml...
...RootExtractor.extractRootElement(XmlRootExtractor.java:55)
at org.apache.tika.mime.MimeTypes.getMimeType(MimeTypes.java:219)
at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:514)
at org.apache.tika...
....parser.AutoDetectParser.parse(AutoDetectParser.java:76)
at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:83)
at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:65)
at org.apache.tika...
....parser.pkg.ZipParser.parse(ZipParser.java:56)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:85)
at org.apache.tika...
http://issues.apache.org/jira/browse/TIKA-239
Author: Jukka Zitting, 2010-01-27, 16:02

Re: [DISCUSS] Release Candidate for 1.3? - Tika - [mail # dev]
...Hi, On Tue, Jan 8, 2013 at 11:56 PM, Dave Meikle wrote: ones (TIKA-962, TIKA-963) fixed on trunk, so I was wondering if it was time +1 It's high time for us to release again! Re...
...: binary compatibility; Before cutting the release it would be a good idea to update the clirr plugin configuration to use Tika 1.2 instead of 1.0 when checking for binary compatibility. Also...
Author: Jukka Zitting, 2013-01-09, 11:14

Re: Not Parsing HTML Elements with a class - Tika - [mail # user]
...Hi, On Mon, Apr 8, 2013 at 9:32 PM, Jason Tesser wrote: I see two options: 1) Use the IdentityHtmlMapper strategy to have Tika pass you all HTML elements as-is. Then you can explicitly...
...="donotparse" strategy you describe. This approach requires changes in Tika, so you might want to consider submitting a patch of your (ideally backwards-compatible) changes. BR, Jukka Zitting ...
Author: Jukka Zitting, 2013-04-09, 04:49

Re: Fails to detect language for UTF-8 file, but it works for ISO-latin - Tika - [mail # user]
...Hi, On Sat, Aug 21, 2010 at 5:55 PM, Jan Høydahl / Cominvent wrote: The tika-app jar doesn't do language detection by default. The language metadata you're seeing is a result...
Author: Jukka Zitting, 2010-08-24, 15:00

Re: Problem detecting Microsoft Office formats from InputStream - Tika - [mail # user]
...Hi, On Sun, Sep 23, 2012 at 8:07 PM, naskoo wrote: It doesn't add extra metadata (unless explicitly requested). Instead the TikaInputStream class allows Tika parsers and detectors to use...
... random access for reading the underlying file. The MS Office detectors (and a few other features in Tika) rely on that functionality, and thus won't give as accurate results when given just...
Author: Jukka Zitting, 2012-09-23, 19:33
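
The limitation described in this reply can be illustrated without Tika at all. Legacy Office files (.doc, .xls, .ppt) are all stored in the same OLE2 container, so the first eight bytes of a plain stream identify only the container; telling the formats apart means looking further inside the file, which is presumably where the random access mentioned above helps. The following is a minimal stdlib-only sketch, not Tika's detector code; the names `Ole2Sniffer` and `looksLikeOle2` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper, not part of Tika: checks only the OLE2 container signature. */
public class Ole2Sniffer {
    // Signature shared by legacy .doc, .xls and .ppt files (OLE2 compound documents).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /**
     * Peeks at the first eight bytes and rewinds the stream afterwards.
     * A true result means "some OLE2 container", not a specific Office format.
     */
    public static boolean looksLikeOle2(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_MAGIC.length);
        try {
            byte[] head = new byte[OLE2_MAGIC.length];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // stream ended before eight bytes were available
                }
                n += r;
            }
            return n == head.length && Arrays.equals(head, OLE2_MAGIC);
        } finally {
            in.reset(); // leave the stream positioned where the caller had it
        }
    }
}
```

A caller holding only an `InputStream` would wrap it in a `BufferedInputStream` first so that mark/reset is available. The sketch stops at the container check on purpose: it shows how little a prefix-only look can decide.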

Towards 1.0 - Tika - [mail # dev]
...Hi, It's a few months since 0.9 and our Tika in Action book is soon ready for print, so I think it's good time to start planning for the 1.0 release. There are a few odds and ends that I...
...'d still like to sort out in the trunk, but overall I think we're in a pretty much ready for the switch from 0.x to 1.x. One major issue to be decided is whether we want to follow up...
... with the earlier intention of dropping deprecated functionality (like the three-argument parse() method) before the 1.0 release. I think we should do that and also make some other backwards...
... release about Tika reaching 1.0 status. BR, Jukka Zitting ...
Author: Jukka Zitting, 2011-05-20, 16:01

Re: Which mime type in ParseUtils.getStringContent() ? - Tika - [mail # user]
...Hi, On Thu, Apr 7, 2011 at 9:52 PM, Mark wrote: Please use the org.apache.tika.Tika facade class instead of the old ParseUtils class. The code to parse an unknown file or an input...
... stream with the Tika facade is simply: String text = new Tika().parseToString(file); or String text = new Tika().parseToString(stream); See [1] for the details. [1] http://tika...
....apache.org/0.9/api/org/apache/tika/Tika.html BR, Jukka Zitting ...
Author: Jukka Zitting, 2011-04-07, 20:25

Re: Problem detecting XML - Tika - [mail # user]
...Hi, On Tue, Apr 17, 2012 at 6:06 PM, Taylor, Wade wrote: That's the UTF-8 byte order mark. I guess Tika should be able to deal with that, but AFAICT it currently doesn't. Would you mind...
... filing a bug report about this? Hmm, can you verify that the returned input stream actually contains what you expect it to? Also, you can check the difference of how Tika detects full files...
... (with the extra file name hint) and plain byte streams by comparing the output of the following two commands: java -jar tika-app-1.1.jar --detect sample_fixed.wde java -jar tika-app-1...
Author: Jukka Zitting, 2012-04-17, 16:33
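
For the situation described in this reply, one possible workaround is to strip a leading UTF-8 byte order mark (the bytes EF BB BF) before the stream reaches detection. The following is a minimal stdlib-only sketch under that assumption; `BomSkipper` and `skipUtf8Bom` are hypothetical names, not Tika API, and nothing in the thread confirms that this is how the issue was eventually handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical helper, not part of Tika: drops a leading UTF-8 BOM if present. */
public class BomSkipper {

    /**
     * Returns a stream positioned after a leading UTF-8 BOM (EF BB BF).
     * If no BOM is present, the bytes that were read are pushed back unchanged.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r < 0) {
                break; // fewer than three bytes in the stream
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: restore what was consumed
        }
        return pb;
    }
}
```

The returned stream can then be passed to whatever detection or parsing call is in use, in place of the original stream.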

Re: 1.0 release, and graduation - ManifoldCF - [mail # dev]
... is roughly similar to what we experienced during the incubation of Apache Tika. In the last year before graduation (2008) I was responsible for about 87% of all commits, which raised similar concerns...
... wrong. Since then Lucene has shed out most subprojects to avoid being too large to manage, and by the time Tika in 2010 became a TLP by itself my share of all commits had shrunk to a still high...
.... Some of the key things I did in Tika to help reduce my central role there were to lower the barriers of entry by working on things like the Getting Started page [3] and adding tools like...
... the runnable tika-app jar and the simple GUI interface that make it trivially easy for someone to get started using Tika. The Build and Deploy guide in ManifoldCF [4] and the start.jar mechanism...
.... Thus I think these are areas that we should try to focus on in near future. [1] http://en.wikipedia.org/wiki/Bus_factor [2] http://markmail.org/message/bvqs2zv762fmlyv5 [3] http://tika...
Author: Jukka Zitting, 2011-09-21, 09:41

Re: Towards 1.0 - Tika - [mail # dev]
...Hi, Thanks for the feedback! It sounds like we should be good to go for the 1.0 release in about a month from now. The release doesn't need to be perfect as we can always do 1.1 and other...
... releases after that, so I wouldn't put any single issue as a blocker. On the other hand this will probably be the first Tika release that many new users will encounter, so we should strive to make...
... it as good as we can. It sounds like we have consensus to get rid of the deprecated stuff before 1.0. I don't think a separate 0.9.9 or 0.10 release for that is really needed, but it would be good...
... to create a 0.x branch right before the backwards-incompatible changes so people who have trouble with the upgrade still have something more recent than 0.9 to work with. We'll take lead with Chris...
... that are already using Tika. Any takers? I can probably get Adobe on board. BR, Jukka Zitting ...
Author: Jukka Zitting, 2011-05-23, 13:45