Tika, mail # user - Testing Tika


Mark Kerzner 2011-09-07, 01:29
Julien Nioche 2011-09-07, 07:36
Michael McCandless 2011-09-07, 12:29
Steve Aulenbach 2011-09-07, 17:04
Re: Testing Tika
Michael McCandless 2011-09-07, 17:30
Sorry, I don't understand what this output is telling me.

I.e., these 5 files are Tika's sources... but what's wrong with them?

I thought we were talking about certain emails from the Enron corpus
where Tika hits an exception or fails to extract text...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 7, 2011 at 1:04 PM, Steve Aulenbach <[EMAIL PROTECTED]> wrote:
> Hi Mike,
> Here you go. I ran a quick analysis on revision 1166216 and saw the
> following:
>
> Analysis Summary:
>
> Files: 510
>
> *** Warning *** File(s) Not Found 5:
>
> /tika-parsers/src/main/java/org/apache/tika/detect/ContainerAwareDetector.java
> /tika-parsers/src/main/java/org/apache/tika/detect/POIFSContainerDetector.java
> /tika-parsers/src/main/java/org/apache/tika/detect/ZipContainerDetector.java
> /tika-parsers/src/test/java/org/apache/tika/parser/chm/TestUtils.java
> /tika-parsers/target/surefire-reports/TEST-org.apache.tika.parser.chm.TestUtils.xml
>
> Thanks,
> Steve
>
>
> On Wed, Sep 7, 2011 at 6:29 AM, Michael McCandless
> <[EMAIL PROTECTED]> wrote:
>>
>> On Tue, Sep 6, 2011 at 9:29 PM, Mark Kerzner <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Is anybody interested in the results of all the testing that
>> > I am doing, and if so, how should I report my findings?
>>
>> I'm interested!  This sounds great....
>>
>> Tika should strive to have no errors on any valid documents... so if
>> you (or anyone) are hitting bugs in Tika/POI/PDFBox/etc., let's
>> characterize them, open issues, and get them fixed :)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>
>
Steve Aulenbach 2011-09-07, 18:04
Mark Kerzner 2011-09-08, 12:43
Michael McCandless 2011-09-15, 10:26
Mark Kerzner 2011-09-15, 13:02
Albretch Mueller 2011-09-17, 07:08
Mark Kerzner 2011-09-18, 01:45