-
Tika leaves files open (Mark Kerzner, 2011-08-30, 16:07)
Hi,

I am using the tika-app-1.0-SNAPSHOT.jar from 08/02, and it leaves some files open, as you can see below. Once I parse enough files, I get a "too many files open" error. I used the snapshot because of a feature that I had there (don't remember which one right now).

Any advice?

Thank you,
Mark

mark@mark-desktop:/proc$ ls -l 27933/fd
total 0
lr-x------ 1 mark mark 64 2011-08-30 10:58 0 -> pipe:[19113179]
l-wx------ 1 mark mark 64 2011-08-30 10:58 1 -> pipe:[19113180]
lr-x------ 1 mark mark 64 2011-08-30 10:58 10 -> /home/mark/NetBeansProjects/FreeEed/lib/jackson-core-asl-1.5.2.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 11 -> /home/mark/NetBeansProjects/FreeEed/lib/jackson-mapper-asl-1.5.2.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 12 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-configuration-1.6.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 13 -> /home/mark/NetBeansProjects/FreeEed/lib/tika-app-1.0-SNAPSHOT.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 14 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-lang-2.6.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 15 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-collections-3.2.1.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 16 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-digester-2.1.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 17 -> /home/mark/NetBeansProjects/FreeEed/lib/lucene-core-3.0.3.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 18 -> /home/mark/NetBeansProjects/FreeEed/lib/junit-4.8.2.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 19 -> /home/mark/NetBeansProjects/FreeEed/lib/guava-r09.jar
l-wx------ 1 mark mark 64 2011-08-30 10:58 2 -> pipe:[19113181]
lr-x------ 1 mark mark 64 2011-08-30 10:58 20 -> /home/mark/NetBeansProjects/FreeEed/lib/truezip-samples-7.3-rc-1-jar-with-dependencies.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 21 -> /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/jce.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 22 -> /home/mark/NetBeansProjects/FreeEed/freeeed_output/staging/inventory
lr-x------ 1 mark mark 64 2011-08-30 10:58 23 -> /home/mark/NetBeansProjects/FreeEed/freeeed_output/staging/input00001.zip
lr-x------ 1 mark mark 64 2011-08-30 10:58 25 -> /dev/random
lr-x------ 1 mark mark 64 2011-08-30 10:58 26 -> /dev/urandom
lr-x------ 1 mark mark 64 2011-08-30 10:58 27 -> /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/ext/sunpkcs11.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 28 -> /tmp/apache-tika-363283955479395764.tmp (deleted)
lr-x------ 1 mark mark 64 2011-08-30 10:58 29 -> /tmp/apache-tika-363283955479395764.tmp (deleted)
l-wx------ 1 mark mark 64 2011-08-30 10:58 3 -> /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/rt.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 30 -> socket:[19118543]
lr-x------ 1 mark mark 64 2011-08-30 10:58 31 -> /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/resources.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 35 -> /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/charsets.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 4 -> socket:[19113557]
lr-x------ 1 mark mark 64 2011-08-30 10:59 5 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-cli-1.2.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 6 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-httpclient-3.0.1.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 7 -> /home/mark/NetBeansProjects/FreeEed/lib/commons-logging-1.0.4.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 8 -> /home/mark/NetBeansProjects/FreeEed/lib/hadoop-core-0.20.2+737.jar
lr-x------ 1 mark mark 64 2011-08-30 10:58 9 -> /home/mark/NetBeansProjects/FreeEed/lib/log4j-1.2.15.jar
mark@mark-desktop:/proc$
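
The same check as the `ls -l /proc/<pid>/fd` listing above can be made from inside the JVM, which makes it easier to watch the descriptor count grow while documents are parsed. The following is only a minimal sketch, assuming Linux (it relies on /proc/self/fd); the class and method names are hypothetical and not part of Tika.

```java
import java.io.File;

// Hypothetical helper, not part of Tika: counts this JVM's open file
// descriptors by listing /proc/self/fd. Works on Linux only.
class OpenFdCounter {

    /** Returns the number of open descriptors, or -1 if /proc/self/fd cannot be listed. */
    static int countOpenFds() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        int before = countOpenFds();
        // ... parse a batch of documents here ...
        int after = countOpenFds();
        System.out.println("open fds before=" + before + ", after=" + after);
    }
}
```

If the count keeps rising across batches of parsed files, descriptors are probably being leaked somewhere in the parsing path.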
-
Re: Tika leaves files open (Michael McCandless, 2011-08-30, 16:15)
Hmm, any idea which document types are leading to the open files?

Or, did you hit any exceptions while parsing the docs? Might help us narrow down which parser isn't closing its temp file...

Mike McCandless
http://blog.mikemccandless.com
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 16:20)
Yes, I know the precise line (from stepping through in the debugger):

    String text = tika.parseToString(new FileInputStream(new File(fileName)), metadata);

Thank you,
Mark
-
Re: Tika leaves files open (Michael McCandless, 2011-08-30, 17:25)
Well, that's a high-level entry point into Tika... the question is, inside that method, which parser was invoked.

Were there any exceptions in your run?

Mike McCandless
http://blog.mikemccandless.com
-
Re: Tika leaves files open (Nick Burch, 2011-08-30, 17:26)
On Tue, 30 Aug 2011, Mark Kerzner wrote:
> String text = tika.parseToString(new FileInputStream(new File(fileName)),
> metadata);

Is that in your code or Tika?

If you open a FileInputStream, then you yourself need to close it too.

Also, if you have a File, you're better off wrapping it in a TikaInputStream rather than a FileInputStream, as some parsers prefer a File and Tika can then use that.

Nick
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 17:27)
No, no exception. I am parsing *.eml files. I get them out of readpst.

Mark
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 17:36)
Nick,

the documentation specifically says that Tika closes this input stream. I used to close it myself, but after reading that documentation page I took the close call out.

I will try TikaInputStream to see if this fixes the problem.

Mark
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 17:51)
I tried TikaInputStream, and I also close it, but I still get the same behavior. You can see the deleted but open files in the attached screen image.

Mark
-
Re: Tika leaves files open (Michael McCandless, 2011-08-30, 18:01)
I think Tika.parseToString (a sugar method on the Tika facade) closes the InputStream for you, while the Parser.parse method does not? Kinda confusing!

Mark, do you have specific docs that show this? Then we can boil this down to a test case...

Mike McCandless
http://blog.mikemccandless.com
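
One way to settle who closes the stream is to hand the method under test a wrapper that records whether close() was called. The sketch below is an assumption about how such a test helper could look; `CloseTrackingInputStream` is a hypothetical name, not a Tika class, and the read loop merely stands in for the real call (for example `tika.parseToString(stream, metadata)`).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper, not a Tika class: remembers whether close() was called.
class CloseTrackingInputStream extends FilterInputStream {
    private boolean closed = false;

    CloseTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) throws IOException {
        CloseTrackingInputStream stream = new CloseTrackingInputStream(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        // Stand-in for the call under test; it only reads and never closes.
        while (stream.read() != -1) {
            // discard the data
        }
        System.out.println("closed by callee: " + stream.isClosed());
        stream.close();
        System.out.println("closed after explicit close: " + stream.isClosed());
    }
}
```

Swapping the read loop for the actual parse call would show directly whether that entry point closes the stream it is given.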
-
Re: Tika leaves files open (Michael McCandless, 2011-08-30, 18:57)
One thing I noticed is, in TemporaryFiles.dispose, we call file.delete, which returns false if the file could not be deleted.

On Windows this will fail (return false) if we still have the file open somewhere, or if it had already been deleted.

So I think we should add an assert that the return value is true (file was successfully deleted)? This way when we run tests on Windows we'll see tests fail if a parser didn't close the opened temp files...

Mike McCandless
http://blog.mikemccandless.com
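
The idea of checking the return value of File.delete() can be sketched in plain Java. This is not the TemporaryFiles code itself; `DeleteCheck` and `deleteOrFail` are hypothetical names, and the sketch throws an exception instead of using the `assert` keyword so that the check also runs when assertions are disabled.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch, not Tika's TemporaryFiles: fails loudly when a
// temporary file cannot be deleted instead of ignoring the return value.
class DeleteCheck {

    /** Deletes the file and throws if File.delete() reports failure. */
    static void deleteOrFail(File file) {
        if (!file.delete()) {
            throw new IllegalStateException("Could not delete " + file);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("delete-check-", ".tmp");
        deleteOrFail(tmp); // the file exists and nothing holds it open
        // Deleting again fails because the file no longer exists.
        System.out.println("second delete returned " + tmp.delete());
    }
}
```

Note that on Linux an open file can usually still be deleted (which is why the listing in this thread shows "(deleted)" entries), so a check like this would mainly catch leaks when the tests run on Windows.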
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 19:04)
Mike,

I've isolated the problem. Here is my code:

    import java.io.File;
    import java.io.IOException;
    import org.apache.tika.Tika;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;

    /**
     * This class is separate to have all Tika-related stuff in one place.
     * It may contain more parsing specifics later on.
     */
    public class DocumentParser {
        private static DocumentParser instance = new DocumentParser();
        private Tika tika;

        public static DocumentParser getInstance() {
            return instance;
        }

        private DocumentParser() {
            tika = new Tika();
            tika.setMaxStringLength(10 * 1024 * 1024);
        }

        public void parse(String fileName, Metadata metadata) {
            try {
                // the given input stream is closed by the parseToString method (see Tika documentation)
                TikaInputStream tikaInputStream = TikaInputStream.get(new File(fileName));
                String text = tika.parseToString(tikaInputStream, metadata);
                metadata.set(DocumentMetadataKeys.DOCUMENT_TEXT, text);
                tikaInputStream.close();
            } catch (Exception e) {
                e.printStackTrace(System.out);
            }
        }

        public static void main(String argv[]) {
            Metadata metadata = new Metadata();
            String fileName = "7";
            getInstance().parse(fileName, metadata);
            System.out.println(metadata);
        }
    }

and I am attaching an input file. It is an email from the public Enron email corpus.

Thank you,
Mark
-
Re: Tika leaves files open (Mark Kerzner, 2011-08-30, 19:15)
Okay, I got more info:

this happens when the eml files have attachments. As we know, Tika extracts text from the attachments (which is great, this is what I need), but it seems that it does not close those attachments, although it does delete them.

Mark
-
Re: Tika leaves files openJukka Zitting 2011-08-30, 21:19
Hi,
On Tue, Aug 30, 2011 at 9:15 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> this happens when the eml files has attachments. As we know, Tika extracts
> text from the attachments (which is great, this is what I need), but it
> seems like it does not close those attachments, although it does delete
> them.

Yes, I think you're right. I believe the problem here is the
openContainer field within TikaInputStream where the container-aware
type detection code stores the already opened container (in this case
an NPOIFSFileSystem object) to avoid having to duplicate the parsing
work. Unfortunately there's no mechanism (except garbage collection by
the JVM) by which the container object gets properly disposed when
it's no longer needed, and I believe this is what's preventing the
underlying temporary files from getting reclaimed.

Perhaps we should extend the current TemporaryFiles mechanism to a
more generic TemporaryResources class that could also take care of
properly disposing of non-file resources associated with a
TikaInputStream instance.

BR,

Jukka Zitting
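To make the idea of a generic resource holder concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class name TrackedResources and its methods are hypothetical and are not Tika's API, and it uses Java 7+ syntax for brevity. It registers arbitrary Closeable objects (a temporary file wrapper, an opened container, and so on) and closes them all in reverse order of registration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Tika: tracks arbitrary Closeable
// resources and releases them together, most recently added first.
class TrackedResources implements Closeable {
    private final Deque<Closeable> resources = new ArrayDeque<>();

    // Register a resource and hand it back so the call can be inlined.
    <T extends Closeable> T add(T resource) {
        resources.push(resource);
        return resource;
    }

    // Close everything even if some close() calls fail,
    // then rethrow the first failure that was seen.
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    // Demonstration: returns the names of the resources in the order they were closed.
    static List<String> demo() {
        List<String> closed = new ArrayList<>();
        TrackedResources tracked = new TrackedResources();
        try {
            tracked.add(() -> closed.add("temp file"));
            tracked.add(() -> closed.add("open container"));
        } finally {
            try {
                tracked.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [open container, temp file]
    }
}
```

Closing in reverse order is a design choice in this sketch: a container opened on top of a temporary file is released before the file itself.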
-
Re: Tika leaves files openMark Kerzner 2011-08-30, 21:40
Jukka,
as a user of Tika, I would welcome this enhancement. Actually, the files are
being deleted; it's just that the file handles in the Java code are not being
closed.

For the time being, is there a workaround that I could use? Right now, this
is a show-stopper for my application (open source eDiscovery -
FreeEed <http://freeeed.org/>).

Thank you,
Mark
-
Re: Tika leaves files openNick Burch 2011-08-30, 21:49
On Tue, 30 Aug 2011, Jukka Zitting wrote:
> Yes, I think you're right. I believe the problem here is the
> openContainer field within TikaInputStream where the container-aware
> type detection code stores the already opened container (in this case an
> NPOIFSFileSystem object) to avoid having to duplicate the parsing work.
> Unfortunately there's no mechanism (except garbage collection by the
> JVM) by which the container object gets properly disposed when it's no
> longer needed, and I believe this is what's preventing the underlying
> temporary files from getting reclaimed.

I thought we closed the open container when we closed the TikaInputStream
though? There certainly looks to be suitable code for doing that...

Nick
-
Re: Tika leaves files openNick Burch 2011-08-30, 22:25
On Tue, 30 Aug 2011, Mark Kerzner wrote:
> For the time being, is there a workaround that I could use? Right now, this
> is a show-stopper for my application

Any chance you could do a svn checkout, build, and try with that? After my
last email, I have a nagging feeling about the timing of making NPOIFS
implement closable... I upgraded the POI dependency earlier today, so it's
worth checking with

Nick
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 00:08
SUCCESS!!!!
Nick, not only does it close all files, but it also feels much faster (I
mean, in the debugger; real performance may vary :)

Thank you everybody for today's productive discussion and help.

Mark

PS. If anyone ever gets sued, they should use FreeEed for eDiscovery and
come back a winner!
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 00:20
I do get an error in the build, but it creates the core snapshot jar anyway.
Should I be concerned?

Thank you,
Mark

[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/mark/ThirdParty/tika-source/tika-site/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java:[89,34] error: cannot access JPEGDecodeParam
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................ SUCCESS [32.118s]
[INFO] Apache Tika core .................................. SUCCESS [15.994s]
[INFO] Apache Tika parsers ............................... FAILURE [57.498s]
[INFO] Apache Tika application ........................... SKIPPED
[INFO] Apache Tika OSGi bundle ........................... SKIPPED
[INFO] Apache Tika ....................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:23.922s
[INFO] Finished at: Tue Aug 30 18:52:20 CDT 2011
[INFO] Final Memory: 28M/156M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project tika-parsers: Compilation failure
[ERROR] /home/mark/ThirdParty/tika-source/tika-site/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java:[89,34] error: cannot access JPEGDecodeParam
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :tika-parsers
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 02:10
Well,
that error WAS important. It compiles and pretends to work, but does not
extract any text or metadata (that's why it is so fast!).

Thank you,
Mark
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 02:43
Okay,
the error was there because of Java 7. I heard about some weird Java 7
error and Lucene. Back to Java 6, and everything works fine: builds,
extracts, closes files.

Thank you,
Mark
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 04:27
Guys,
the errors show up again. I already thanked everybody
<http://shmsoft.blogspot.com/2011/08/freeeed-processing-is-stable.html>! I
wonder how I can make good on this :)

I think that in ParserContainerExtractor.parse you need to associate the
TikaInputStream with temporary files both ways: from the stream you can
already find the file, but you should also be able to find the stream from
the file. Then, when you are deleting the file, you can also close the
associated stream. Something like

    public void parse(
            InputStream stream, ContentHandler ignored,
            Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        TemporaryFiles tmp = new TemporaryFiles();
        try {
            TikaInputStream tis = TikaInputStream.get(stream, tmp);

            // Figure out what we have to process
            String filename = metadata.get(Metadata.RESOURCE_NAME_KEY);
            MediaType type = detector.detect(tis, metadata);

            if (extractor == null) {
                // Let the handler process the embedded resource
                handler.handle(filename, type, tis);
            } else {
                // Use a temporary file to process the stream twice
                File file = tis.getFile();

                // Let the handler process the embedded resource
                handler.handle(filename, type, TikaInputStream.get(file));

                // Recurse
                extractor.extract(tis, extractor, handler);
            }
        } finally {
            tmp.closeStreams();
            tmp.dispose();
        }
    }

Thank you,
Mark
-
Re: Tika leaves files openMark Kerzner 2011-08-31, 04:39
I started trying to implement this, but obviously it's a bit more complex
than a 10-minute job and requires knowledge of the design.

Mark
-
Re: Tika leaves files openJukka Zitting 2011-08-31, 16:43
Hi,
On Tue, Aug 30, 2011 at 11:19 PM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Yes, I think you're right. I believe the problem here is the
> openContainer field within TikaInputStream where the container-aware
> type detection code stores the already opened container (in this case
> an NPOIFSFileSystem object) to avoid having to duplicate the parsing
> work. Unfortunately there's no mechanism (except garbage collection by
> the JVM) by which the container object gets properly disposed when
> it's no longer needed, and I believe this is what's preventing the
> underlying temporary files from getting reclaimed.

Actually the problem was much simpler than that. Code within the
ParserContainerExtractor class creates a new TikaInputStream for
processing an embedded resource, but then never closes that stream.
This prevents the temporary file behind that stream from being removed
on Windows.

See the attached patch for a quick draft of a fix.

BR,

Jukka Zitting
-
RE: Tika leaves files openUwe Schindler 2011-08-31, 16:53
Hi Jukka,
For this typical try...finally code I suggest using a pre-Java-7 workaround
to make it behave more correctly: if you do try...finally and another
exception occurs on close() in the finally block, you lose the first
exception. The new Java 7 try-with-resources block handles that fantastically
(it also records all exceptions happening during closing inside the
top-level exception).

To achieve that with Java 5/Java 6 we added a class called IOUtils to Lucene
that emulates this with only a few more lines:

https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/IOUtils.java

Code examples are in the Javadocs, but they look very similar to
try-with-resources (it mimics that without Java syntax changes). The class
also detects Java 7 and records suppressed exceptions; otherwise it ignores
them completely (Java 7 support was added by me).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
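The masking effect discussed in this message can be reproduced in a few lines. The sketch below is an illustration only: FailingResource is a made-up class whose close() always throws, standing in for a stream that fails on close, and the second method needs Java 7 or later. It compares a plain try...finally with try-with-resources.

```java
import java.io.Closeable;
import java.io.IOException;

class MaskingDemo {
    // Made-up resource whose close() always fails.
    static class FailingResource implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // Plain try/finally: the exception thrown by close() replaces
    // the exception thrown by the body, which is lost.
    static String withTryFinally() {
        try {
            FailingResource r = new FailingResource();
            try {
                throw new IllegalStateException("parse failed");
            } finally {
                r.close();
            }
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    // try-with-resources (Java 7+): the body's exception propagates and
    // the close() failure is attached to it as a suppressed exception.
    static String withTryWithResources() {
        try (FailingResource r = new FailingResource()) {
            throw new IllegalStateException("parse failed");
        } catch (Exception e) {
            return e.getMessage()
                    + " (suppressed: " + e.getSuppressed()[0].getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(withTryFinally());       // prints: close failed
        System.out.println(withTryWithResources()); // prints: parse failed (suppressed: close failed)
    }
}
```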
-
Re: Tika leaves files openJukka Zitting 2011-08-31, 17:25
Hi,
On Wed, Aug 31, 2011 at 6:53 PM, Uwe Schindler <[EMAIL PROTECTED]> wrote:
> For this typical try...finally code I suggest to use a pre-Java-7.0
> workaround to make this behave more correct: If you do try...finally and
> another Exception occurs on close() in the finally block, you lose the first
> Exception.

Such a case is certainly a possible scenario, but in my experience it
practically never occurs. I've used the try { ... } finally {
stream.close(); } pattern extensively for years, and I've never seen a
case where information was lost because of this.

So personally I consider the problem rather theoretical and would
rather opt for cleaner code that avoids the extra constructs.

BR,

Jukka Zitting
-
RE: Tika leaves files openUwe Schindler 2011-08-31, 17:29
Except when you have to close multiple resources... Also, JDBC throws
Exceptions quite often on close :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
-
Re: Tika leaves files openMichael McCandless 2011-09-01, 09:36
Actually Tika already has the method to do this: IOUtils.closeQuietly.
I think we should use it here?

While it is rare for IS.close() to throw an exception, if it does, it's
quite awful because it masks the original exception. It seems best to
be defensive?

Mike McCandless

http://blog.mikemccandless.com
-
Re: Tika leaves files openJukka Zitting 2011-09-01, 09:56
Hi,
On Thu, Sep 1, 2011 at 11:36 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> While rare for IS.close() to throw an exception, if it does, it's
> quite awful because it masks the original exception. It seems best to
> be defensive?

With closeQuietly() you'd really be throwing information out in the case
where close() fails when no other exception has been thrown. Instead
of one exception masking another, you'd have no exceptions masking
one!

BR,

Jukka Zitting
-
RE: Tika leaves files openUwe Schindler 2011-09-01, 10:00
This is why we use the "modernized" Lucene IOUtils.closeSafely... (and
because Lucene has no external references).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
-
Re: Tika leaves files open
Michael McCandless 2011-09-01, 10:18
On Thu, Sep 1, 2011 at 5:56 AM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, Sep 1, 2011 at 11:36 AM, Michael McCandless
> <[EMAIL PROTECTED]> wrote:
>> While rare for IS.close() to throw an exception, if it does, it's
>> quite awful because it masks the original exception. It seems best to
>> be defensive?
>
> With closeQuietly() you'd really be throwing information out in case
> where close() fails when no other exception has been thrown. Instead
> of one exception masking another, you'd have no exceptions masking
> one!

Duh, you're right: we don't want to use closeQuietly if there was no exception.

For Lucene we do this:

  // open something
  boolean success = false;
  try {
    // do something
    success = true;
  } finally {
    if (!success) {
      closeQuietly();
    } else {
      closeNormally();
    }
  }

This gets cleaner with Java 7 but it's some ways away before Tika can require Java 7...

Mike
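The success-flag pattern described in this message can be written out as a runnable sketch. The class and method names (`SuccessFlagDemo`, `Resource`, `process`, `outcome`) are hypothetical, and the "work" is reduced to a flag that decides whether an exception is thrown.

```java
import java.io.Closeable;
import java.io.IOException;

class SuccessFlagDemo {
    /** Hypothetical resource whose close() can be told to fail. */
    static class Resource implements Closeable {
        final boolean failOnClose;

        Resource(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        @Override
        public void close() throws IOException {
            if (failOnClose) {
                throw new IOException("close failed");
            }
        }
    }

    static void process(Resource resource, boolean workFails) throws IOException {
        boolean success = false;
        try {
            if (workFails) {
                throw new IOException("work failed");
            }
            success = true;
        } finally {
            if (success) {
                resource.close(); // nothing to mask, so report close() failures
            } else {
                try {
                    resource.close(); // keep the original exception in flight
                } catch (IOException ignored) {
                    // dropped on purpose: "work failed" is the useful error
                }
            }
        }
    }

    /** Returns the message of whatever exception escapes, or "ok". */
    static String outcome(boolean workFails, boolean closeFails) {
        try {
            process(new Resource(closeFails), workFails);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

With this shape, a failure in the work always wins over a failure in close(), and a failure in close() is still reported when the work itself succeeded.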
-
RE: Tika leaves files open
Uwe Schindler 2011-09-01, 10:41
With our internal Lucene IOUtils it's even simpler, see the javadocs :-) This is why I proposed to use it for Tika as well:

  Closeable resource1 = null, resource2 = null, resource3 = null;
  ExpectedException priorE = null;
  try {
    resource1 = ...; resource2 = ...; resource3 = ...; // Acquisition may throw ExpectedException
    ..do..stuff.. // May throw ExpectedException
  } catch (ExpectedException e) {
    priorE = e;
  } finally {
    IOUtils.closeSafely(priorE, resource1, resource2, resource3);
  }

The above code is identical to the Java 7 try-with-resources (100% identical behaviour; if Java 7 is detected it will also log suppressed exceptions to the priorE stack trace). It's just a few lines more code.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
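To make the idea behind such a helper concrete, here is a minimal sketch of a method with similar intent. It is not Lucene's actual implementation: `CloseAllDemo`, `closeAll`, and `Resource` are hypothetical names, the exception type is fixed to IOException for brevity, and the sketch assumes Java 7 or later so that it can call `Throwable.addSuppressed` directly.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseAllDemo {
    /**
     * Hypothetical helper, not Lucene's code: closes every resource, then
     * rethrows the prior exception if there was one, otherwise the first
     * close() failure. Any further failures are attached as suppressed.
     */
    static void closeAll(IOException prior, Closeable... resources) throws IOException {
        IOException toThrow = prior;
        for (Closeable resource : resources) {
            if (resource == null) {
                continue; // acquisition may have failed before assignment
            }
            try {
                resource.close();
            } catch (IOException e) {
                if (toThrow == null) {
                    toThrow = e;
                } else {
                    toThrow.addSuppressed(e); // requires Java 7 or later
                }
            }
        }
        if (toThrow != null) {
            throw toThrow;
        }
    }

    /** Hypothetical resource that records being closed and may fail. */
    static class Resource implements Closeable {
        final String name;
        final boolean fail;
        boolean closed;

        Resource(String name, boolean fail) {
            this.name = name;
            this.fail = fail;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            if (fail) {
                throw new IOException(name + " close failed");
            }
        }
    }
}
```

A caller would record the exception from the work in a catch block and pass it as `prior` from the finally block, following the shape of the snippet in the message above.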
-
Re: Tika leaves files open
Torsten Krah 2011-09-06, 09:19
Can someone summarize what must be done in the latest Tika release, 0.9, to work around or fix this bug (without patching Tika itself, but via reflection or some bugfix code, if this is even possible)? Tika is not really usable at the moment if it runs out of file descriptors, in a webapp for example, because the process won't be closed; in the end an IOException will happen every time once the descriptors run out.

So far I am calling "close" on the container object itself via reflection before calling close on the TikaInputStream. However, there are still many "deleted" file descriptors pending for "apache-tika*tmp" files:

  lsof -f -p 2414 | grep deleted | wc -l
  347

Can anyone recommend a workaround or a strategy for how to go on?

thx

Torsten
-
Re: Tika leaves files open
Torsten Krah 2011-09-06, 12:16
On Tuesday, 06.09.2011, at 11:19 +0200, Torsten Krah wrote:
> lsof -f -p 2414 | grep deleted | wc -l
> 347

For documentation purposes, for anyone who uses 0.9 and has this problem:

The ZipContainerDetector does not call close on the created ZipFile instance (at least in 0.9). This may lead to a RandomAccessFile FileDescriptor leak if the ZIP detection was not successful in the commons-compress code itself; calling close forces the descriptor to be closed there.

I tested with my own ZipDetector, which does:

  ZipFile zip = null;
  try {
    zip = ...
  } finally {
    ZipFile.closeQuietly(zip);
  }

and there have been no more leaks so far.

Looking at trunk, it seems this is already fixed upstream but not in 0.9.x yet. Maybe 1.0 will get this fix, we'll see.
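The close-in-finally shape of this workaround can be sketched in a self-contained way. The example below uses `java.util.zip.ZipFile` from the JDK as a stand-in for the commons-compress class mentioned above, so it is an illustration of the pattern rather than the actual detector; `ZipProbe`, `hasEntry`, and `sampleZip` are hypothetical names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipProbe {
    /** Returns true if the file is a ZIP archive containing the named entry. */
    static boolean hasEntry(File file, String entryName) {
        ZipFile zip = null;
        try {
            zip = new ZipFile(file); // throws if the file is not a readable ZIP
            return zip.getEntry(entryName) != null;
        } catch (IOException e) {
            return false; // treat unreadable input as "not detected"
        } finally {
            if (zip != null) {
                try {
                    zip.close(); // always release the file descriptor
                } catch (IOException ignored) {
                    // detection result is already decided
                }
            }
        }
    }

    /** Writes a small ZIP with one entry to a temporary file. */
    static File sampleZip(String entryName) throws IOException {
        File file = File.createTempFile("probe", ".zip");
        file.deleteOnExit();
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file));
        try {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(new byte[] {1, 2, 3});
            out.closeEntry();
        } finally {
            out.close();
        }
        return file;
    }
}
```

The important part is that the archive handle is closed on every path out of `hasEntry`, including the path where detection fails.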
-
Re: Tika leaves files open
Nick Burch 2011-09-06, 12:56
On Tue, 6 Sep 2011, Torsten Krah wrote:
> Can someone summarize what must be done in latest Tika Release 0.9 do
> workaround or fix this bug

Your best bet is probably just to use a recent nightly build / build from a recent SVN checkout. That will include the fix.

Nick
-
Re: Tika leaves files open
Zabrane Mickael 2011-09-06, 12:58
When will Tika 0.10 be released, guys?

Regards,
Zabrane
-
Re: Tika leaves files open
Kevin Clark 2011-09-06, 18:45
+1 for a new release. I'm looking forward to the pdfbox update.
-
Re: Tika leaves files open
Zabrane Mickael 2011-09-06, 18:52
+1
Regards,
Zabrane
|