Markus Jelsma
2010-06-21, 18:04
Mattmann, Chris A
2010-06-21, 18:07
Markus Jelsma
2010-06-21, 18:13
Alex McLintock
2010-06-22, 09:48
Markus Jelsma
2010-06-22, 10:02
Alex McLintock
2010-06-22, 10:35
Mattmann, Chris A
2010-06-22, 13:22
Markus Jelsma
2010-06-22, 13:38
-
The parse-tika plug-in in 1.1 (Markus Jelsma, 2010-06-21, 18:04)
Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.
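
The mismatch described above (a plug-in named in parse-plugins.xml but absent from plugins/) can be checked mechanically. The following is only a sketch under assumptions: the function name `missing_parse_plugins` is hypothetical, the config file is assumed to live at conf/parse-plugins.xml under the install directory, and plug-ins are assumed to be declared one per line as `<plugin id="..."/>` elements.

```shell
#!/bin/sh
# missing_parse_plugins NUTCH_HOME   (hypothetical helper)
# Prints every plug-in id that conf/parse-plugins.xml refers to but that
# has no matching directory under plugins/.
# Assumptions: the conf/ and plugins/ layout, and one <plugin id="..."/>
# element per line in the XML file.
missing_parse_plugins() {
    nutch_home=$1
    # Pull the id attribute out of each <plugin id="..."> line.
    sed -n 's/.*<plugin id="\([^"]*\)".*/\1/p' "$nutch_home/conf/parse-plugins.xml" |
        sort -u |
        while read -r id; do
            # Report the id only when its plug-in directory is absent.
            [ -d "$nutch_home/plugins/$id" ] || echo "$id"
        done
}

# Usage (the path is a placeholder):
#   missing_parse_plugins /path/to/nutch
```

If this prints parse-tika, the install in question really lacks the plug-in, whatever the config file says.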
-
Re: The parse-tika plug-in in 1.1 (Mattmann, Chris A, 2010-06-21, 18:07)
Hi Markus,

Hmmm: I see it here?

http://svn.apache.org/repos/asf/nutch/tags/relase-1.1/src/plugin/parse-tika/

Where aren't you seeing it?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
RE: Re: The parse-tika plug-in in 1.1 (Markus Jelsma, 2010-06-21, 18:13)
Hmmm, I'm not building from source. I just download the package and get going! The jars and wars are/were always just there and I can/could use them instantly.

Maybe the compiled jar is just not included?
-
Re: Re: The parse-tika plug-in in 1.1 (Alex McLintock, 2010-06-22, 09:48)
Hi Markus,

> The jars and wars are/were always just there and i can/could use them instantly.

Sounds like we need to improve some documentation :-)

I believe the package went to "source only" in the previous (1.0) version, so Chris is just following the current "best practice" by not creating all the jars in 1.1.

I wasn't around for that decision, but I don't find it too onerous to run ant myself. Is it a problem for you?

Alex
-
Re: The parse-tika plug-in in 1.1 (Markus Jelsma, 2010-06-22, 10:02)
Well, it's not that big a problem of course, just another step before it's ready for testing. But I'm wondering, what would be a good reason not to ship the package as a jar as well? I'd bet this is not going to be the last mail on this issue: it's not documented, and newcomers would probably not recognize the need to compile it before they can use Tika.

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
-
Re: The parse-tika plug-in in 1.1 (Alex McLintock, 2010-06-22, 10:35)
On 22 June 2010 11:02, Markus Jelsma <[EMAIL PROTECTED]> wrote:
> it's not documented and newcomers would probably not recognize the need to
> compile it before they can use Tika.

I've been trying to update the Wiki with correct and improved documentation, but if the documentation that comes with Nutch is wrong, then please submit a bug report or a patch, or just tell me so I can try to do the same :-)

I don't like the fact that we (collectively) improve the code but leave the documentation out of date.

Alex
-
Re: The parse-tika plug-in in 1.1 (Mattmann, Chris A, 2010-06-22, 13:22)
Hey Alex and Markus,

In fact, it was my preference to go to source only, but I actually included both source and binary in the 1.1 release to please the people who were used to getting binary distributions :) However, in the future, I would prefer and intend to pursue only doing source releases for Nutch. It's a community though, so I'll have to convince my committer compatriots :)

Regardless, parse-tika is in fact included in the Nutch 1.1 binary distribution:

    wget "http://mirror.cloudera.com/apache/nutch/apache-nutch-1.1-bin.tar.gz"
    tar xvzf apache-nutch-1.1-bin.tar.gz
    cd apache-nutch-1.1-bin
    unzip -l nutch-1.1.job | grep parse-tika

shows:

          0  06-07-10 05:28  plugins/parse-tika/
      43033  06-07-10 05:28  plugins/parse-tika/asm-3.1.jar
     189233  06-07-10 05:28  plugins/parse-tika/bcmail-jdk14-136.jar
     229116  06-07-10 05:28  plugins/parse-tika/bcmail-jdk15-1.45.jar
    1401560  06-07-10 05:28  plugins/parse-tika/bcprov-jdk14-136.jar
    1663318  06-07-10 05:28  plugins/parse-tika/bcprov-jdk15-1.45.jar
     143847  06-07-10 05:28  plugins/parse-tika/commons-compress-1.0.jar
      60686  06-07-10 05:28  plugins/parse-tika/commons-logging-1.1.1.jar
     313898  06-07-10 05:28  plugins/parse-tika/dom4j-1.6.1.jar
     153220  06-07-10 05:28  plugins/parse-tika/fontbox-1.1.0.jar
      28804  06-07-10 05:28  plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
      51211  06-07-10 05:28  plugins/parse-tika/jempbox-1.1.0.jar
      90929  06-07-10 05:28  plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
      21227  06-07-10 05:28  plugins/parse-tika/parse-tika.jar
    4709746  06-07-10 05:28  plugins/parse-tika/pdfbox-1.1.0.jar
       2439  04-06-10 11:38  plugins/parse-tika/plugin.xml
    1539291  06-07-10 05:28  plugins/parse-tika/poi-3.6.jar
     412783  06-07-10 05:28  plugins/parse-tika/poi-ooxml-3.6.jar
    3774332  06-07-10 05:28  plugins/parse-tika/poi-ooxml-schemas-3.6.jar
     795888  06-07-10 05:28  plugins/parse-tika/poi-scratchpad-3.6.jar
      90023  06-07-10 05:28  plugins/parse-tika/tagsoup-1.2.jar
     215263  06-07-10 05:28  plugins/parse-tika/tika-parsers-0.7.jar
     109318  06-07-10 05:28  plugins/parse-tika/xml-apis-1.0.b2.jar
    2666695  06-07-10 05:28  plugins/parse-tika/xmlbeans-2.3.0.jar

So, I'm not sure what you are seeing?

Cheers,
Chris
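
The same `unzip -l ... | grep` check can be turned into a small filter that counts the jars shipped for a given plug-in, which is handy when comparing several job files. This is a minimal sketch, not part of Nutch: the function name `count_plugin_jars` is hypothetical, and it assumes the listing format shown above, with the entry path at the end of each line.

```shell
#!/bin/sh
# count_plugin_jars PLUGIN   (hypothetical helper)
# Reads an `unzip -l` listing on stdin and prints how many .jar entries
# sit under plugins/PLUGIN/.
count_plugin_jars() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function usable under set -e.
    grep -c "plugins/$1/.*\.jar\$" || true
}

# Usage, from inside the unpacked binary distribution:
#   unzip -l nutch-1.1.job | count_plugin_jars parse-tika
```

A count of 0 means the job file carries no jars for that plug-in.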
-
Re: The parse-tika plug-in in 1.1 (Markus Jelsma, 2010-06-22, 13:38)
Ah, I understand now! I confused the slightly older nightly build with the new release. And to make matters worse, I've also got an even older nightly build. One of them does not include the parse-tika plugin, for some reason... Anyway, this solves the case though ;)
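
With several builds unpacked side by side, a quick loop shows which of them actually ship a given plug-in. This is a rough sketch: the function name `report_plugin` and the directory names in the usage line are hypothetical, and it assumes each install keeps its plug-ins in a plugins/ subdirectory.

```shell
#!/bin/sh
# report_plugin PLUGIN DIR...   (hypothetical helper)
# For each install directory, prints whether plugins/PLUGIN exists there.
report_plugin() {
    plugin=$1
    shift
    for dir in "$@"; do
        if [ -d "$dir/plugins/$plugin" ]; then
            echo "$dir: $plugin present"
        else
            echo "$dir: $plugin missing"
        fi
    done
}

# Usage (directory names are placeholders):
#   report_plugin parse-tika ~/nutch-nightly-old ~/nutch-nightly ~/apache-nutch-1.1-bin
```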