-
Nutch 2 plugin implementation ClassNotFoundException
Ake Tangkananond 2012-08-03, 11:49
Hello,
I have a question about Nutch 2 plugin implementation.
I am implementing an image parser. It worked fine in Nutch 1.5, but after I migrated the code to Nutch 2.0 I get errors. I have spent several hours on them and have not been able to trace the cause yet. I would appreciate any insight from the mailing list.
While parsing the fetched content, I got the following error in logs/hadoop.log:
2012-08-03 18:28:25,304 ERROR parse.ParserFactory - PluginRuntimeException
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: <my plugin class name>
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.parse.ParserFactory.getFields(ParserFactory.java:209)
    at org.apache.nutch.parse.ParserJob.getFields(ParserJob.java:191)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:243)
    at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)
Caused by: java.lang.ClassNotFoundException: <my plugin class name>
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 7 more
2012-08-03 18:28:25,654 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
What I did: I copied the minimal necessary files from other plugin folders and modified them to fit my needs. Then I edited nutch-site.xml to include my plugin and edited parse-plugins.xml to register the MIME type. I added parse-image to the 2 packageset entries in <nutch-source>/build.xml, added Ant targets under deploy and clean in <nutch-source>/src/plugin/build.xml, and then rebuilt everything. (This is what I did in Nutch 1.5 and it worked, but no luck with Nutch 2.)
Could you advise what else I am missing, or what more information I should provide? Thank you very much!
Regards,
Ake Tangkananond
-
Re: Nutch 2 plugin implementation ClassNotFoundException
Ferdy Galema 2012-08-03, 11:59
Hi,
Some quick pointers: Do you run it in local mode? Is your plugin's plugin.xml and parse-image.jar present in runtime/local/plugins after you build it? Do you use external libraries?
Ferdy.
-
Re: Nutch 2 plugin implementation ClassNotFoundException
Ake Tangkananond 2012-08-03, 12:56
Hello,
Thank you for the very quick reply. Yes, I run it in local mode, and my plugin's plugin.xml and parse-image.jar are present in runtime/local/plugins.
I have now found the root cause. Here is how I found it: I inserted the following code at PluginDescriptor.java line 288 to print all the library lookup paths:
    System.out.println(java.util.Arrays.toString(urls));
The output shows the problem. parse-image.jar is being looked up under the parse-html plugin directory:
    [file:/usr/local/apache-nutch-2.0.0-source/runtime/local/plugins/parse-html/parse-image.jar]
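For anyone who wants to do the same check without patching PluginDescriptor.java, here is a minimal standalone sketch. It only assumes you have the URL[] that a class loader would search; the class name ClasspathProbe, the method missingEntries, and the relative path in main are made up for illustration and are not part of Nutch.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathProbe {

    /** Returns the file: entries of a class loader search path that do not exist on disk. */
    static List<URL> missingEntries(URL[] urls) throws Exception {
        List<URL> missing = new ArrayList<>();
        for (URL url : urls) {
            if (!"file".equals(url.getProtocol())) {
                continue; // only local files can be checked this way
            }
            Path path = Paths.get(url.toURI());
            if (!Files.exists(path)) {
                missing.add(url);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical entry that mirrors the wrong path from the output above.
        URL[] urls = {
            Paths.get("runtime/local/plugins/parse-html/parse-image.jar").toUri().toURL()
        };
        // Same diagnostic print as in the message: dump the whole search path.
        System.out.println(Arrays.toString(urls));
        for (URL url : missingEntries(urls)) {
            System.out.println("not found on disk: " + url);
        }
    }
}
```

A jar that is listed but missing on disk is one likely reason for a ClassNotFoundException like the one in the trace, since the loader has nowhere to read the class from.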
I am still figuring out how to fix it gracefully. If anyone knows the right place to fix it, please shed some light. xD
BTW, I'm using IntelliJ IDEA but I don't know how to configure it with the Ivy project. It would be great if someone could give me a hand at iamake at gmail dot com ;-)
Regards, Ake Tangkananond
-
Re: Nutch 2 plugin implementation ClassNotFoundException
Ake Tangkananond 2012-08-03, 15:18
Hi All,
I have now fixed the problem. Thank you everyone. Here is a summary:
Problem: <nutch-source>/build/plugins/<plugin-name-existing>/plugin.xml was overwritten because I used plugin-name-existing as the id in <nutch-source>/src/plugin/<plugin-name-new>/plugin.xml (/plugin[@id]). That was my mistake, but after I corrected it (changing /plugin[@id] to plugin-name-new), the build script never re-copied <nutch-source>/build/plugins/<plugin-name-existing>/plugin.xml.
Not sure if this is intended.
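In case it helps others avoid the same mistake, here is a rough sketch of a check that looks for two plugin directories declaring the same id in their plugin.xml. It is not part of the Nutch build. The class name PluginIdCheck and the method duplicateIds are made up, and the only thing it assumes about plugin.xml is the id attribute on the root plugin element mentioned above.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PluginIdCheck {

    /**
     * Scans every <pluginRoot>/<dir>/plugin.xml and returns the ids that are
     * declared by more than one directory, mapped to the sorted directory names.
     */
    static Map<String, List<String>> duplicateIds(Path pluginRoot) throws Exception {
        Map<String, List<String>> dirsById = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(pluginRoot)) {
            for (Path dir : dirs) {
                Path manifest = dir.resolve("plugin.xml");
                if (!Files.isRegularFile(manifest)) {
                    continue; // not a plugin directory
                }
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(manifest.toFile());
                // The id attribute on the root element, i.e. /plugin[@id].
                String id = doc.getDocumentElement().getAttribute("id");
                dirsById.computeIfAbsent(id, k -> new ArrayList<>())
                        .add(dir.getFileName().toString());
            }
        }
        // Keep only ids claimed by two or more directories.
        dirsById.values().removeIf(names -> names.size() < 2);
        for (List<String> names : dirsById.values()) {
            Collections.sort(names);
        }
        return dirsById;
    }

    public static void main(String[] args) throws Exception {
        // Pass the plugin source directory, for example <nutch-source>/src/plugin.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/plugin");
        duplicateIds(root).forEach((id, names) ->
                System.out.println("id '" + id + "' is declared by " + names));
    }
}
```

Running it against the plugin source directory before a build should flag a copied plugin.xml whose id was not changed. It does not address the stale file left in build/plugins, which probably still needs a clean rebuild.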
BTW, I found a whitespace typo in PluginManifestParser.java:187.
Current:
    LOG.debug("plugin: id=" + id + " name=" + name + " version=" + version + " provider=" + providerName + "class=" + pluginClazz);
Correct (space before class):
    LOG.debug("plugin: id=" + id + " name=" + name + " version=" + version + " provider=" + providerName + " class=" + pluginClazz);
Regards,
Ake Tangkananond