Re: error parsing some xml
Ing. Eyeris Rodriguez Rue... 2012-05-21, 17:41
I use Nutch 1.4 and Solr 3.4.
I think the error happens when parsing an XML file with this structure: <!--text with -- inside the comment-->. I have been reading about it but haven't found much. This is my error log; any help would be appreciated.
*************************************************************************************************
2012-05-21 10:17:53,398 INFO fetcher.Fetcher - Fetcher: starting at 2012-05-21 10:17:53
2012-05-21 10:17:53,399 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20120521101752
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Using queue mode : byHost
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: threads: 20
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2012-05-21 10:17:53,777 INFO fetcher.Fetcher - QueueFeeder finished: total 9 records + hit by time limit :0
2012-05-21 10:17:53,804 WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
2012-05-21 10:17:53,809 WARN mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
	at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:73)
	at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:53)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:581)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1075)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
****************************************************************************************************

----- Original message -----
From: "Markus Jelsma" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, May 21, 2012 11:41:40
Subject: RE: error parsing some xml

Hi,

Which version do you use? It should list the troubling URL. What's the stack trace?
Cheers

-----Original message-----
> From: Ing. Eyeris Rodriguez Rueda <[EMAIL PROTECTED]>
> Sent: Mon 21-May-2012 17:07
> To: [EMAIL PROTECTED]
> Subject: error parsing some xml
>
> Hi all.
> When I try to crawl, I have a problem parsing some XML. I get the exception below, and I want to know which XML file is failing to parse.
> **************************************************************************************
> WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> ***************************************************************************************
> Any help will be appreciated.
>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
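In the stack trace, the exception is thrown while ParserFactory loads its parse-plugin preferences through ParsePluginsReader. That suggests the malformed file may be Nutch's own parse-plugins configuration (presumably parse-plugins.xml under conf/, the file named by the parse.plugin.file property) rather than a fetched page. This is an inference from the trace and is not confirmed anywhere in the thread. XML 1.0 does not allow "--" anywhere inside a comment, and that is exactly what the SAX message reports. The Java sketch below is a hypothetical helper (CommentDashFinder is not part of Nutch or any library). It scans a file and prints the line and column of every "--" found inside a comment. The default path conf/parse-plugins.xml is an assumption, so pass the file you actually want to check as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Nutch: reports every "--" inside an XML comment.
 * XML 1.0 forbids "--" within comments, which is what makes SAX parsers fail with
 * "The string "--" is not permitted within comments."
 * Deliberately simple: it does not skip CDATA sections or attribute values.
 */
class CommentDashFinder {

    /** Returns "line:column" (both 1-based) for the first '-' of each "--" inside a comment. */
    static List<String> findBadDashes(String xml) {
        List<String> hits = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = xml.indexOf("<!--", from);
            if (open < 0) {
                break; // no more comments
            }
            int bodyStart = open + 4;
            int close = xml.indexOf("-->", bodyStart);
            // An unterminated comment runs to the end of the input.
            int bodyEnd = (close < 0) ? xml.length() : close;
            int dash = xml.indexOf("--", bodyStart);
            while (dash >= 0 && dash < bodyEnd) {
                hits.add(position(xml, dash));
                dash = xml.indexOf("--", dash + 2);
            }
            if (close < 0) {
                break;
            }
            from = close + 3; // continue after "-->"
        }
        return hits;
    }

    /** Converts a character offset into a 1-based "line:column" string. */
    private static String position(String text, int index) {
        int line = 1;
        int lastNewline = -1;
        for (int i = 0; i < index; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lastNewline = i;
            }
        }
        return line + ":" + (index - lastNewline);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the file to check as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "conf/parse-plugins.xml");
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        for (String hit : findBadDashes(xml)) {
            System.out.println(file + ":" + hit + ": \"--\" inside a comment");
        }
    }
}
```

Running it as `java CommentDashFinder conf/parse-plugins.xml` lists each offending position as file:line:column. The column points at the first '-' of the pair, so it may differ slightly from the columnNumber in the SAX message. Rewording the comment so it no longer contains "--" (for example, a single dash) is enough to make the file well-formed again.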