Strange, it should show the bad URL. But since you have only 9 URLs, the easiest way is to run the parsechecker tool on each URL.
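A rough sketch of that loop, in Python for convenience. It assumes you run it from the Nutch install directory so that `bin/nutch parsechecker <url>` works, and it treats a non-zero exit status or the word "Exception" in the output as a failure; both are assumptions, as are the function names and the seed file path.

```python
import subprocess
from pathlib import Path


def read_urls(seed_file):
    """Return the non-empty, non-comment lines of a seed list."""
    lines = Path(seed_file).read_text().splitlines()
    stripped = [ln.strip() for ln in lines]
    return [ln for ln in stripped if ln and not ln.startswith("#")]


def parsechecker_command(url, nutch="bin/nutch"):
    """Build the command line for one parsechecker run (nothing is executed here)."""
    return [nutch, "parsechecker", url]


def check_all(seed_file, nutch="bin/nutch"):
    """Run parsechecker on every URL and collect the ones that look broken."""
    failed = []
    for url in read_urls(seed_file):
        result = subprocess.run(parsechecker_command(url, nutch),
                                capture_output=True, text=True)
        # Assumption: a non-zero exit status or "Exception" in the output marks a bad URL.
        if result.returncode != 0 or "Exception" in result.stdout + result.stderr:
            failed.append(url)
    return failed


# Usage (placeholder path): failed = check_all("urls/seed.txt")
```

With 9 URLs this finishes quickly, and the list it returns should narrow the problem down to one document.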
-----Original message-----
> From:Ing. Eyeris Rodriguez Rueda <[EMAIL PROTECTED]>
> Sent: Mon 21-May-2012 19:42
> To: [EMAIL PROTECTED]
> Subject: Re: error parsing some xml
>
> I use Nutch 1.4 and Solr 3.4.
> I think my error occurs when parsing an XML file with this structure:
> <!--text with -- inside the comment-->
> I have been reading but have not found much. This is my error log.
> Any help would be appreciated.
> *************************************************************************************************
> 2012-05-21 10:17:53,398 INFO fetcher.Fetcher - Fetcher: starting at 2012-05-21 10:17:53
> 2012-05-21 10:17:53,399 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20120521101752
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Using queue mode : byHost
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: threads: 20
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
> 2012-05-21 10:17:53,777 INFO fetcher.Fetcher - QueueFeeder finished: total 9 records + hit by time limit :0
> 2012-05-21 10:17:53,804 WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> 2012-05-21 10:17:53,809 WARN mapred.LocalJobRunner - job_local_0005
> java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
> at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:73)
> at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:53)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:581)
> at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1075)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> ****************************************************************************************************
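For what it's worth, the rule the SAXParseException cites is easy to reproduce: XML does not allow "--" inside a comment. Below is a small Python sketch of that check. It uses Python's own parser rather than the SAX parser in the stack trace, so the message text will differ, and the two sample strings are made up from your example.

```python
import xml.etree.ElementTree as ET


def xml_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Hypothetical samples modelled on the comment quoted above.
bad = "<root><!--text with -- inside the comment--></root>"
good = "<root><!--text with - inside the comment--></root>"

print(xml_error(bad))   # an error message; exact wording depends on the parser
print(xml_error(good))  # None
```

The lineNumber 37 and columnNumber 7 in the log should point at the offending comment in whichever XML file was being read.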
>
>
>
>
> ----- Original message -----
> From: "Markus Jelsma" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, 21 May 2012 11:41:40
> Subject: RE: error parsing some xml
>
> Hi
>
> Which version do you use? It should list the troubling URL. What's the stack trace?
>
> Cheers
>
>
>
> -----Original message-----
> > From:Ing. Eyeris Rodriguez Rueda <[EMAIL PROTECTED]>
> > Sent: Mon 21-May-2012 17:07
> > To: [EMAIL PROTECTED]
> > Subject: error parsing some xml
> >
> > Hi all.
> > When I try to crawl, I have a problem parsing some XML and get the exception below. I want to know which XML document causes the problem at parse time.
> > **************************************************************************************
> > WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> > ***************************************************************************************
> > Any help will be appreciated.
> >
> >
> > 10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> > CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> >
> >
> > http://www.uci.cu
> > http://www.facebook.com/universidad.uci
> > http://www.flickr.com/photos/universidad_uci
>