Hi Tamanjit,
Thank you for your help. I tried your suggestion, but it crawls every normal URL except URLs of this type:
answers.yahoo.com/question/index;_ylt=AtKz1xss1AS6RGeAQTFz1kyf5HNG;_ylv=3?qid=20110715030336AAzXnNs
I also tried this suggestion from
lucene.472066.n3.nabble.com/How-to-make-nutch-crawl-within-a-sub-category-of-an-URL-td619381.html
Use
http://answers.yahoo.com/dir/index;_ylt=AqH5s00Y0dXDEjwmdUrxNabpy6IX;_ylv=3?link=list&sid=396545660 as the url to crawl.
Add this in the crawl-urlfilter.txt
+^http://answers.yahoo.com/dir/index;_ylt=AqH5s00Y0dXDEjwmdUrxNabpy6IX;_ylv=3?link=list&sid=396545660
+^http://answers.yahoo.com/question
-.
but it didn't crawl anything.
Does this mean that Nutch can only crawl normal hyperlinks?
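
One possible reason the rules above match nothing: entries in crawl-urlfilter.txt are regular expressions, so the unescaped "?" in the first rule makes the preceding "3" optional instead of matching a literal question mark. The sketch below checks this with Python's re module, on the assumption that it treats these particular patterns the same way as the Java regex engine Nutch uses. The escaped rule is only an illustration, not a tested Nutch configuration.

```python
import re

# Seed URL exactly as given in the suggestion above.
url = ("http://answers.yahoo.com/dir/index;"
       "_ylt=AqH5s00Y0dXDEjwmdUrxNabpy6IX;_ylv=3?link=list&sid=396545660")

# Rule body as written in the filter file (leading '+' removed).
# The bare '?' is a quantifier here: "3?" means "an optional 3".
raw_rule = "^" + url

# Same rule with the regex metacharacters '.' and '?' escaped by hand.
escaped_rule = (r"^http://answers\.yahoo\.com/dir/index;"
                r"_ylt=AqH5s00Y0dXDEjwmdUrxNabpy6IX;_ylv=3\?link=list&sid=396545660")

# search() looks for the pattern anywhere in the string; both rules are
# anchored with '^', so they must match from the start of the URL.
raw_matches = re.search(raw_rule, url) is not None
escaped_matches = re.search(escaped_rule, url) is not None

print("rule as written matches its own URL:", raw_matches)
print("escaped rule matches the URL:", escaped_matches)
```

Here raw_matches is False and escaped_matches is True, so the rule as written would not accept its own seed URL. Whether escaping alone is enough to get the pages crawled is not something this sketch confirms.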
________________________________
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, 15 July 2011 8:04 PM
Subject: Re: Is it possible to crawl yahoo answer?
I don't think that should be a problem, though you would have to try it
to know for sure, because I am not sure whether it will crawl an
encrypted URL. (Experts, please help here.)
Just make sure the following line is commented out in crawl-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
#-[?*!@=]
And add the following line:
+^http://answers.yahoo.com/([a-zA-Z0-9-_/]*)
Hopefully it should work. Good luck.
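
To illustrate how the skip rule and the added rule interact, here is a small Python model of the filter. It assumes, as the comments in Nutch's default filter file describe, that rules are tried in file order, the first matching pattern decides, and a URL that matches no rule is ignored. The accepts helper, the rule list, and the http:// prefix on the sample URL are made up for this example; this is not Nutch code.

```python
import re

# Hypothetical rule list in file order: (sign, pattern).
RULES = [
    ("-", r"[?*!@=]"),                                       # the "probable queries" skip rule
    ("+", r"^http://answers.yahoo.com/([a-zA-Z0-9-_/]*)"),   # the added accept rule
    ("-", r"."),                                             # reject everything else
]

def accepts(url, rules):
    """Return True if the first rule found anywhere in the URL is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treated as rejected

# Sample question URL (scheme added for the example).
question = ("http://answers.yahoo.com/question/index;"
            "_ylt=AtKz1xss1AS6RGeAQTFz1kyf5HNG;_ylv=3?qid=20110715030336AAzXnNs")

with_skip_rule = accepts(question, RULES)         # skip rule still active
without_skip_rule = accepts(question, RULES[1:])  # skip rule commented out

print("skip rule active:", with_skip_rule)
print("skip rule commented out:", without_skip_rule)
```

In this model the question URL is rejected while the skip rule is active, because it contains "=" and "?", and accepted once that rule is removed, since the accept rule only needs to match the start of the URL.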
--
View this message in context:
http://lucene.472066.n3.nabble.com/Is-it-possible-to-crawl-yahoo-answer-tp3171559p3171764.html
Sent from the Nutch - User mailing list archive at Nabble.com.