No more urls to fetch (tamanjit.bindra@...) 2011-06-29, 17:15
Hi,
I was going through past threads and found that the problem I am facing has been faced by many others, but in most cases it was either ignored or left unresolved.

I use Nutch 1.1. My crawl has mostly been working fine (though I am still getting the hang of how all the screws work). There is one particular URL that I generally need to crawl more often than others (it's a site map). To restart cleanly, I cleaned the domain out of my Solr index, i.e. deleted all the docs for the domain of the URL I need to fetch, because the index had a lot of 404 URLs that were not getting cleaned up. I also deleted everything from the crawl folder, so everything is fresh.

I start off a crawl with depth = 1, topN = 1000 and noOfThreads = 10. It fetched a lot of the site into the index (though not everything), so I repeated the same crawl command another 7-8 times. The docs in the index kept on increasing. But this final time, when I try running the crawl, it fails at depth 0 with the message:

Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished:

I cleaned up everything again and started from scratch. Crawling started off again, only to fail again after a few initial successful crawls.

Awaiting your reply.

--
View this message in context: http://lucene.472066.n3.nabble.com/No-more-urls-to-fetch-tp3122462p3122462.html
Sent from the Nutch - User mailing list archive at Nabble.com.