No more urls to fetch
tamanjit.bindra@... 2011-06-29, 17:15
I was going through past threads and found that the problem I face has been faced
by many others, but mostly it has either been ignored or gone unanswered.
I use Nutch 1.1. My crawl has been working fine for the most part (though I am still
getting the hang of how all the screws work).
I have a particular URL which I generally need to crawl more often than others
(it's a site map). So I cleaned up my Solr index of that domain to re-start, since
my index had a lot of 404 URLs which were not getting cleaned up; i.e., I deleted
all the docs for the domain of the URL I need to fetch.
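The cleanup itself was just a delete-by-query against Solr, along these lines (the host, port, and the use of a "host" field are from my setup and may differ in yours; example.com stands in for the real domain):

```shell
# Delete every doc indexed for the domain, then commit.
# host:example.com assumes the standard Nutch Solr schema, which indexes a "host" field.
curl "http://localhost:8983/solr/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary "<delete><query>host:example.com</query></delete>"
```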
I deleted everything from the crawl folder, so everything is fresh.
I start off a crawl with depth = 1, topN = 1000, and noOfThreads = 10.
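For reference, the exact invocation was roughly this (the seed-list and output directory names are illustrative):

```shell
# Nutch 1.1 one-shot crawl: "urls" is the directory holding the seed list,
# "crawl" is the output directory (both names illustrative).
bin/nutch crawl urls -dir crawl -depth 1 -topN 1000 -threads 10
```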
It fetched a lot of the site into the index (though not everything), so I repeated
the same crawl command another 7-8 times, and the docs in the index kept increasing.
But this final time, when I try running the crawl, it fails at depth 0
with the message:
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
I cleaned up everything again and started from scratch; crawling started off
again, only to fail once more after a few initial successful crawls.
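For what it's worth, the URL filters look sane to me; my conf/regex-urlfilter.txt ends with rules along these lines (the domain here is illustrative):

```
# accept anything under my domain, reject everything else
+^http://([a-z0-9]*\.)*example.com/
-.
```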
Awaiting your reply.
View this message in context: http://lucene.472066.n3.nabble.com/No-more-urls-to-fetch-tp3122462p3122462.html
Sent from the Nutch - User mailing list archive at Nabble.com.