Nutch, mail # user - Problems with tutorial


Re: Problems with tutorial
Emre Çelikten 2012-06-17, 17:32
Hello,

Check your seed list (the urls directory) and your regex-urlfilter file.
The problem is probably there, assuming you are using your own links.
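
To see why a seed URL might get dropped, here is a rough sketch of how
regex-urlfilter style rules are applied. It assumes the rules are tried
top to bottom, the first pattern found in the URL decides ('+' accepts,
'-' rejects), and a URL that matches nothing is dropped. The rules and
the example.org domain are made up for illustration, and Python's re
module stands in for Java regex here, so treat it as an approximation
and not as Nutch's actual code.

```python
import re

# Hypothetical rules in the regex-urlfilter style (not copied from any
# real config): '+' accepts, '-' rejects, '#' starts a comment line.
RULES = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept only pages under the (hypothetical) example.org domain
+^http://([a-z0-9]*\.)*example\.org/
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is dropped


if __name__ == "__main__":
    rules = parse_rules(RULES)
    seeds = [
        "http://www.example.org/",
        "http://www.example.com/",
        "www.example.org",
    ]
    for url in seeds:
        verdict = "kept" if accepts(rules, url) else "filtered out"
        print(url, "->", verdict)
```

If every seed comes out as filtered, the Generator has nothing to select,
which would be consistent with the "0 records selected for fetching"
line in the log below.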

On 06/17/2012 05:46 PM, soberchallen wrote:
> Hello, I have the same problem. Have you solved it yet? The details are as
> follows:
> *bin/nutch crawl urls -dir crawl -depth 2 -topN 100 -threads 2*
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 2
> depth = 2
> solrUrl=null
> topN = 100
> Injector: starting at 2012-06-17 22:27:39
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2012-06-17 22:27:41, elapsed: 00:00:02
> Generator: starting at 2012-06-17 22:27:41
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 100
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Problems-with-tutorial-tp3156809p3990019.html
> Sent from the Nutch - User mailing list archive at Nabble.com.