
Nutch, mail # user - Seed urls not getting crawled.


Sudip Datta 2012-02-09, 07:26
Re: Seed urls not getting crawled.
Lewis John Mcgibbney 2012-02-10, 21:00
Hi,

On Thu, Feb 9, 2012 at 7:26 AM, Sudip Datta <[EMAIL PROTECTED]> wrote:

>
> While this indicates that a reattempt will be made in 1 day, the
> 'url' never really gets the state db_fetched. On the other hand, if I
> set generate.max.count = -1, the page is indeed crawled but the crawl
> is painfully slow.
>
Do you have any idea which part of the crawl is painfully slow?

How are you running your crawls?

Thanks

--
*Lewis*