If you're going to give out web scrapers, PLEASE put a delay between file downloads. Nutch was locking up our system, almost like a DoS attack. Hopefully Nutch obeys the robots.txt file.
Re: We just blocked Nutch
Markus Jelsma 2012-04-30, 08:02
Nutch, by default, waits five seconds between successive requests to the same server, but that delay can easily be overridden in the configuration. Nutch also obeys the robots exclusion standard, and can be configured to match against a robots.txt agent identifier other than "Nutch".
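For reference, both behaviors mentioned above are controlled by standard properties that a crawl operator can override in nutch-site.xml. A minimal sketch (property names are from Nutch's nutch-default.xml; the values shown here are illustrative, not recommendations):

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: overrides for nutch-default.xml -->
<configuration>

  <!-- Delay, in seconds, between successive requests to the same server.
       Nutch's default is 5.0; an operator can lower or raise it here. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>5.0</value>
  </property>

  <!-- Agent name the crawler identifies itself as. -->
  <property>
    <name>http.agent.name</name>
    <value>MyCrawler</value>
  </property>

  <!-- Comma-separated agent names checked against robots.txt rules.
       This is how Nutch can be made to honor a robots.txt identifier
       other than "Nutch". -->
  <property>
    <name>http.robots.agents</name>
    <value>MyCrawler,*</value>
  </property>

</configuration>
```

So a polite robots.txt on the server side (e.g. `Crawl-delay` or `Disallow` rules under the matching agent name) will be respected by a default Nutch install, but nothing stops an operator from overriding these settings.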
If the trouble continues, the best option is to contact the crawler operator's host or ISP.
On Sun, 29 Apr 2012 06:29:35 -0700, Jerry Durand <[EMAIL PROTECTED]> wrote:
> If you're going to give out web scrapers, PLEASE put a delay between
> file downloads. Nutch was locking up our system, almost a DOS attack.
> Hopefully Nutch obeys the robots.txt file.