Thanks for the suggestion! I already have those 3 values set to -1. I also checked a few of the PDF documents in the db_unfetched state, and many of them are smaller than others that have been fetched successfully. So it doesn't look like a size problem...
From: Filip Stysiak [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 13, 2017 9:43 AM
To: [EMAIL PROTECTED]
Subject: [EXTERNAL] Re: nutch is not fetching all the pages
Try loosening the restrictions on the content limits;
maybe this will help.
2017-07-12 15:57 GMT+02:00 Srinivasa, Rashmi <[EMAIL PROTECTED]>: