Hi Filip,

Thanks for the suggestion! I already have those three values set to -1. I also checked a few of the PDF documents in the db_unfetched state, and many of them are smaller than others that were fetched successfully. So it doesn't look like a size problem...

Thanks,
Rashmi

-----Original Message-----
From: Filip Stysiak [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 13, 2017 9:43 AM
To: [EMAIL PROTECTED]
Subject: [EXTERNAL] Re: nutch is not fetching all the pages

Try loosening the restrictions on the content limits:

<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>ftp.content.limit</name>
  <value>-1</value>
</property>

Maybe this will help.
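
A quick way to double-check that all three overrides are really present in the site configuration is a small script like the one below. It is only a sketch: the helper names are made up for illustration, and it assumes the properties live in a standard `<configuration>` file such as conf/nutch-site.xml. It only reads the XML text it is given, so it says nothing about which file the running crawl actually loads.

```python
import xml.etree.ElementTree as ET

# The three properties from the snippet above.
LIMIT_PROPERTIES = ("file.content.limit", "http.content.limit", "ftp.content.limit")


def read_content_limits(xml_text):
    """Return {property name: value string, or None if absent} for the three limits.

    Hypothetical helper: accepts the XML as str or bytes.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in LIMIT_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    return {name: found.get(name) for name in LIMIT_PROPERTIES}


def limits_not_disabled(xml_text):
    """List the limit properties that are missing or set to something other than -1."""
    return [name for name, value in read_content_limits(xml_text).items() if value != "-1"]
```

To use it, read the file in binary mode (for example `open("conf/nutch-site.xml", "rb").read()`, where the path is an assumption about the install layout) and pass the result to `limits_not_disabled`. An empty list means all three values are -1 in that file.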

2017-07-12 15:57 GMT+02:00 Srinivasa, Rashmi <[EMAIL PROTECTED]>: