I am currently attempting to dump the contents of a crawl into multiple
WARC files using
./bin/nutch commoncrawldump -outputDir nameOfOutputDir -segment
However, I get multiple occurrences of
URL skipped. Content of size X was truncated to Y.
I have set both http.content.limit and file.content.limit to -1 to
remove any size limits, but it seems neither property applies in this
situation. Is there a way to remove this cap?
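For reference, this is roughly how the two properties are set in my conf/nutch-site.xml (a sketch; property names are the standard Nutch ones, -1 meaning unlimited):

    <configuration>
      <!-- Disable the cap on downloaded HTTP content (-1 = no limit) -->
      <property>
        <name>http.content.limit</name>
        <value>-1</value>
      </property>
      <!-- Disable the cap on content fetched via the file protocol -->
      <property>
        <name>file.content.limit</name>
        <value>-1</value>
      </property>
    </configuration>

Note that, as I understand it, these limits are applied at fetch time, so they would only affect segments crawled after the change, not content already truncated in an existing segment.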