FYI, for a similar task - testing the crawler-commons parser - I've started a small test
tool which reads the sitemaps from WARC files: !topic/crawler-commons/pOLsCVwRsxY

As it only takes what is necessary for testing, it's lean and "no overkill".
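
To give an idea of how little is needed, here is a rough Python sketch of the general approach. It is not the tool linked above. The function names (iter_warc_records, iter_sitemaps) and the "URL contains 'sitemap'" filter are my own assumptions for illustration, and it only handles uncompressed WARC input with well-formed records:

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC stream.

    Hypothetical helper: header names are lower-cased, values are stripped.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if not line.strip():
            continue  # blank lines separating two records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # blank line ends the WARC header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the record block in bytes.
        yield headers, stream.read(int(headers["content-length"]))


def iter_sitemaps(stream):
    """Yield (url, body) for response records whose URL looks like a sitemap."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        url = headers.get("warc-target-uri", "")
        if "sitemap" not in url.lower():  # assumed heuristic, adjust as needed
            continue
        # The block is a raw HTTP response: drop the HTTP headers, keep the body.
        _, _, body = block.partition(b"\r\n\r\n")
        yield url, body
```

Each body yielded by iter_sitemaps could then be handed to the parser under test. For .warc.gz input, passing the file object returned by gzip.open(path, "rb") as the stream should work, since it reads concatenated gzip members transparently.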


On 07/11/2017 12:06 PM, Jackson, Andy wrote: