Nice.

Well, in case it's useful, I cleaned up my code somewhat, used Sebastian's
code to parse the HTTP headers for WARC files, and added (BSD licensed)
test files from DROID and some reasonably meaningful tests.

It's on this branch:

https://github.com/ukwa/tika/tree/experimental-warc-parsing

And the parser tests give some idea of the current behaviour:

https://github.com/ukwa/tika/blob/experimental-warc-parsing/tika-parsers/src/test/java/org/apache/tika/parser/warc/WARCParserTest.java

HTH,
Andy

On 11/07/2017, 19:11, "Sebastian Nagel" <[EMAIL PROTECTED]> wrote:
