Awesome!

Any tips on building Beam?  Should it work on (dare I say) Windows?

Intellij is complaining that it can't find jdk.tools:jdk.tools:1.6 as a dependency under much of the Hadoop modules.

mvn clean install is failing at Beam::SDKS::Java::Core
[ERROR]   AvroIOTest.testWriteDisplayData:561
Expected: display data with item: (with key is "filePrefix" and with type is <STRING> and with value is "/foo")
     but: found 6 non-matching item(s):
<[]org.apache.beam.sdk.io.AvroIO$Write:codec=snappy
[]org.apache.beam.sdk.io.AvroIO$Write:schema=org.apache.beam.sdk.io.AvroIOTest$GenericClass
[]org.apache.beam.sdk.io.AvroIO$Write:fileSuffix=bar
[]org.apache.beam.sdk.io.AvroIO$Write:numShards=100
[]org.apache.beam.sdk.io.AvroIO$Write:shardNameTemplate=-SS-of-NN-
[]org.apache.beam.sdk.io.AvroIO$Write:filePrefix=C:\foo>
[ERROR]   FileBasedSinkTest.testRemoveWithTempFilename:148->testRemoveTemporaryFiles:261 temp file C:\Users\tallison\AppData\Local\Temp\junit5212433513605155196\temp\file0 exists
Expected: is <false>
     but: was <true>
[ERROR]   FileBasedSourceTest.testSplittingFailsOnEmptyFileExpansion
Expected: (an instance of java.io.FileNotFoundException and exception with message a string containing "No files found for spec: C:\Users\tallison\AppData\Local\Temp\junit1719865221821921346\junit7087025770573441186/missing.txt")
     but: an instance of java.io.FileNotFoundException <java.lang.IllegalStateException: Unable to find registrar for c> is a java.lang.IllegalStateException
Stacktrace was: java.lang.IllegalStateException: Unable to find registrar for c
        at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
        at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
among many other errors...
-----Original Message-----
From: Sergey Beryozkin [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 25, 2017 12:47 PM
To: Allison, Timothy B. <[EMAIL PROTECTED]>; [EMAIL PROTECTED]
Subject: Re: Integrating Tika with Apache Beam

Hi Guys

The link to the initial code is available in JIRA, at this stage the focus is on preparing a solid initial PR, and then we can all improve Tika related code :-)

Cheers, Sergey
On 24/05/17 11:41, Sergey Beryozkin wrote:
--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB