Honestly, I would try 'em all out. They differ somewhat in performance and setup, though
Tika boils each one down to a config file, which is nice. I am working on a paper that compares
all of them, but it is not done yet ;)
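
One way to "try 'em all out" is to run the same sample text through each configured backend and compare the results side by side. The sketch below is only an illustration under stated assumptions, not Tika code: the local `Translator` interface merely mirrors the general shape of Tika's `org.apache.tika.language.translate.Translator` (a translate call plus an availability check), the backends are hypothetical stubs standing in for configured Google, Microsoft, and Lingo24 translators, and the sample text and language pair are arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TranslatorComparison {

    /** Hypothetical stand-in; only mirrors the general shape of a translator backend. */
    public interface Translator {
        String translate(String text, String sourceLanguage, String targetLanguage) throws Exception;
        boolean isAvailable();
    }

    /** Hypothetical stub backend: tags the text instead of calling a real service. */
    public static Translator stub(String name, boolean available) {
        return new Translator() {
            @Override
            public String translate(String text, String src, String tgt) {
                return "[" + name + " " + src + "->" + tgt + "] " + text;
            }

            @Override
            public boolean isAvailable() {
                return available;
            }
        };
    }

    /** Runs one sample through every available backend; results are keyed by backend name. */
    public static Map<String, String> compare(Map<String, Translator> backends,
                                              String sample, String src, String tgt) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Translator> entry : backends.entrySet()) {
            Translator translator = entry.getValue();
            if (!translator.isAvailable()) {
                continue; // skip backends that are not configured
            }
            try {
                results.put(entry.getKey(), translator.translate(sample, src, tgt));
            } catch (Exception ex) {
                // record the failure so one broken backend does not hide the others
                results.put(entry.getKey(), "ERROR: " + ex.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Translator> backends = new LinkedHashMap<>();
        backends.put("google", stub("google", true));
        backends.put("microsoft", stub("microsoft", true));
        backends.put("lingo24", stub("lingo24", false)); // pretend this one has no config yet

        // Arbitrary sample text and language pair, chosen only for illustration.
        compare(backends, "Bonjour le monde", "fr", "en")
                .forEach((name, output) -> System.out.println(name + ": " + output));
    }
}
```

With real backends in place of the stubs, the same loop gives a quick read on which services are configured and how their output differs for a representative document.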

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [EMAIL PROTECTED]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: "Merrill, Jeremy" <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Monday, March 20, 2017 at 11:59 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: machine translation recommendation for use with Tika?

Hi Chris,
Thank you, this is helpful. I think running our own system is out of the question, just on account of time (news just keeps on happening, though it certainly would be fun to play with...) and -- presumably -- result quality.
Do you have thoughts on which of Google, Microsoft and Lingo24 might be easiest? Or are they all just as easy to use with Tika and I should just try 'em all out?
Thanks,

---
Jeremy B. Merrill
The New York Times
On Mon, Mar 20, 2017 at 1:43 PM, Mattmann, Chris A (3010) <[EMAIL PROTECTED]> wrote:
Hi Jeremy,

Thanks for reaching out.

So far I have had a really good experience with the Lingo24 translator. It really depends, though,
on which of two families fits what you are trying to do. For example, if you want the broadest
coverage and the most training, Google, Microsoft, and Lingo24 all fall into the remote
translation API service category. They all have tons of data and training. I also think they all use
human curators for quality review of some things. All will eventually cost you. I know that the
services give you some X million characters of translation a month.

At the other end, you can deploy your own Apache Joshua (incubating) and/or Moses MT system
and have Tika connect to it as a service. In that case you control the costs and can run it
on your own servers, etc., but you are limited by the quality of your trained models and by your
language pairs.

Does this make sense?

Cheers,
Chris

From: "Merrill, Jeremy" <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Monday, March 20, 2017 at 8:30 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: machine translation recommendation for use with Tika?

Hi friends,

I've been tasked with figuring out how to machine-translate a large set of documents from a common European language into English, using a system that already utilizes Tika.

I know Tika integrates with a handful of machine-translation APIs <https://tika.apache.org/1.14/api/org/apache/tika/language/translate/package-summary.html>. Do you all have a sense of which works best, both in terms of translation quality and ease of integration with Tika?

(We know we're going to have to pay, but the amount of content won't be huge, so differences in price aren't a big factor.)

Thanks in advance,
Jeremy B. Merrill