Hi Jeremy,

Thanks for reaching out.

So far I have had a really good experience with the Lingo24 translator. It really depends, though,
on what you are trying to do, and the options fall into two families. For example, if you want the
broadest coverage and the most heavily trained translation, Google, Microsoft, and Lingo24 fall into
the remote translation API service category. They all have tons of data and training. I also think
they all use human curators for quality review of some things. All will eventually cost you. I know
that the services give you some X million characters of translation a month.

On the other end, you can deploy your own Apache Joshua (incubating) and/or Moses MT system,
and then have Tika connect to it as a service. In this case you control the costs and can run it
on your own servers, etc., but you are limited by the quality of your trained models, and your language

Does this make sense?
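
In case a sketch helps, here is roughly how I would think about choosing between the two families in code. This is only an illustration under assumptions: the Translator interface below is a hypothetical stand-in, loosely modeled on the idea of Tika's translate package rather than copied from it, and the stub backends, the firstAvailable helper, and the "fr" source language are placeholders, not real services or your actual language pair.

```java
import java.util.Arrays;
import java.util.List;

public class TranslatorChoice {
    /** Hypothetical stand-in for a translation backend (not the real Tika interface). */
    interface Translator {
        boolean isAvailable();
        String translate(String text, String sourceLanguage, String targetLanguage);
    }

    /** Returns the first backend, in preference order, that reports itself available. */
    static Translator firstAvailable(List<Translator> preferred) {
        for (Translator t : preferred) {
            if (t.isAvailable()) {
                return t;
            }
        }
        throw new IllegalStateException("no translator is configured");
    }

    /** Stub backend; a real one would call a remote API or a self-hosted MT server. */
    static Translator stub(String label, boolean available) {
        return new Translator() {
            public boolean isAvailable() {
                return available;
            }
            public String translate(String text, String src, String tgt) {
                // Tag the output so it is obvious which backend handled the request.
                return "[" + label + " " + src + "->" + tgt + "] " + text;
            }
        };
    }

    public static void main(String[] args) {
        // Assumption: the remote service has no API key configured, so it is unavailable.
        Translator remote = stub("remote", false);
        Translator selfHosted = stub("self-hosted", true);
        Translator chosen = firstAvailable(Arrays.asList(remote, selfHosted));
        System.out.println(chosen.translate("Bonjour", "fr", "en"));
    }
}
```

The point of the preference list is that you can put a paid remote service first and keep a self-hosted Joshua or Moses server as the fallback, or the other way around, without changing the calling code.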


Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
WWW:  http://sunset.usc.edu/~mattmann/
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
From: "Merrill, Jeremy" <[EMAIL PROTECTED]>
Date: Monday, March 20, 2017 at 8:30 AM
Subject: machine translation recommendation for use with Tika?

Hi friends,

I've been tasked with figuring out how to machine-translate a large set of documents from a common European language into English, using a system that already utilizes Tika.

I know Tika integrates with a handful of machine-translation APIs<https://tika.apache.org/1.14/api/org/apache/tika/language/translate/package-summary.html>. Do you all have a sense of which works best, both in terms of translation quality and ease of integration with Tika?

(We know we're going to have to pay, but the amount of content won't be huge, so differences in price aren't a big factor.)

Thanks in advance,
Jeremy B. Merrill