Sorry, that was a mistake. I need the text content of PDF, txt, doc and docx
files to index in Solr.

 

Thanks for your help.

 

 

From: msaunier [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 5, 2018 18:05
To: [EMAIL PROTECTED]
Subject: OCR Tika to read PDF, txt and doc docx

 

Hello,

 

How can I install and use an OCR to extract the content_html from files with
ManifoldCF?

I need the HTML content.

 

Thanks for your help,