Results 91 to 100 of 6048 (4.804s).
Re: how to add more metadata to tika extraction? - Tika - [mail # dev]
...On Wed, 27 Feb 2013, eShard wrote:  Looks like the metadata you want isn't being pulled out as metadata by  Tika   Metadata != content  I'd suspect that if you look at th...
   Author: Nick Burch, 2013-03-05, 21:33
Re: How to hide some Excel content - Tika - [mail # user]
...OK. I was just wondering if there was a built-in way to specify a customer handler that could do something like this to avoid compiling a custom version of the project.    I see. G...
   Author: CL, 2013-03-05, 17:34
Re: How to hide some Excel content - Tika - [mail # user]
...On Tue, 5 Mar 2013, CL wrote:  There are several examples in Apache POI, and the code behind Tika is open  source. Skipping certain slides should be fairly easy, other things will ...
   Author: Nick Burch, 2013-03-05, 17:27
Re: How to hide some Excel content - Tika - [mail # user]
...Thanks for your feedback. I may go that route if I have to, but I'm not finding any good converters. I was hoping to avoid writing my own, which is why I'm trying Tika. Do you know if there'...
   Author: CL, 2013-03-05, 17:22
Re: How to hide some Excel content - Tika - [mail # user]
...On Tue, 5 Mar 2013, CL wrote: If you have quite specific requirements (which it sounds like you do), and only need to work with one file format, you're probably better off callin...
   Author: Nick Burch, 2013-03-05, 15:32
How to hide some Excel content - Tika - [mail # user]
...Hi, I just started using Tika (1.3) for converting Excel (OOXML) content to HTML. Looking good. Two things I'm wondering...  1) Is there a way to convert only a specific worksheet of a ...
   Author: CL, 2013-03-05, 15:26
Re: Improvement in Metadata Class - Tika - [mail # user]
...Hey Lewis,  RE: #3 — it would be great to get Nutch using Tika's metadata container — I don't think we have anything special in Nutch that prevents it. RE: #2 — I committed your Tika do...
   Author: Mattmann, Chris A, 2013-03-04, 05:41
Re: IdentityHtmlMapper not used by Boilerpipe? - Tika - [mail # user]
...unsubscribe   On Fri, Mar 1, 2013 at 7:35 AM, Markus Jelsma wrote:     Dan Klueter...
   Author: Dan Klueter, 2013-03-02, 00:03
[TIKA-1085] PDF header and mime detection - Tika - [issue]
...I've found some PDF files that Tika recognizes as application/octet-stream. These files differ from normally identified PDFs in having a different header: the %PDF-N.n string isn't at the beginning ...
http://issues.apache.org/jira/browse/TIKA-1085    Author: Marco Quaranta, 2013-03-01, 13:53
IdentityHtmlMapper not used by Boilerpipe? - Tika - [mail # user]
...Hi,  We need div elements returned when we pass the stream through Boilerpipe from Nutch. We enable includeMarkup to get markup returned in the first place, but divs are not returned. I...
   Author: Markus Jelsma, 2013-03-01, 12:35