Results 121 to 130 of 805.
Re: How to configure nutch so that apache tika can extract all the tags ? - Nutch - [mail # user]
...Hi Kiran, you should be able to do that with either parse-html or parse-tika by implementing an extension of HtmlParseFilter and storing the attr_* values in the parse metadata, then writ...
   Author: Julien Nioche, 2012-09-07, 09:20
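The first reply above suggests walking the parsed page inside an HtmlParseFilter extension and storing attr_* values in the parse metadata. The sketch below illustrates only the DOM-walking part of that idea, under stated assumptions: it does not use the Nutch API at all, it parses well-formed XHTML with the JDK's own XML parser as a stand-in for the DOM a real filter would receive, and the `attr_<element>_<attribute>` key scheme, the space-joining of repeated values, and the class name `AttrCollector` are hypothetical choices made for illustration. Wiring this into an actual HtmlParseFilter and writing the map into the parse metadata is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: collects every attribute of every element.
public class AttrCollector {

    // Depth-first walk; each attribute is stored under a hypothetical
    // "attr_<element>_<attribute>" key (both parts lower-cased).
    static void collect(Node node, Map<String, String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            NamedNodeMap attrs = el.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                String key = "attr_" + el.getTagName().toLowerCase()
                        + "_" + a.getNodeName().toLowerCase();
                // Repeated keys (e.g. several <a href>) are joined with a space, in document order.
                out.merge(key, a.getNodeValue(), (oldV, newV) -> oldV + " " + newV);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Stand-in for the DOM a parse filter would be handed: parse well-formed XHTML with the JDK parser.
    static Map<String, String> collectFromXhtml(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> out = new LinkedHashMap<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/x\" rel=\"nofollow\">x</a>"
                + "<img src=\"y.png\" alt=\"Y\"/></body></html>";
        collectFromXhtml(page).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Real-world HTML is rarely well-formed XML, which is why a Nutch plugin would work on the DOM produced by its own HTML parser rather than re-parsing the page as above.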
Re: SolrDeleteDuplicates bug - Nutch - [mail # user]
...Hi, 1.6 is not published yet; it is the trunk on SVN (see http://nutch.apache.org/version_control.html). On 6 September 2012 20:17,  wrote: Open Source Soluti...
   Author: Julien Nioche, 2012-09-06, 19:44
Re: nutch 1.5 not able to parse mutliValued metatags - Nutch - [mail # user]
...Hi Kiran, this looks like a possible improvement indeed. Please open an issue on JIRA (https://issues.apache.org/jira/browse/NUTCH). Thanks, Julien. On 5 September 2012 20:...
   Author: Julien Nioche, 2012-09-06, 08:38
Re: Cached page (like google) with hits highlighted - Nutch - [mail # user]
...Sorry, I had missed your previous comments. On 16 August 2012 09:32, Julien Nioche wrote: ...
   Author: Julien Nioche, 2012-08-16, 08:33
Re: Cached page (like google) with hits highlighted - Nutch - [mail # user]
...You need to use parse-tika; however, the underlying parser for PDF does not currently generate much markup, whereas the Word one does, IIRC. Why don't you try Tika standalone with its GUI to exp...
   Author: Julien Nioche, 2012-08-16, 08:32
Re: how to add raw HTML field to Solr - Nutch - [mail # user]
...Max, can you please open an issue on JIRA? I think it would make sense to have the option of adding the raw content as a binary field so that people could use it on the Solr side as ...
   Author: Julien Nioche, 2012-08-16, 08:27
Re: duplicate jar files by plugin dependencies - Nutch - [mail # dev]
...+1 to using the maven-dependency-plugin within our Ant script. I think I had put a preliminary version for 1.x in JIRA, but we'd need to extend the mechanism to the plugins as well. On ...
   Author: Julien Nioche, 2012-08-10, 10:10
Re: cache field in index-basic in 2.X - Nutch - [mail # user]
...it does not concern metadata; we store as metadata the caching policies specified in the HTML pages (http://www.i18nguy.com/markup/metatags.html), then store the pol...
   Author: Julien Nioche, 2012-08-10, 09:39
Re: CHM Files and Tika - Nutch - [mail # user]
...new JIRA? On 9 August 2012 23:30, Markus Jelsma wrote: ...
   Author: Julien Nioche, 2012-08-10, 07:32
Re: cache field in index-basic in 2.X - Nutch - [mail # user]
...Could this be for the HTML meta directives? On 9 August 2012 22:36, Lewis John Mcgibbney wrote: ...
   Author: Julien Nioche, 2012-08-10, 07:30
project: Nutch (805), Tika (37), Lucene (30), Mahout (8), Solr (5), ManifoldCF (4), Droids (1)
type: mail # user (430), mail # dev (253), issue (122)
date: last 7 days (0), last 30 days (6), last 90 days (24), last 6 months (68), last 9 months (805)
author: Markus Jelsma (1767), Lewis John Mcgibbney (1125), Julien Nioche (805), Mattmann, Chris A (402), lewis john mcgibbney (334), Andrzej Bialecki (302), Ferdy Galema (224), Tejas Patil (164), Bai Shen (163), kiran chitturi (157), Sebastian Nagel (156), alxsss@..., remi tassing (133), Lewis John McGibbney (129), Gabriele Kahlout (115)