Results 1 to 10 of 816 (0.355s).
Re: HTMLParseFilter equivalent in Nutch 2.2 ??? - Nutch - [mail # user]
...They are called ParseFilters in 2.x : http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html as they are not limited to processing HTML documents since Tika generates SA...
   Author: Julien Nioche, 2013-06-12, 12:46
Re: Data Extraction from 100+ different sites... - Nutch - [mail # user]
...What I usually do in cases like these is to propagate an identifier from the seeds and use that in the HTMLParsers to determine whether they should process a page. See url-meta plugin for th...
   Author: Julien Nioche, 2013-06-12, 08:01
Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...Have just committed NUTCH-1522 for both 2-x and trunk   On 10 June 2013 12:07, Julien Nioche  wrote:     * *Open Source Solutions for Text Engineering  http://digita...
   Author: Julien Nioche, 2013-06-10, 11:27
Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...Have added the upgrade to Tika 1.3 to v1.7 https://issues.apache.org/jira/browse/NUTCH-1522. It should be quite straightforward to include and would be a shame not to do it for this release....
   Author: Julien Nioche, 2013-06-10, 11:07
Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...+1 to release now but it would have been nice to do https://issues.apache.org/jira/browse/NUTCH-1527 as part of the same release. The main change introduced in this version is the pluggable ...
   Author: Julien Nioche, 2013-06-10, 07:48
Re: [RESULT] WAS: Re: [VOTE] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...Hi Lewis,  The md5, asc and sha are now correct. Thanks for fixing it.  Have a nice week end  Julien   On 7 June 2013 21:16, Lewis John Mcgibbney wrote:    * *O...
   Author: Julien Nioche, 2013-06-08, 10:09
Job opening in Bristol, UK - Nutch - [mail # dev]
...[Apologies for cross posting]  We are looking for a candidate with the following skills and expertise :      * experience in web crawling, ideally with Apache Nutch  ...
   Author: Julien Nioche, 2013-06-06, 18:59
Re: Unable to crawl google search results - Nutch - [mail # user]
...Check your URL filters, e.g. that you removed the lines below, which are there by default:  # skip URLs containing certain characters as probable queries, etc.  -[?*!@=]  Julien...
   Author: Julien Nioche, 2013-06-04, 22:23
Re: [VOTE] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...Hi,  The keys live at http://www.apache.org/dist/nutch/KEYS ( http://nutch.apache.org/dist/KEYS gives a 404)  Aren't the .asc files supposed to be there as well? ( http://www.apach...
   Author: Julien Nioche, 2013-06-04, 06:34
Re: Generic LinkRank plugin for Nutch - Nutch - [mail # dev]
...Hi Ahmet,  You don't need to use the ScoringFilters at all.  The nutch.scoring.webgraph package can be taken as an example of how to do. It works fine as far as I know but what we ...
   Author: Julien Nioche, 2013-05-30, 08:41