Results 191 to 200 of 805.
Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...Before you do, could you check that NutchGora passes ant test successfully. I just tried and got an error related to the parse-tika tests. Am about to open a JIRA to update to the latest ver...
   Author: Julien Nioche, 2012-06-15, 09:43
Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...+1  On 15 June 2012 09:00, Ferdy Galema wrote: ...
   Author: Julien Nioche, 2012-06-15, 08:39
Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...yep, remember that you can't build from the bin package so inevitably someone will wonder why only such or such backend is available etc...  another option is to NOT have a binary relea...
   Author: Julien Nioche, 2012-06-14, 20:56
Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...I disagree. You'd expect a binary release to work out of the box - which is not the case. Plus we'd have to spend more time explaining the workaround, answering the same questions over and o...
   Author: Julien Nioche, 2012-06-14, 20:39
Re: Suitable Nutch 2.0 Project Description - Nutch - [mail # dev]
..." and and array other document " looks like a typo, rest is fine  On 13 June 2012 13:45, Ferdy Galema wrote: ...
   Author: Julien Nioche, 2012-06-13, 14:40
Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...Ferdy   The binary distrib corresponds to runtime/local and as such should NOT have the job file there. This is now the norm since 1.5  Will try and do some testing of the RC  ...
   Author: Julien Nioche, 2012-06-13, 14:38
Re: very long fetch reduce task - Nutch - [mail # user]
...unless the parsing is activated in the fetch step - this is likely to be a different issue e.g. normalization of URL taking forever or something like this. Use jstack to see what the problem...
   Author: Julien Nioche, 2012-06-13, 14:36
Re: How to ensure even distribution of the fetch phase across Hadoop nodes - Nutch - [mail # user]
...  Ok, let's get a few things right. What you are referring to is called [-numFetchers] when using the command nutch generate. It splits the output into separate files which are then use...
   Author: Julien Nioche, 2012-06-13, 14:33
Re: How to ensure even distribution of the fetch phase across Hadoop nodes - Nutch - [mail # user]
...Guys,  This has to do with the way URLs are grouped for politeness and not so much with the number of blocks in the input. Limiting the URLs by #  host names, domains or IP is a wa...
   Author: Julien Nioche, 2012-06-12, 13:56
Re: Getting seed url - Nutch - [mail # user]
...forgot to say : this would work by adding a seed metadata to the urls in the seed list, the value of which is then propagated by the scoring filter in urlmeta  On 12 June 2012 14:41, Ju...
   Author: Julien Nioche, 2012-06-12, 13:42