Results 191 to 200 of 805 (0.258s).

Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...Before you do, could you check that NutchGora passes ant test successfully? I just tried and got an error related to the parse-tika tests. Am about to open a JIRA to update to the latest ver...
Author: Julien Nioche, 2012-06-15, 09:43

Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...+1 On 15 June 2012 09:00, Ferdy Galema wrote: Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com htt...
Author: Julien Nioche, 2012-06-15, 08:39

Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...yep, remember that you can't build from the bin package so inevitably someone will wonder why only such or such backend is available etc... another option is to NOT have a binary relea...
Author: Julien Nioche, 2012-06-14, 20:56

Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...I disagree. You'd expect a binary release to work out of the box - which is not the case. Plus we'd have to spend more time explaining the workaround, answering the same questions over and o...
Author: Julien Nioche, 2012-06-14, 20:39

Re: Suitable Nutch 2.0 Project Description - Nutch - [mail # dev]
..." and and array other document " looks like a typo, rest is fine. On 13 June 2012 13:45, Ferdy Galema wrote: Open Source Solutions for Text Engineering http:...
Author: Julien Nioche, 2012-06-13, 14:40

Re: VOTE Apache Nutch 2.0 RC1 - Nutch - [mail # dev]
...Ferdy, the binary distrib corresponds to runtime/local and as such should NOT have the job file there. This has been the norm since 1.5. Will try and do some testing of the RC ...
Author: Julien Nioche, 2012-06-13, 14:38

Re: very long fetch reduce task - Nutch - [mail # user]
...unless the parsing is activated in the fetch step - this is likely to be a different issue e.g. normalization of URL taking forever or something like this. Use jstack to see what the problem...
Author: Julien Nioche, 2012-06-13, 14:36

Re: How to ensure even distribution of the fetch phase across Hadoop nodes - Nutch - [mail # user]
...Ok, let's get a few things right. What you are referring to is called [-numFetchers] when using the command nutch generate. It splits the output into separate files which are then use...
Author: Julien Nioche, 2012-06-13, 14:33

Re: How to ensure even distribution of the fetch phase across Hadoop nodes - Nutch - [mail # user]
...Guys, This has to do with the way URLs are grouped for politeness and not so much with the number of blocks in the input. Limiting the URLs by # host names, domains or IP is a wa...
Author: Julien Nioche, 2012-06-12, 13:56

Re: Getting seed url - Nutch - [mail # user]
...forgot to say: this would work by adding a seed metadata to the urls in the seed list, the value of which is then propagated by the scoring filter in urlmeta. On 12 June 2012 14:41, Ju...
Author: Julien Nioche, 2012-06-12, 13:42