Re: NutchHadoopTutorial Updated
Julien Nioche 2012-03-20, 14:20
Note: the case variation in my previous email is purely accidental. I did
not intend to shout or make the first part more important than the second.
On 20 March 2012 14:19, Julien Nioche <[EMAIL PROTECTED]> wrote:
> The section "Deploy Nutch to Single Machine" is probably based on an old
> version of Nutch and is quite misleading. Whether you are in fully or
> pseudo-distributed mode, all you need to do is build the job file from the
> Nutch root, go to runtime/deploy and use the Nutch command from the bin
> directory. There aren't any conf files or hadoop executable anymore. If you
> need to change something in the conf, e.g. the URL filter files, you need
> to rebuild the job file.
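The reason a conf change forces a rebuild is that the job file is an ordinary jar (zip) archive with the configuration packaged inside it. The following is a minimal Python sketch for checking what a job file actually carries, for instance whether an edited URL filter made it in. It assumes, without confirmation from this thread, that the build copies the conf/ files to the root of the archive while dependencies sit under lib/; the helper names are hypothetical and not part of Nutch.

```python
import zipfile

# Hypothetical filter: most Nutch conf files are .xml or .txt files.
CONF_SUFFIXES = (".xml", ".txt")


def packaged_conf_files(job_path):
    """List config-like entries at the root of a .job (zip) archive.

    Assumption: conf/ files are copied to the archive root, so they have
    no directory component; jars under lib/ and class files under package
    directories are skipped because their names contain "/".
    """
    with zipfile.ZipFile(job_path) as job:
        return sorted(
            name
            for name in job.namelist()
            if "/" not in name and name.endswith(CONF_SUFFIXES)
        )


def read_packaged_file(job_path, name, encoding="utf-8"):
    """Return the text of one file exactly as it was packaged into the job."""
    with zipfile.ZipFile(job_path) as job:
        return job.read(name).decode(encoding)
```

For example, pointing `read_packaged_file` at the job file under runtime/deploy with `"regex-urlfilter.txt"` shows the filter rules the cluster will really use; if it still shows the old rules after editing conf/, the job file has not been rebuilt.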
> This is definitely a good effort, but IMHO most of it is about Hadoop
> configuration, which is very well explained on the Hadoop pages
> http://hadoop.apache.org/common/docs/stable/single_node_setup.html. I
> think we should refer to them systematically and focus on the
> Nutch-specific parts instead.
> On 20 March 2012 13:45, Lewis John Mcgibbney <[EMAIL PROTECTED]> wrote:
>> No, I'm taking it out right now. Thanks, troops. :)
>> On Tue, Mar 20, 2012 at 1:38 PM, Mathijs Homminga <
>> [EMAIL PROTECTED]> wrote:
>> > >> About the section "Deploy Nutch to Multiple Machines": this is not
>> > >> necessary, right? The job jar should be self-contained and ship with
>> > >> the necessary configuration files. Nutch should be able to run on any
>> > >> vanilla Hadoop cluster.
>> > >
>> > > It does. All you need is a healthy cluster and a Hadoop environment
>> > > (cluster or local) that points to the jobtracker.
>> > Exactly ;)
>> > Lewis, any reason to keep this section in there?
>> > Mathijs
> *Open Source Solutions for Text Engineering