Re: [VOTE] Move 2.0 out of trunk
Julien Nioche 2011-09-19, 12:28
A few comments below :
My vote is thumbs down: -1
> I am only involved in Nutch 2.0 and that would be put on the back burner...
It has never left it so that's not much of a change :-) Nutch 2.0 (and GORA)
has had more than a year to gather momentum and it hasn't.
More seriously, as Chris explained, people will still be able to work on 2.0
if they want to; the code is moved, not RE-moved. The other aspect of the
change is that we won't necessarily keep 1.x in sync with 2.0 - it has been a
complete pain to have to maintain two branches at the same time and most
people (judging by the votes) are fed up with it. We are making good
progress on 1.x and 2.0 should not hold us back.
Again if people have the time and inclination to work on 2.0 then they will
still be able to do so.
> I'm glad to hear that there are at least 2 people in the community that do
> business in their field and proudly use a Nutch-based crawler together with
> Cassandra to store the data through Gora. That would not have been possible
> with Nutch 1.x version.
It is not clear what you mean by not possible with Nutch 1.x. From a functionality
point of view there is nothing in 2.0 that you can't do with 1.x, whereas the
reverse is not true (e.g. multiple outputs for parse). 2.0 also has a large
number of bugs and is not fit for use in production.
I am sure that there are more than 2 users of Nutch 2.0 out there but
that's after more than a year of having Nutch 2.0 in trunk, and the number is
quite small compared to the number of users of 1.x.
> Maybe this has been widely discussed already. IMO, crawl segments are
> hard-to-maintain and easily lost. If you want to do that HDFS is what you
> are looking for. Even Yahoo has given up and is now using Microsoft updated
> crawl information in order to implement search. They use HBase which is, by
> the way, Nutch 2.0 compatible.
> Take a look:
> http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry I
> don't think any video of the summit is available yet, not sure why)
The advantages in having a single crawl table are well known and this is why
we wanted to do that in 2.0. Again, if people want to get involved and
improve it they will be able to do so.
> On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche <
> [EMAIL PROTECTED]> wrote:
> Here is my vote :
>> +1 : Shelve 2.0 and move 1.4 to trunk
>> On 18 September 2011 10:21, Julien Nioche <[EMAIL PROTECTED]> wrote:
>>> Following the discussions on the dev-list about the future of Nutch
>>> 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a
>>> separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The
>>> arguments for / against can be found in the thread I mentioned.
>>> The vote is open for the next 72 hours.
>>> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
>>> [ ] 0 : No opinion
>>> [ ] -1 : Bad idea. Please give justification.
>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00483.html<http://mail-archives.apache.org/mod_mbox/nutch-dev/201109.mbox/%3CCA+[EMAIL PROTECTED]%3E>
>>> *Open Source Solutions for Text Engineering