-
[VOTE] Apache Nutch 1.5.1 Release Candidate
Lewis John Mcgibbney 2012-06-25, 16:13
Hi Everyone,
A candidate for the Apache Nutch 1.5.1 RC#1 is available at:

http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1

The release candidate is a src.zip, src.tar.gz, bin-zip and bin-tar.gz archive of the sources in:

http://svn.apache.org/repos/asf/nutch/tags/release-1.5.1

This Release Candidate (and subsequent release) is a bug fix of the recently released Apache Nutch 1.5.

Further, a staged Maven repository of the 1.5.1 jar, sources.jar and javadoc.jar is available here:

https://repository.apache.org/content/repositories/orgapachenutch-274

Please vote on releasing this package as Apache Nutch 1.5.1.

The vote is open for the next 72 hours and passes if a majority of at least three +1 Nutch PMC votes are cast.

[ ] +1 Release this package as Apache Nutch 1.5.1
[ ] -1 Do not release this package because...

Many thanks, and here's to plenty more.

Kind Regards,
Lewis

P.S. Here's my +1.

--
Lewis
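The pass condition stated in the vote call above can be written down as a short check. This is only a minimal sketch in Python: it assumes that "a majority of at least three +1 Nutch PMC votes" means at least three binding +1 votes and more +1 than -1 votes. The function name `vote_passes` and its parameters are hypothetical and not part of any Apache tooling.

```python
def vote_passes(plus_ones, minus_ones, minimum_plus_ones=3):
    """Return True if a release vote passes under the assumed rule.

    Assumption (not stated in full in the thread): only binding PMC votes
    are counted, at least `minimum_plus_ones` +1 votes are required, and
    the +1 votes must outnumber the -1 votes.
    """
    if plus_ones < 0 or minus_ones < 0:
        raise ValueError("vote counts cannot be negative")
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones


if __name__ == "__main__":
    # Three +1 and one -1: enough votes and a majority.
    print(vote_passes(3, 1))
    # Two +1 and no -1: a majority, but fewer than three +1 votes.
    print(vote_passes(2, 0))
```

Under this reading, a single -1 does not block the release by itself, but it still counts against the majority.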
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Lewis John Mcgibbney 2012-06-27, 06:53
Hi Guys,
The RC was pulled from the most recent commit to ./src/bin/nutch, which I believe was to remove the -core developer CLI options that are now deprecated. This was a much cleaner option for providing a 1.5.1 branch and generating the relevant 1.5.1 tag.

If it's required I can run a 1.5.1RC2 tonight with JUST the patch to ./src/bin/nutch which corrects the .job issue in deploy mode?

Please let me know

Thanks

Lewis

On Wed, Jun 27, 2012 at 3:09 AM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
> Hey Markus,
>
> Don't beat yourself up over it -- you did awesome work and have been contributing a ton
> so who cares!
>
> If we need to do another patch release, we can easily do it (especially with super release
> guy Lewis!)
>
> Cheers,
> Chris
>
> On Jun 26, 2012, at 3:55 PM, Markus Jelsma wrote:
>
>> Hi,
>>
>> The HostURLNormalizer is not supposed to be in 1.5.1, this is true for other issues as well. Nutch 1.5.1 is a bugfix release and should not be pulled from trunk but from the tag + the required patches, i didn't notice it was pulled from trunk until now.
>>
>> The build issue for that plugin is already resolved in trunk, i had a bad commit for which i'm very sorry.
>>
>> Thanks
>> Markus
>>
>> -----Original message-----
>>> From:Sebastian Nagel <[EMAIL PROTECTED]>
>>> Sent: Tue 26-Jun-2012 23:22
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
>>>
>>> -1
>>> The plugin urlnormalizer-host (NUTCH-1319 listed in CHANGES.txt)
>>> is missing in the bin package.
>>> It also does not build for the src package: it's missing in
>>> src/plugins/build.xml of 1.5.1.
>>>
>>> @Markus:
>>> You are right: up to 1.4 there was a top-level folder apache-nutch-1.x/
>>> in the package (src and bin). That's cosmetic but not blocking. Also:
>>> permissions of bin/nutch should be 755 (exec bits should be set).
>>>
>>> Besides: Runs (tested local mode only).
>>>
>>> Sebastian
>>>
>>> On 06/26/2012 06:32 PM, Markus Jelsma wrote:
>>>> This was command line. I didn't notice it with 1.5 because i unpacked that in a GUI. It really unpacks in the cwd, or my system makes a fool out of me :)
>>>>
>>>> wget http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/apache-nutch-1.5.1-src.tar.gz
>>>> tar -xvzf apache-nutch-1.5.1-src.tar.gz
>>>>
>>>> ls
>>>> apache-nutch-1.5.1-src.tar.gz  CHANGES.txt  default.properties  ivy  LICENSE.txt  README.txt
>>>> build.xml  conf  docs  lib  NOTICE.txt  src
>>>>
>>>> It is not a blocker indeed but may be annoying.
>>>>
>>>> +1
>>>>
>>>> -----Original message-----
>>>>> From:Julien Nioche <[EMAIL PROTECTED]>
>>>>> Sent: Tue 26-Jun-2012 17:59
>>>>> To: [EMAIL PROTECTED]
>>>>> Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
>>>>>
>>>>> Probably depends on the tool you are using to open the archive. It does
>>>>> that with File Roller on Ubuntu but works fine on the command line or when
>>>>> doing "extract here" from the file menu
>>>>>
>>>>> Not a blocker IMHO
>>>>>
>>>>> On 26 June 2012 08:04, Markus Jelsma <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It builds and runs smoothly but there's something that didn't catch my eye
>>>>>> with 1.5 since i then used a GUI to unpack the src file, the src and bin
>>>>>> packages decompress everything in the cwd, this means no apache-nutch-1.5
>>>>>> folder is created. This was the case with 1.4 and earlier. I believe this
>>>>>> should be addressed.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> -----Original message-----
>>>>>>> From:Lewis John Mcgibbney <[EMAIL PROTECTED]>
>>>>>>> Sent: Mon 25-Jun-2012 23:01
>>>>>>> To: [EMAIL PROTECTED]
>>>>>>> Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Of course all VOTE's are extremely important and very much valued
>>>>>>> within the overall community VOTE'ing process.
>>>>>>>
>>>>>>> Thanks for getting in touch and for reviewing. If you are able to

Lewis
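The two packaging concerns raised in the quoted messages above (no top-level folder in the archive, and bin/nutch lacking exec bits) can be checked before a release candidate is uploaded. The following is a minimal sketch using only Python's standard `tarfile` module. The helper names `top_level_names`, `check_archive` and `make_demo_archive` are hypothetical, and the demo archives are built in memory, so nothing here reflects the real contents of the 1.5.1 packages.

```python
import io
import tarfile


def top_level_names(tar):
    """Return the set of first path components of all archive members."""
    tops = set()
    for member in tar.getmembers():
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if parts:
            tops.add(parts[0])
    return tops


def check_archive(tar, script="bin/nutch"):
    """Return a list of human-readable problems found in the archive."""
    problems = []
    tops = top_level_names(tar)
    if len(tops) != 1:
        problems.append("no single top-level folder (%d top-level entries)" % len(tops))
    for member in tar.getmembers():
        # Exec bits for user, group and other must all be set (e.g. mode 755).
        if member.isfile() and member.name.endswith(script):
            if member.mode & 0o111 != 0o111:
                problems.append("%s is not executable (mode %o)" % (member.name, member.mode))
    return problems


def make_demo_archive(prefix, script_mode):
    """Build a tiny in-memory tar.gz so the sketch runs without a download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, mode in ((prefix + "bin/nutch", script_mode),
                           (prefix + "CHANGES.txt", 0o644)):
            data = b"demo\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = mode
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r:gz")


if __name__ == "__main__":
    for problem in check_archive(make_demo_archive("", 0o644)):
        print(problem)
```

To run the same check on a downloaded file, `tarfile.open("apache-nutch-1.5.1-src.tar.gz")` could be passed to `check_archive` in place of the demo archive.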
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Julien Nioche 2012-06-27, 08:06
Hi,
We'll have the things that are currently not working anyway so we might as well include them in 1.5.1. If you guys prefer doing the minimal stuff instead I am fine with that too

Jul

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
-
RE: [VOTE] Apache Nutch 1.5.1 Release Candidate
Markus Jelsma 2012-06-27, 13:11
Hello,
I would prefer a minimal bugfix release. The stuff that i committed to trunk may still have some quirks that i haven't found yet, the HostURLNormalizer thing Sebastian noted was just one of them.

Thanks.
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Lewis John Mcgibbney 2012-06-27, 18:32
Hi,
On Wed, Jun 27, 2012 at 2:11 PM, Markus Jelsma <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I would prefer a minimal bugfix release. The stuff that i committed to trunk may still have some quirks that i haven't found yet, the HostURLNormalizer thing Sebastian noted was just one of them.
>

OK so based on the 1.5.1RC#1 CHANGES.txt [0] we currently have the following commits...

* NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
* NUTCH-1404 Nutch script fails to find job file in deploy mode (sidabatra, jnioche)
* NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
* NUTCH-1300 Indexer to filter normalize URL's (markus)
* NUTCH-1330 WebGraph OutlinkDB to preserve back up (markus)
* NUTCH-1319 HostNormalizer plugin (markus)
* NUTCH-1386 Headings filter not to add empty values (markus)
* NUTCH-1356 ParseUtil use ExecutorService instead of manually thread handling (ferdy via markus)
* NUTCH-1352 Improve regex urlfilters/normalizers synchronization (ferdy via markus)
* NUTCH-1024 Dynamically set fetchInterval by MIME-type (markus)
* NUTCH-1364 Add a counter in Generator for malformed urls (lewismc)
* NUTCH-1360 Suport the storing of IP address connected to when web crawling (lewismc)
* NUTCH-1262 Map `duplicating` content-types to a single type (markus)
* NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)
* NUTCH-1385 More robust plug-in order properties in nutch-site.xml (Andy Xue via markus)
* NUTCH-1336 Optionally not index db_notmodified pages (markus)
* NUTCH-1346 Follow outlinks to ignore external (markus)
* NUTCH-1320 IndexChecker and ParseChecker choke on IDN's (markus)
* NUTCH-1351 DomainStatistics to aggregate by TLD (markus)
* NUTCH-1381 Allow to override default subcollection field name (markus)
* NUTCH-XX Commit to add configuration for separation of ant distribution targets (lewismc + jnioche)

Do we just wish to include

* NUTCH-1404 Nutch script fails to find job file in deploy mode (sidabatra, jnioche)

???

I can run this tomorrow.

Thanks

[0] http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/CHANGES.txt
-
RE: [VOTE] Apache Nutch 1.5.1 Release Candidate
Markus Jelsma 2012-06-28, 07:32
Hello,
I'd opt for these additional patches

* NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
* NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)
* NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
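The selection proposed in this message (NUTCH-1404 plus the three additional patches) amounts to filtering the RC#1 change list by issue ID. A minimal Python sketch of that filtering follows. The entry list is a small subset of the CHANGES.txt items quoted in this thread, and the names `issue_id` and `select_entries` are hypothetical helpers, not part of the Nutch build.

```python
import re

# Subset of the RC#1 CHANGES.txt entries quoted in this thread.
RC1_ENTRIES = [
    "NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)",
    "NUTCH-1404 Nutch script fails to find job file in deploy mode (sidabatra, jnioche)",
    "NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)",
    "NUTCH-1319 HostNormalizer plugin (markus)",
    "NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)",
]

# NUTCH-1404 plus the three additional patches proposed above.
SELECTED = {"NUTCH-1404", "NUTCH-1398", "NUTCH-1384", "NUTCH-1400"}


def issue_id(entry):
    """Extract the leading JIRA key (e.g. 'NUTCH-1404') from a change entry."""
    match = re.match(r"(NUTCH-\d+)\b", entry)
    return match.group(1) if match else None


def select_entries(entries, wanted):
    """Keep only the entries whose issue ID is in `wanted`, preserving order."""
    return [entry for entry in entries if issue_id(entry) in wanted]


if __name__ == "__main__":
    for entry in select_entries(RC1_ENTRIES, SELECTED):
        print("* " + entry)
```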
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Julien Nioche 2012-06-28, 08:49
+1
--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Lewis John Mcgibbney 2012-06-28, 17:42
OK this will be done ASAP.
Thanks for the comments and the time.

Lewis

--
Lewis
-
Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
Lewis John Mcgibbney 2012-07-02, 18:01
Hi Guys,
Just an update on this. Please take a look at the CHANGES in the new branch I created [0].

I'm waiting on Sebastian's comments, as currently the zip-src and tar-src targets produce the desired output; however, the tar-bin and zip-bin targets do not. If this is not a blocker then I can release the artifacts for a VOTE, but I wanted to hear from you guys before I do so.

Best
Lewis

[0] http://svn.apache.org/repos/asf/nutch/branches/branch-1.5.1/CHANGES.txt

--
Lewis