Results 41 to 50 of 155 (0.429s).
[NUTCH-1436] bin/nutch absent in zip package - Nutch - [issue]
...The script bin/nutch is absent in the package apache-nutch-1.5.1-bin.zip, the tar-bin package is not affected....
http://issues.apache.org/jira/browse/NUTCH-1436    Author: Sebastian Nagel, 2013-01-12, 17:48
Re: problem with nutch2.1 and redirect - Nutch - [mail # user]
...Hi David,  Nutch follows redirects. You should check the URL you are redirected to:   http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=a2h&AN=84164637&msid=94333040...
   Author: Sebastian Nagel, 2013-01-08, 20:49
[NUTCH-1245] URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again - Nutch - [issue]
...A document gone with 404 after db.fetch.interval.max (90 days) has passed is fetched over and over again but although fetch status is fetch_gone its status in CrawlDb keeps db_unfetched. Conse...
http://issues.apache.org/jira/browse/NUTCH-1245    Author: Sebastian Nagel, 2013-01-08, 14:44
Re: nutch 2.1 command line options - Nutch - [mail # user]
...While in 1.x all commands show a help when called as    bin/nutch command this is not always the case for 2.x - a known inconsistency (NUTCH-1393). Unfortunately, until this issue ...
   Author: Sebastian Nagel, 2013-01-06, 21:14
[NUTCH-1339] Default URL normalization rules to remove page anchors completely - Nutch - [issue]
...The default rules of URLNormalizerRegex remove the anchor up to the first occurrence of ? or &. The remaining part of the anchor is kept which may cause a large, possibly infinite number o...
http://issues.apache.org/jira/browse/NUTCH-1339    Author: Sebastian Nagel, 2012-12-06, 14:53
[NUTCH-1455] RobotRulesParser to match multi-word user-agent names - Nutch - [issue]
...If the user-agent name(s) configured in http.robots.agents contains spaces it is not matched even if it is exactly contained in the robots.txt. http.robots.agents = "Download Ninja,*" If the robot...
http://issues.apache.org/jira/browse/NUTCH-1455    Author: Sebastian Nagel, 2012-12-06, 14:53
[NUTCH-1422] reset signature for redirects - Nutch - [issue]
...In a long running continuous crawl with Nutch 1.4 URLs with a HTTP redirect (http.redirect.max = 0) are kept as not-modified in the CrawlDb. Short protocol (cf. attached dumped segment / Cra...
http://issues.apache.org/jira/browse/NUTCH-1422    Author: Sebastian Nagel, 2012-12-06, 14:53
Re: Wrong ParseData in segment - Nutch - [mail # user]
...Hi Markus,  sounds somewhat similar to NUTCH-1252 but that was rather trivial and easy to reproduce.  Sebastian  2012/11/30 Markus Jelsma :...
   Author: Sebastian Nagel, 2012-11-30, 19:57
Re: [VOTE] Apache Nutch 1.6 Release Candidate - Nutch - [mail # dev]
...+1  - source package builds, tests pass - successful test crawl with bin package (20+ URLs, Linux, local mode, Solr 3.6)  On 11/23/2012 03:24 PM, lewis john mcgibbney wrote:...
   Author: Sebastian Nagel, 2012-11-28, 22:19
Re: shouldFetch rejected - Nutch - [mail # user]
...Then all should work as expected.  Are you sure they aren't fetched at all? This debug log output in Generator mapper is shown also for URLs fetched in previous cycles. You should check...
   Author: Sebastian Nagel, 2012-11-25, 20:02