Re: Lower case URLs - correct regex? - Nutch - [mail # user]
...We've dug a bit deeper... We're actually upgrading from Nutch 1.0 to 1.4. It seems the regex stuff has moved away from the Perl5Substitution implementation, which supported...
Author: Dean Pullen, 2012-05-08, 14:30

Lower case URLs - correct regex? - Nutch - [mail # user]
...Hi all, I'm trying to lower case all URLs via Nutch's regex-normalize.xml. The regex looks like: pattern (.*), substitution \L$1\E. This appears to be correct, yet we're seeing this when ...
Author: Dean Pullen, 2012-05-08, 12:37

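The two messages above suggest why the substitution stopped working after the upgrade: `\L...\E` is a Perl-style case conversion, and Java's built-in `java.util.regex` does not implement it. The following is a minimal sketch of that behaviour, assuming the normalizer in Nutch 1.4 relies on `java.util.regex` (the thread only hints at this). The class name, method names, and example URL are hypothetical and are not part of Nutch.

```java
import java.util.Locale;

public class LowerCaseUrlSketch {

    // With java.util.regex, a backslash in the replacement string only
    // escapes the next character, so \L and \E are emitted as the literal
    // letters L and E instead of switching case conversion on and off.
    public static String perlStyleSubstitution(String url) {
        return url.replaceFirst("(.*)", "\\L$1\\E");
    }

    // Lower-casing done in code rather than in the substitution string.
    // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
    public static String lowerCase(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String url = "http://Example.COM/Some/Path"; // hypothetical example URL
        System.out.println(perlStyleSubstitution(url)); // Lhttp://Example.COM/Some/PathE
        System.out.println(lowerCase(url));             // http://example.com/some/path
    }
}
```

If this assumption holds, lower-casing would have to happen in Java code (for instance in a custom normalizer) rather than in the substitution string. Note that lower-casing the whole URL also changes the path, which some servers treat as case-sensitive.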
Hadoop not doing anything - Nutch - [mail # user]
...Hi all, If this is definitely a Hadoop issue, as opposed to it being an issue caused by Nutch, I'll happily go ask on the Hadoop mailing list... Anyway, I'm kicking off a ...
Author: Dean Pullen, 2012-05-01, 15:26

Re: Nutch 1.4 with Hadoop - how does Nutch know where Hadoop is running - Nutch - [mail # user]
...Thanks for your reply. I understand what you've said, but how does Nutch know where the Hadoop jobtracker is running? Regards, Dean. On 20/03/2012 11:03, Markus...
Author: Dean Pullen, 2012-03-20, 10:59

Nutch 1.4 with Hadoop - how does Nutch know where Hadoop is running - Nutch - [mail # user]
...Hi all, An odd question, but I can't work out how Nutch 1.4 actually knows where Hadoop is running. Usually I copy Hadoop over the top of Nutch, but if we want to put ...
Author: Dean Pullen, 2012-03-20, 10:51

Re: Failed fetching - Nutch - [mail # user]
...Thanks for the reply - I'm using 1.4. The problem was, as previously described, that the nutch-site.xml didn't have protocol-http in the plugin includes - I had presumed this was ...
Author: Dean Pullen, 2012-02-03, 11:06

Re: Failed fetching - Nutch - [mail # user]
...What I see in logs/userlogs/myfetchjobxx/syslog is: 2012-02-02 17:15:25,045 INFO org.apache.nutch.fetcher.Fetcher: fetch of http://nutch.apache.org/ failed with: org.apache...
Author: Dean Pullen, 2012-02-02, 17:22

Re: Failed fetching - Nutch - [mail # user]
...I've added the properties http.verbose = true ("If true, HTTP will log more verbosely.") and fetcher.verbose = true ("If true, fetcher will log more verbosely.") to the nutch-site.xml in an att...
Author: Dean Pullen, 2012-02-02, 17:11

Failed fetching - Nutch - [mail # user]
...Hi all, I'm trying to fetch from http://nutch.apache.org But after fetching, parsing, and updating the DB I examine the DB for 'http://nutch.apache.org/' (oddly I must incl...
Author: Dean Pullen, 2012-02-02, 16:44

Re: Null Pointer During Crawl on Hadoop EC2 - Nutch - [mail # user]
...Looks like this to me: https://issues.apache.org/jira/browse/NUTCH-1084 D. On 13/01/2012 15:41, Matthew Slade wrote:...
Author: Dean Pullen, 2012-01-13, 15:43