Nutch, mail # user - Failed fetching


Failed fetching
Dean Pullen 2012-02-02, 16:44
Hi all,

I'm trying to fetch from http://nutch.apache.org

But after fetching, parsing, and updating the DB, I look up
'http://nutch.apache.org/' in the DB (oddly, I must include the trailing
slash) and get:

URL: http://nutch.apache.org/
Version: 7
Status: 1 (db_unfetched)
Fetch time: Fri Feb 03 16:33:13 GMT 2012
Modified time: Thu Jan 01 01:00:00 GMT 1970
Retries since fetch: 1
Retry interval: 2592000 seconds (30 days)
Score: 500.0
Signature: null
Metadata: _pst_: failed(2), lastModified=0

Why is the fetch failing, and how can I turn on more Nutch logging so that
I can see the failed attempt and its error message?
Nothing shows up in my access logs when I try to crawl my own external site.

To ensure all URLs are permitted I've changed the regex-urlfilter.txt to:

# accept anything else
+.
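
For reference, this is how I understand the filter file to be evaluated: each non-comment line is a '+' (accept) or '-' (reject) sign followed by a regular expression, and the first rule whose pattern is found in the URL decides the outcome. The sketch below is only my own illustration of that first-match behaviour, not Nutch's code; the class and method names (UrlFilterSketch, parse, accepts) are hypothetical, and rejecting a URL when no rule matches is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: hypothetical names, not the Nutch implementation.
class UrlFilterSketch {

    // One rule: '+' means accept, '-' means reject, followed by a regex.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parse filter text, skipping blank lines and '#' comments.
    public static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found anywhere in the URL decides.
    // Assumption: a URL that matches no rule is rejected.
    public static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("# accept anything else\n+.");
        System.out.println(accepts(rules, "http://nutch.apache.org/")); // prints "true"
    }
}
```

If that reading is right, the single rule above lets every non-empty URL through, so the filter should not be what is blocking the fetch.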

This has been puzzling me all day; I'm hoping someone can help!

Dean.