Results 21 to 30 of 133 (0.388s).
Re: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com - Nutch - [mail # user]
...It could be a million reasons: seed, filter, authentication...maybe the pages are already crawled... is there any clue in the log? Remi On Mon, Apr 2, 2012 at 5:37 PM, jeps...
Author: remi tassing, 2012-04-02, 09:44
Re: crawling a website - Nutch - [mail # user]
...It depends on the structure of your site and you can modify "regex-urlfilter.txt" to reach your goal. *"- ^http://ww.mywebsite.com/[^/]*$"* it will exclude http://ww.mywebsite.co...
Author: remi tassing, 2012-04-02, 09:40
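The filter rule quoted in this reply only makes sense together with how regex-urlfilter.txt is evaluated. The comments in Nutch's default regex-urlfilter.txt describe that evaluation: each non-comment line is a regex prefixed by '+' or '-', the first matching pattern decides, and a URL that matches nothing is ignored. Below is a minimal Java sketch of those semantics. The class and method names are hypothetical, and this is an approximation, not Nutch's RegexURLFilter. The sketch also assumes that everything after the '+'/'-' is taken verbatim as the regex, which is why the rule is written here without a space after '-'. Check how your Nutch version handles that space before copying the rule from the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Minimal sketch of regex-urlfilter.txt semantics (an approximation, not Nutch's code).
 * Each rule line is '+' or '-' followed by a Java regex. Rules are tried top to bottom,
 * the first rule whose regex is found anywhere in the URL decides, and a URL that no
 * rule matches is rejected.
 */
public class RegexUrlFilterSketch {

    /** One parsed rule: accept ('+') or reject ('-') plus its compiled pattern. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; empty lines and lines starting with '#' or a space are skipped. */
    public RegexUrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            if (line.isEmpty() || line.charAt(0) == '#' || line.charAt(0) == ' ') {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Invalid first character: " + line);
            }
            // Assumption: everything after the sign is the regex, taken verbatim (no trimming).
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns true if the URL is kept, false if it is filtered out. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept; // first matching rule wins
            }
        }
        return false; // no rule matched: reject
    }

    public static void main(String[] args) {
        RegexUrlFilterSketch filter = new RegexUrlFilterSketch(Arrays.asList(
                "# skip pages that sit directly under the site root",
                "-^http://ww.mywebsite.com/[^/]*$",
                "# accept everything else",
                "+."));
        System.out.println(filter.accepts("http://ww.mywebsite.com/about.html"));      // false
        System.out.println(filter.accepts("http://ww.mywebsite.com/docs/intro.html")); // true
        System.out.println(filter.accepts("http://ww.mywebsite.com/"));                // false
    }
}
```

Because `[^/]*` can match the empty string, the rule also drops the site root itself. If the home page should still be crawled, it needs its own '+' rule placed above the '-' rule. The unescaped dots in the pattern match any character. Writing `\.` makes the host match exact.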
Normalizer error: "IndexOutOfBoundsException: No group 1" - Nutch - [mail # user]
...Hi all, I just found a weird error and it looks like a JDK bug but I'm not sure. Whenever replacing a URL-A, that contains a number, with a URL-B, then I get an error: "IndexOutOfBound...
Author: remi tassing, 2012-04-02, 07:40
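The snippet is cut off before any diagnosis. However, java.util.regex throws exactly this message when a substitution string contains `$1` and the pattern defines no capturing group: `Matcher.replaceAll` reads `$1` as a back-reference to group 1. The sketch below assumes the regex normalizer applies each rule through `Matcher.replaceAll`, which the message does not confirm. It reproduces that failure mode and shows two ways out. The URLs, patterns and class name are hypothetical, and this is one possible cause, not a diagnosis of the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch reproducing "IndexOutOfBoundsException: No group 1" with java.util.regex.
 * Matcher.replaceAll reads "$1" in the substitution as a reference to capturing
 * group 1, so it throws when the pattern defines no capturing group.
 */
public class NoGroupSketch {

    // Hypothetical rule: move the numeric id into the path. Note: no (...) group.
    static final String BROKEN_PATTERN = "^http://example\\.com/page\\?id=[0-9]+$";
    // Same rule with the number captured as group 1.
    static final String FIXED_PATTERN = "^http://example\\.com/page\\?id=([0-9]+)$";
    static final String SUBSTITUTION = "http://example.com/item/$1";

    /** Applies pattern -> substitution, with "$n" treated as a group reference. */
    static String normalize(String url, String pattern, String substitution) {
        return Pattern.compile(pattern).matcher(url).replaceAll(substitution);
    }

    /** Same, but treats the substitution as literal text ('$' and '\' are escaped). */
    static String normalizeLiterally(String url, String pattern, String substitution) {
        return Pattern.compile(pattern).matcher(url)
                .replaceAll(Matcher.quoteReplacement(substitution));
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?id=42";
        try {
            normalize(url, BROKEN_PATTERN, SUBSTITUTION);
        } catch (IndexOutOfBoundsException e) {
            // prints "broken rule: No group 1"
            System.out.println("broken rule: " + e.getMessage());
        }
        // prints "http://example.com/item/42"
        System.out.println(normalize(url, FIXED_PATTERN, SUBSTITUTION));
        // prints "http://example.com/item/$1"
        System.out.println(normalizeLiterally(url, BROKEN_PATTERN, SUBSTITUTION));
    }
}
```

Adding the capturing group fixes the rule when `$1` is meant as a back-reference. `Matcher.quoteReplacement` is the option when a `$` in the target URL is meant literally.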
Re: Re-indexing temporarily unavailable page - Nutch - [mail # user]
...nice! On Wed, Mar 28, 2012 at 10:52 PM, dspathis wrote: ...
Author: remi tassing, 2012-03-29, 05:03
Re: Different number of parsed pages for crawls with same settings - Nutch - [mail # user]
...This happened to me before for a very specific reason and I'm not sure if it's the same for you. Some of the websites I was trying to access were temporarily down. I would suggest you ...
Author: remi tassing, 2012-03-27, 10:05
Re: [ANNOUNCEMENT] Lewis John Mc Gibbney is a Nutch committer and PMC member - Nutch - [mail # user]
...Try this: http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F Solr also has a debug mode where you can see result'...
Author: remi tassing, 2012-03-27, 10:02
Re: db_unfetched large number, but crawling not fetching any longer - Nutch - [mail # user]
...I'm not sure to totally understand what you meant. 1. In case you know exactly how the relative urls are translated into, you can use urlnormalizefilter to change them in what would ma...
Author: remi tassing, 2012-03-27, 09:56
Re: divide fetch process ? - Nutch - [mail # user]
...I think that is exactly what HADOOP does! Start here: http://wiki.apache.org/nutch/NutchHadoopTutorial On Tue, Mar 27, 2012 at 6:19 AM, pepe3059 wrote: ...
Author: remi tassing, 2012-03-27, 09:52
Re: Out-of-the-box Nutch indexing url source to Solr - Nutch - [mail # user]
...Hey, Try the command "bin/nutch readseg -dump"[1][2]. It reads a segment (or multiple segments) and output their content including outlinks, html content, parsed content... I hop...
Author: remi tassing, 2012-03-26, 00:56
Re: nutch crawling file system SOLVED - Nutch - [mail # user]
...Using crawl-urlfilter (or regex-urlfilter depending on which one you're using), you should be able to solve this. Unless you're not clear on what folders to exclude...? On Sunday, Marc...
Author: remi tassing, 2012-03-12, 05:06