Results 1 to 10 of 133 (0.291s).
Re: Escaping URL during redirection - Nutch - [mail # user]
|
|
...Sorry, I think it works. I was trying 'parsechecker' and it doesn't apply 'regexnormalizer' rules by default. So, case solved, thanks a lot! On Sunday, September 9, 2012, Sebasti...
|
|
|
Author: remi tassing,
2012-09-10, 12:32
|
|
|
Escaping URL during redirection - Nutch - [mail # user]
|
|
...Hi guys, I'm not quite sure how to make Nutch follow the normalizer regular expressions during redirection. I see some URLs are not properly escaped. Any help? Remi...
|
|
|
Author: remi tassing,
2012-09-08, 17:30
|
|
|
Re: Problem with corrupted index "Input path does not exist:" - Nutch - [mail # user]
|
|
...deleting that specific segment directory [0] should fix the problem but it depends on what you're attempting to do. Remi [0]: /home/user/Apache Nutch/crawl/segments/**20120908095...
|
|
|
Author: remi tassing,
2012-09-08, 09:03
|
|
|
Re: Crawl HTTPS websites/Enable Plugin - Nutch - [mail # user]
|
|
...So did it fail before or after you used protocol-httpclient? On 7/24/12, Kay wrote: Remi Tassing...
|
|
|
Author: remi tassing,
2012-07-24, 05:20
|
|
|
Re: How does nutch reflect with HTTP status not 200? - Nutch - [mail # user]
|
|
...Hi, just in case there was no reply yet. Nutch does have some handling depending on the HTTP response code (e.g. 302 redirection ...). For more detail, check the source code Http...
|
|
|
Author: remi tassing,
2012-07-22, 12:34
|
|
|
Re: javascript in href does not get into outlink - Nutch - [mail # user]
|
|
...I have a similar problem and I'm planning to modify the parsing code...I hope it works On Mon, Jul 2, 2012 at 2:10 PM, Alexander Aristov wrote: ...
|
|
|
Author: remi tassing,
2012-07-03, 08:26
|
|
|
Re: Compilation of core classes - Nutch - [mail # user]
|
|
...Merci Julien! I tried that and compilation works. However, there is a small problem: I changed fetcher.java and running "bin/nutch..." doesn't use the latest binaries. I ...
|
|
|
Author: remi tassing,
2012-06-30, 12:37
|
|
|
Re: Near Duplicate Detection in nutch /Solr - Nutch - [mail # user]
|
|
...I'm very interested in this topic as well. Plz let the community know if/when you get smth cool implemented =) On Saturday, June 23, 2012, parnab kumar wrote: ...
|
|
|
Author: remi tassing,
2012-06-23, 09:59
|
|
|
Re: disable filtering and normalization in the crawl-tool - Nutch - [mail # user]
|
|
...Certainly, but you might need them to avoid crawling unnecessary pages On Monday, June 11, 2012, Matthias Paul wrote: ...
|
|
|
Author: remi tassing,
2012-06-11, 22:52
|
|
|
Re: URL filtering and normalization - Nutch - [mail # user]
|
|
..."bad" URLs are already and still in. You'll need to update your db with the 'updatedb' command On Monday, June 11, 2012, Bai Shen wrote: ...
|
|
|
Author: remi tassing,
2012-06-11, 22:50
|
|
|
|