Results 1 to 10 of 224 (0.211s).
[NUTCH-1314] Impose a limit on the length of outlink target urls - Nutch - [issue]
...In the past we have encountered situations where crawling specific broken sites resulted in ridiculously long urls that caused the stalling of tasks. The regex plugins (normalizing/filtering)...
http://issues.apache.org/jira/browse/NUTCH-1314    Author: Ferdy Galema, 2013-02-13, 11:42
Re: Usage of db.max.inlinks property in nutch-site.xml in 2.x - Nutch - [mail # user]
...Absolutely. We should remove any unused property that is not in the planning for (re)implementing. On Tue, Feb 5, 2013 at 2:12 AM, Lewis John Mcgibbney wrote: *Fer...
   Author: Ferdy Galema, 2013-02-05, 08:05
Re: Very long time just before fetching and just after parsing - Nutch - [mail # user]
...Hi, Not sure if it's possible in the 2.x branch to filter/normalize just once, but with a bit of hacking this should not be too difficult. If you filter the input urls (injected urls) ...
   Author: Ferdy Galema, 2013-02-04, 16:15
Re: Usage of db.max.inlinks property in nutch-site.xml in 2.x - Nutch - [mail # user]
...Hi Lewis, The relevant property seems to be db.update.max.inlinks On Fri, Feb 1, 2013 at 4:27 AM, Lewis John Mcgibbney wrote: *Ferdy Galema* Kalooga Developm...
   Author: Ferdy Galema, 2013-02-04, 16:10
[NUTCH-1313] Nutch trunk add response headers to datastore for the protocol-httpclient plugin - Nutch - [issue]
...For tracking progress of the port of NUTCH-1311 to Nutch trunk....
http://issues.apache.org/jira/browse/NUTCH-1313    Author: Ferdy Galema, 2013-01-12, 19:59
[NUTCH-1387] All parsers should respond to cancellation / interrupts. - Nutch - [issue]
...During parsing a TimeoutException can occur. This is caused whenever the FutureTask.get() cannot be completed within the specified timeout. The tricky part is that single urls might be perfe...
http://issues.apache.org/jira/browse/NUTCH-1387    Author: Ferdy Galema, 2013-01-12, 19:15
[NUTCH-1286] Refactoring/reimplementing crawling API (NutchApp) - Nutch - [issue]
...This issue is to track changes we (Mathijs and I) have planned for the API and webapp in Nutchgora. We have a pretty good idea of how we want to be using the crawl API. It may involve some m...
http://issues.apache.org/jira/browse/NUTCH-1286    Author: Ferdy Galema, 2013-01-12, 18:55
[NUTCH-1452] hadoop.job.history.user.location in nutch-default making job history useless - Nutch - [issue]
...There is still a property in nutch-default 'hadoop.job.history.user.location' that redirects the creation of history files from job output locations to a custom location. I noticed that the ...
http://issues.apache.org/jira/browse/NUTCH-1452    Author: Ferdy Galema, 2013-01-12, 18:47
[NUTCH-1457] Nutch2 Refactor the update process so that fetched items are only processed once - Nutch - [issue]
http://issues.apache.org/jira/browse/NUTCH-1457    Author: Ferdy Galema, 2013-01-12, 18:46
Re: code changes not reflecting when deployed on hadoop - Nutch - [mail # user]
...For the record: This no longer seems to be the case for trunk. (At least when you properly ant clean prior to building). On Fri, Dec 28, 2012 at 12:25 PM, Sourajit Basak wrote: ...
   Author: Ferdy Galema, 2013-01-07, 10:52