Results 1 to 10 of 224 (0.211s).

[NUTCH-1314] Impose a limit on the length of outlink target urls - Nutch - [issue]
...In the past we have encountered situations where crawling specific broken sites resulted in ridiculously long urls that caused the stalling of tasks. The regex plugins (normalizing/filtering)...
http://issues.apache.org/jira/browse/NUTCH-1314
Author: Ferdy Galema, 2013-02-13, 11:42

Re: Usage of db.max.inlinks property in nutch-site.xml in 2.x - Nutch - [mail # user]
...Absolutely. We should remove any unused property that is not in the planning for (re)implementing. On Tue, Feb 5, 2013 at 2:12 AM, Lewis John Mcgibbney wrote: *Fer...
Author: Ferdy Galema, 2013-02-05, 08:05

Re: Very long time just before fetching and just after parsing - Nutch - [mail # user]
...Hi, Not sure if it's possible in the 2.x branch to filter/normalize just once, but with a bit of hacking this should not be too difficult. If you filter the input urls (injected urls) ...
Author: Ferdy Galema, 2013-02-04, 16:15

Re: Usage of db.max.inlinks property in nutch-site.xml in 2.x - Nutch - [mail # user]
...Hi Lewis, The relevant property seems to be db.update.max.inlinks On Fri, Feb 1, 2013 at 4:27 AM, Lewis John Mcgibbney wrote: *Ferdy Galema* Kalooga Developm...
Author: Ferdy Galema, 2013-02-04, 16:10

[NUTCH-1313] Nutch trunk add response headers to datastore for the protocol-httpclient plugin - Nutch - [issue]
...For tracking progress of the port of NUTCH-1311 to Nutch trunk....
http://issues.apache.org/jira/browse/NUTCH-1313
Author: Ferdy Galema, 2013-01-12, 19:59

[NUTCH-1387] All parsers should respond to cancellation / interrupts. - Nutch - [issue]
...During parsing a TimeoutException can occur. This is caused whenever the FutureTask.get() cannot be completed within the specified timeout. The tricky part is that single urls might be perfe...
http://issues.apache.org/jira/browse/NUTCH-1387
Author: Ferdy Galema, 2013-01-12, 19:15

[NUTCH-1286] Refactoring/reimplementing crawling API (NutchApp) - Nutch - [issue]
...This issue is to track changes we (Mathijs and I) have planned for the API and webapp in Nutchgora. We have a pretty good idea of how we want to be using the crawl API. It may involve some m...
http://issues.apache.org/jira/browse/NUTCH-1286
Author: Ferdy Galema, 2013-01-12, 18:55

[NUTCH-1452] hadoop.job.history.user.location in nutch-default making job history useless - Nutch - [issue]
...There is still a property in nutch-default 'hadoop.job.history.user.location' that redirects the creation of history files from job output locations to a custom location. I noticed that the ...
http://issues.apache.org/jira/browse/NUTCH-1452
Author: Ferdy Galema, 2013-01-12, 18:47

[NUTCH-1457] Nutch2 Refactor the update process so that fetched items are only processed once - Nutch - [issue]
http://issues.apache.org/jira/browse/NUTCH-1457
Author: Ferdy Galema, 2013-01-12, 18:46

Re: code changes not reflecting when deployed on hadoop - Nutch - [mail # user]
...For the record: This no longer seems to be the case for trunk. (At least when you properly ant clean prior to building). On Fri, Dec 28, 2012 at 12:25 PM, Sourajit Basak wrote: ...
Author: Ferdy Galema, 2013-01-07, 10:52