RE: Comparing Nutch and Common Crawl - Nutch - [mail # user]
...Hi, Interesting indeed. Apart from our customers, we operate a cluster of a few high-octane machines for research purposes that crawls as much of the entire internet as it physically can. ...
Author: Markus Jelsma, 2012-12-17, 21:59

RE: How to extend Nutch for article crawling - Nutch - [mail # user]
...The 1.x indexer can filter and normalize. ...
Author: Markus Jelsma, 2012-12-17, 14:13

RE: shouldFetch rejected - Nutch - [mail # user]
...You're doing nothing wrong, it's just a debug entry. curTime is just the current time and fetchTime is the time in the future after which the record must be fetched again. The fetch time is ...
Author: Markus Jelsma, 2012-12-17, 12:45

RE: shouldFetch rejected - Nutch - [mail # user]
...Hi - curTime does not exceed fetchTime, thus the record is not eligible for fetch. ...
Author: Markus Jelsma, 2012-12-17, 12:40
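
The two shouldFetch replies above describe the eligibility check in words. As a rough illustration only (this is not Nutch's Java code; the function name `should_fetch`, the millisecond arguments, and the handling of the exact-equality boundary are assumptions made for this sketch), the comparison could look like this:

```python
def should_fetch(cur_time_ms: int, fetch_time_ms: int) -> bool:
    """Return True once the current time exceeds the scheduled fetch time."""
    # Hypothetical helper: per the replies, a record is not eligible
    # while curTime does not exceed fetchTime.
    return cur_time_ms > fetch_time_ms


now = 1_355_748_000_000            # an example "current time" in epoch milliseconds
one_day_ms = 24 * 60 * 60 * 1000

# fetchTime lies in the future: the record is skipped (the "rejected" debug entry).
print(should_fetch(now, now + one_day_ms))
# fetchTime has passed: the record becomes eligible again.
print(should_fetch(now, now - one_day_ms))
```

In this reading, a "shouldFetch rejected" log line simply means the first case applied, which matches the reply that nothing is wrong.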

RE: Best practices for running Nutch - Nutch - [mail # user]
...A long-running fetcher: - allows a possible memory leak to accumulate until disaster; - loses more records if it terminates; - can run even longer because more records to shuffle in map reduc...
Author: Markus Jelsma, 2012-12-14, 17:34

RE: identify domains from fetch lists taking lot of time. - Nutch - [mail # user]
...Hi - you have to get rid of those URLs via URL filters. If you cannot filter them out you can set the fetcher time limit (see nutch-default) to limit the time the fetcher runs or set the fe...
Author: Markus Jelsma, 2012-12-14, 09:00

RE: [ANNOUNCE] Apache Nutch 1.6 Released - Nutch - [mail # user]
Author: Markus Jelsma, 2012-12-10, 11:48

RE: fetcher partitioning - Nutch - [mail # user]
...The partitioner decides which record ends up in which fetch list. When running locally, there is always one fetch list and one mapper to ingest that fetch list. T...
Author: Markus Jelsma, 2012-12-10, 11:46

RE: fetcher partitioning - Nutch - [mail # user]
...Sourajit, Looks fine at first glance. A partitioner does not partition between threads, only mappers. It also makes little sense because in the fetcher number of threads can be set p...
Author: Markus Jelsma, 2012-12-10, 10:53
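
The partitioning replies above say that a partitioner decides which fetch list (and therefore which mapper) a record goes to, and that a local run has exactly one fetch list. A minimal sketch of that idea follows. It is not Nutch's partitioner: keying on the host name, the use of CRC32 as the hash, and the names `fetch_list_for` and `build_fetch_lists` are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from urllib.parse import urlparse


def fetch_list_for(host: str, num_fetch_lists: int) -> int:
    """Map a host to a fetch list index (hypothetical helper, CRC32 is an assumption)."""
    return zlib.crc32(host.encode("utf-8")) % num_fetch_lists


def build_fetch_lists(urls, num_fetch_lists):
    """Group URLs into fetch lists; each list would be consumed by one mapper."""
    lists = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc
        lists[fetch_list_for(host, num_fetch_lists)].append(url)
    return dict(lists)


urls = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.com/index.html",
]

# Local mode in this sketch: one fetch list, so every record lands in list 0.
local = build_fetch_lists(urls, 1)
# With several fetch lists, URLs of the same host still share one list.
distributed = build_fetch_lists(urls, 4)
print(local)
print(distributed)
```

Note that nothing here assigns records to threads; in this sketch, as in the reply, the partitioner only chooses between fetch lists.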

[NUTCH-1232] Remove host field from index-basic - Nutch - [issue]
...Either field needs to be removed; it makes no sense to have two identical values for separate fields. I propose to get rid of the site field and leave the host field. This may be a breaking...
http://issues.apache.org/jira/browse/NUTCH-1232
Author: Markus Jelsma, 2012-12-10, 04:35