Results 81 to 90 of 1767 (1.317s).
RE: Comparing Nutch and Common Crawl - Nutch - [mail # user]
...Hi,  Interesting indeed. Apart from our customers we operate a cluster of a few high octane machines for research purposes that crawls the entire internet as much as it physically can. ...
   Author: Markus Jelsma, 2012-12-17, 21:59
RE: How to extend Nutch for article crawling - Nutch - [mail # user]
...The 1.x indexer can filter and normalize.    ...
   Author: Markus Jelsma, 2012-12-17, 14:13
RE: shouldFetch rejected - Nutch - [mail # user]
...You're doing nothing wrong; it's just a debug entry. curTime is just the current time, and fetchTime is the time in the future after which the record must be fetched again. The fetch time is ...
   Author: Markus Jelsma, 2012-12-17, 12:45
RE: shouldFetch rejected - Nutch - [mail # user]
...Hi - curTime does not exceed fetchTime, thus the record is not eligible for fetch.    ...
   Author: Markus Jelsma, 2012-12-17, 12:40
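
The check described in the two replies above can be sketched roughly as follows. This is a simplified illustration, not Nutch's actual code: the function name `should_fetch`, the millisecond timestamps, and the handling of the case where both times are equal are assumptions made for the example.

```python
def should_fetch(cur_time_ms: int, fetch_time_ms: int) -> bool:
    """Return True only when the current time has passed the record's fetch time.

    Assumption: a record whose fetch time equals the current time is treated
    as not yet due; the replies above do not say how that case is handled.
    """
    return cur_time_ms > fetch_time_ms


# Hypothetical values, in milliseconds since the epoch.
now = 1_355_748_000_000
one_day = 24 * 60 * 60 * 1000

print(should_fetch(now, now + one_day))  # fetch time still in the future: False
print(should_fetch(now, now - one_day))  # fetch time already passed: True
```

A "shouldFetch rejected" debug entry would then correspond to the first case: the record is simply not due yet.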
RE: Best practices for running Nutch - Nutch - [mail # user]
...A long running fetcher: - allows a possible memory leak to accumulate until disaster; - loses more records if it terminates; - can run even longer because more records to shuffle in map reduc...
   Author: Markus Jelsma, 2012-12-14, 17:34
RE: identify domains from fetch lists taking lot of time. - Nutch - [mail # user]
...Hi - you have to get rid of those URLs via URL filters. If you cannot filter them out you can set the fetcher time limit (see nutch-default) to limit the time the fetcher runs or set the fe...
   Author: Markus Jelsma, 2012-12-14, 09:00
RE: [ANNOUNCE] Apache Nutch 1.6 Released - Nutch - [mail # user]
...Thanks Lewis! :)      ...
   Author: Markus Jelsma, 2012-12-10, 11:48
RE: fetcher partitioning - Nutch - [mail # user]
...The partitioner decides which record ends up in which fetch list. When running locally, there is always one fetch list and one mapper to ingest that fetch list. T...
   Author: Markus Jelsma, 2012-12-10, 11:46
RE: fetcher partitioning - Nutch - [mail # user]
...Sourajit, Looks fine at first glance. A partitioner does not partition between threads, only mappers. It also makes little sense because in the fetcher the number of threads can be set p...
   Author: Markus Jelsma, 2012-12-10, 10:53
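
As a rough illustration of the two replies above, a partitioner can be thought of as a function from a record to a fetch list index. The sketch below is hypothetical and is not Nutch's partitioner: keying on the URL's host, the CRC32 hash, and the names `partition` and `num_fetch_lists` are all assumptions chosen for the example.

```python
import zlib
from urllib.parse import urlsplit


def partition(url: str, num_fetch_lists: int) -> int:
    """Pick the fetch list for a URL (assumption: keyed on the URL's host)."""
    host = urlsplit(url).hostname or ""
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(host.encode("utf-8")) % num_fetch_lists


# URLs on the same host land in the same fetch list, and so in the same mapper.
same = partition("http://example.com/a", 4) == partition("http://example.com/b", 4)
print(same)

# With a single fetch list, as when running locally, every record maps to list 0.
print(partition("http://example.org/", 1))
```

Nothing in this function knows about threads; it only decides which fetch list, and therefore which mapper, receives a record.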
[NUTCH-1232] Remove host  field from index-basic - Nutch - [issue]
...One of the fields needs to be removed; it makes no sense to have two identical values for separate fields. I propose to get rid of the site field and leave the host field. This may be a breaking...
http://issues.apache.org/jira/browse/NUTCH-1232    Author: Markus Jelsma, 2012-12-10, 04:35