Results 11 to 17 of 17 (0.105s).
[NUTCH-1410] impact of a map-reduce problem - Nutch - [issue]
...a simple test showed that each mapper or reducer has its own local view of variables. In Nutch, there are multiple places that share a variable between mappers or reducers, for example in ...
http://issues.apache.org/jira/browse/NUTCH-1410
Author: behnam nikbakht,
2013-01-12, 18:51
[NUTCH-1331] limit crawler to defined depth - Nutch - [issue]
...there is a need to limit the crawler to a defined depth. The purpose of this option is to avoid crawling infinite loops of dynamically generated URLs, which occur on some sites, and to op...
http://issues.apache.org/jira/browse/NUTCH-1331
Author: behnam nikbakht,
2012-12-22, 04:30
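The depth limit requested in NUTCH-1331 can be sketched as a breadth-first traversal that records each URL's link distance from the seed and stops following outlinks once a maximum depth is reached. This is a minimal, hypothetical illustration, not Nutch's implementation: the `outlinks` function stands in for fetching and parsing a page, and the names `DepthLimitedCrawl`, `crawl`, and `maxDepth` are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class DepthLimitedCrawl {

    // Hypothetical crawl loop: outlinks stands in for fetch + parse of a page.
    static List<String> crawl(Function<String, List<String>> outlinks, String seed, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();
        queue.add(Map.entry(seed, 0));
        seen.add(seed);
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> next = queue.poll();
            String url = next.getKey();
            int depth = next.getValue();
            visited.add(url);
            if (depth >= maxDepth) {
                continue; // the page is kept, but its outlinks are not followed
            }
            for (String link : outlinks.apply(url)) {
                if (seen.add(link)) { // skip URLs already queued or visited
                    queue.add(Map.entry(link, depth + 1));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Every page links to a freshly generated "next" URL, so without a
        // depth limit this traversal would never terminate.
        Function<String, List<String>> endless = url -> {
            int n = Integer.parseInt(url.substring(url.indexOf('=') + 1));
            return List.of("http://example.com/cal?page=" + (n + 1));
        };
        System.out.println(crawl(endless, "http://example.com/cal?page=0", 3));
    }
}
```

With `maxDepth` set to 3, the endless calendar-style site above is cut off after `page=3`, which is the kind of dynamically generated loop the issue describes.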
[NUTCH-1347] fetcher politeness related to map-reduce - Nutch - [issue]
...when Nutch runs on Hadoop, following the map-reduce model, each map task works on its own data, so each fetcher map task works with its own queues and does not know anything about...
http://issues.apache.org/jira/browse/NUTCH-1347
Author: behnam nikbakht,
2012-12-19, 13:56
[NUTCH-1328] a problem with regex-normalize.xml - Nutch - [issue]
...there is a regex pattern in regex-normalize.xml: <pattern>([;_]?((?i)l|j|bv_)?((?i)sid|phpsessid|sessionid)=.*?)(\?|&|#|$)</pattern> that removes session ids from URLs...
http://issues.apache.org/jira/browse/NUTCH-1328
Author: behnam nikbakht,
2012-07-10, 22:20
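To see what the session-id pattern quoted in NUTCH-1328 does, it can be run through `java.util.regex` directly. The pattern is written here as a Java string literal, with the `?` in the final group escaped as `\\?` (an unescaped `(?|` does not compile in Java); in the XML file itself the `&` would have to be written as `&amp;`. The substitution `$4` is an assumption, since the snippet does not show it: it drops the session-id parameter and keeps only the delimiter that followed it. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SessionIdStripDemo {

    // Pattern quoted in NUTCH-1328. Group 1 is the session-id parameter,
    // group 4 is the delimiter that follows it (?, &, # or end of input).
    static final Pattern SESSION_ID = Pattern.compile(
            "([;_]?((?i)l|j|bv_)?((?i)sid|phpsessid|sessionid)=.*?)(\\?|&|#|$)");

    // Assumed substitution "$4": keep only the trailing delimiter.
    static String strip(String url) {
        return SESSION_ID.matcher(url).replaceAll("$4");
    }

    public static void main(String[] args) {
        System.out.println(strip("http://example.com/page;jsessionid=ABC123?x=1"));
        System.out.println(strip("http://example.com/a;JSESSIONID=XYZ"));
    }
}
```

The first call prints `http://example.com/page?x=1` and the second prints `http://example.com/a`; the inline `(?i)` flags make the parameter names match regardless of case.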
[NUTCH-1288] Generator should not generate filter and not found and denied and gone and permanently moved pages - Nutch - [issue]
...Generator should not generate filtered, not-found, denied, gone, or permanently moved pages. In the shouldFetch method of AbstractFetchSchedule, the CrawlDatum must be checked against special...
http://issues.apache.org/jira/browse/NUTCH-1288
Author: behnam nikbakht,
2012-02-21, 10:13
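The check proposed in NUTCH-1288 can be sketched as an early return in a `shouldFetch`-style method: pages in one of the excluded states are rejected before the usual "is it due yet?" comparison. This is a simplified, hypothetical sketch: the `Status` enum and the method signature are invented here, whereas Nutch's `AbstractFetchSchedule.shouldFetch` works on a `CrawlDatum` rather than this enum.

```java
import java.util.EnumSet;
import java.util.Set;

public class GeneratorStatusFilter {

    // Hypothetical statuses mirroring the categories named in NUTCH-1288;
    // Nutch's CrawlDatum encodes statuses differently.
    enum Status { UNFETCHED, FETCHED, FILTERED, NOT_FOUND, DENIED, GONE, MOVED_PERMANENTLY }

    // Statuses the generator should never select again.
    static final Set<Status> NEVER_GENERATE = EnumSet.of(
            Status.FILTERED, Status.NOT_FOUND, Status.DENIED,
            Status.GONE, Status.MOVED_PERMANENTLY);

    // Reject excluded statuses first, then fall back to a plain
    // "is the scheduled fetch time already reached?" comparison.
    static boolean shouldFetch(Status status, long fetchTime, long curTime) {
        if (NEVER_GENERATE.contains(status)) {
            return false;
        }
        return fetchTime <= curTime;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(shouldFetch(Status.UNFETCHED, now - 1000, now)); // true: due and fetchable
        System.out.println(shouldFetch(Status.GONE, now - 1000, now));      // false: excluded status
    }
}
```

Keeping the excluded statuses in one `EnumSet` makes the policy easy to audit and extend without touching the time comparison.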
[NUTCH-1204] not all of pages parsed - Nutch - [issue]
...when we fetch a site in multiple segments and dump the crawldb with readdb, the report says that some pages are unfetched; when we checked, we found that these pages had been fetched and stor...
http://issues.apache.org/jira/browse/NUTCH-1204
Author: behnam nikbakht,
2011-11-14, 15:40
[NUTCH-1199] unfetched URLs problem - Nutch - [issue]
...we wrote a script to fetch unfetched URLs: # first, dump from readdb to a text file and extract the unfetched URLs to a text file: bin/nutch readdb $crawldb -dump $...
http://issues.apache.org/jira/browse/NUTCH-1199
Author: behnam nikbakht,
2011-11-08, 09:52
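The "extract unfetched URLs" step from NUTCH-1199 can be sketched in Java. This assumes the plain-text `readdb -dump` layout in which each record starts with the URL, a tab, and `Version: ...`, and contains a line such as `Status: 1 (db_unfetched)`; check this against the dump your Nutch version actually writes. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class UnfetchedFromDump {

    // Assumed record layout: "<url>\tVersion: ..." followed by lines
    // including "Status: <n> (db_unfetched)" for unfetched entries.
    static List<String> unfetched(List<String> dumpLines) {
        List<String> result = new ArrayList<>();
        String currentUrl = null;
        for (String line : dumpLines) {
            int tab = line.indexOf('\t');
            if (tab > 0 && line.startsWith("Version:", tab + 1)) {
                currentUrl = line.substring(0, tab); // start of a new record
            } else if (currentUrl != null && line.startsWith("Status:")
                    && line.contains("(db_unfetched)")) {
                result.add(currentUrl);
                currentUrl = null; // report each record at most once
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: UnfetchedFromDump <dump-text-file>");
            return;
        }
        for (String url : unfetched(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(url);
        }
    }
}
```

The output is one URL per line, which is the same shape as a Nutch seed file.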