Results 121 to 130 of 1767 (0.127s).

[NUTCH-1327] QueryStringNormalizer - Nutch - [issue]
...A normalizer for dealing with query strings. Sorting query strings is helpful in preventing duplicates for some (bad) websites....
http://issues.apache.org/jira/browse/NUTCH-1327
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1337] WebGraph to follow redirects - Nutch - [issue]
...With the current WebGraph URL shortening services `steal` inlinks from the actual target pages. The WebGraph OutlinkDB Mapper should use the target URL instead if there is any....
http://issues.apache.org/jira/browse/NUTCH-1337
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1414] Date extraction parse filter - Nutch - [issue]
...Date extraction parse filter for Nutch to provide means to extract an arbitrary page date (article date) from the parse text....
http://issues.apache.org/jira/browse/NUTCH-1414
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1335] OutlinkDB to collect unique URL's only - Nutch - [issue]
...The aggregating code in the Outlink reducer does not take care of incoming duplicates. When the input segments contain duplicates of a single URL they are collected....
http://issues.apache.org/jira/browse/NUTCH-1335
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1151] Index-anchor to add numInlinks count - Nutch - [issue]
...Issue to improve in index-anchor to add the number of inlinks per document. This count is useful for calculating some authority metric in the search server. T...
http://issues.apache.org/jira/browse/NUTCH-1151
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1194] CrawlDB lock should be released earlier - Nutch - [issue]
...Lock on the CrawlDB is released when everything is finished. But when generating many segments, the lock remains in place while it's not necessary anymore. If GENERATE_UPDATE_DB is false we...
http://issues.apache.org/jira/browse/NUTCH-1194
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1202] Fetcher timebomb kills long waiting fetch jobs - Nutch - [issue]
...The timebomb feature kills off mappers of jobs that have been waiting too long in the job queue. The timebomb feature should start at mapper initialization instead, not in job init. Thoughts?...
http://issues.apache.org/jira/browse/NUTCH-1202
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1377] Add option to index via CloudSolrServer instead - Nutch - [issue]
...Nutch indexes to a specific Solr server. With SolrCloud on its way we can still use the current indexer and point to any server. However, the SolrCloudServer can connect to ZooKeeper instead...
http://issues.apache.org/jira/browse/NUTCH-1377
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1034] Create Solr Velocity templates - Nutch - [issue]
...Solr has Velocity integration and provides an easy method for creating HTML based front-ends for the search engine. This issue tracks the development of Velocity templates specifically for N...
http://issues.apache.org/jira/browse/NUTCH-1034
Author: Markus Jelsma, 2012-12-06, 14:53

[NUTCH-1103] Port protocol-sftp to 1.4 - Nutch - [issue]
...Port protocol-sftp from trunk back to 1.4...
http://issues.apache.org/jira/browse/NUTCH-1103
Author: Markus Jelsma, 2012-12-06, 14:53