[NUTCH-1327] QueryStringNormalizer - Nutch - [issue]
...A normalizer for dealing with query strings. Sorting query strings is helpful in preventing duplicates for some (bad) websites....
http://issues.apache.org/jira/browse/NUTCH-1327    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1337] WebGraph to follow redirects - Nutch - [issue]
...With the current WebGraph, URL shortening services `steal` inlinks from the actual target pages. The WebGraph OutlinkDB Mapper should use the target URL instead if there is any....
http://issues.apache.org/jira/browse/NUTCH-1337    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1414] Date extraction parse filter - Nutch - [issue]
...Date extraction parse filter for Nutch to provide means to extract an arbitrary page date (article date) from the parse text....
http://issues.apache.org/jira/browse/NUTCH-1414    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1335] OutlinkDB to collect unique URL's only - Nutch - [issue]
...The aggregating code in the Outlink reducer does not take care of incoming duplicates. When the input segments contain duplicates of a single URL they are collected....
http://issues.apache.org/jira/browse/NUTCH-1335    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1151] Index-anchor to add numInlinks count - Nutch - [issue]
...Issue to improve index-anchor by adding the number of inlinks per document. This count is useful for calculating some authority metric in the search server. T...
http://issues.apache.org/jira/browse/NUTCH-1151    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1194] CrawlDB lock should be released earlier - Nutch - [issue]
...Lock on the CrawlDB is released when everything is finished. But when generating many segments, the lock remains in place while it's not necessary anymore. If GENERATE_UPDATE_DB is false we...
http://issues.apache.org/jira/browse/NUTCH-1194    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1202] Fetcher timebomb kills long waiting fetch jobs - Nutch - [issue]
...The timebomb feature kills off mappers of jobs that have been waiting too long in the job queue. The timebomb feature should start at mapper initialization instead, not in job init. Thoughts?...
http://issues.apache.org/jira/browse/NUTCH-1202    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1377] Add option to index via CloudSolrServer instead - Nutch - [issue]
...Nutch indexes to a specific Solr server. With SolrCloud on its way we can still use the current indexer and point to any server. However, the CloudSolrServer can connect to ZooKeeper instead...
http://issues.apache.org/jira/browse/NUTCH-1377    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1034] Create Solr Velocity templates - Nutch - [issue]
...Solr has Velocity integration and provides an easy method for creating HTML based front-ends for the search engine. This issue tracks the development of Velocity templates specifically for N...
http://issues.apache.org/jira/browse/NUTCH-1034    Author: Markus Jelsma, 2012-12-06, 14:53
[NUTCH-1103] Port protocol-sftp to 1.4 - Nutch - [issue]
...Port protocol-sftp from trunk back to 1.4...
http://issues.apache.org/jira/browse/NUTCH-1103    Author: Markus Jelsma, 2012-12-06, 14:53