Results 1 to 10 of 17 (0.363s).
[NUTCH-1329] parser not extract outlinks to external web sites - Nutch - [issue]
...found a bug in /src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java: outlinks such as www.example2.com found on www.example1.com are inserted as www.example1.com/ww...
http://issues.apache.org/jira/browse/NUTCH-1329    Author: behnam nikbakht, 2013-01-20, 11:10
[NUTCH-1309] fetch queue management - Nutch - [issue]
...when fetch runs in Hadoop with multiple concurrent mappers, there are multiple independent fetchQueues, which makes them hard to manage. I suggest constructing the fetchQueues before the start of the run w...
http://issues.apache.org/jira/browse/NUTCH-1309    Author: behnam nikbakht, 2013-01-12, 19:19
[NUTCH-1375] extract main content of a html file - Nutch - [issue]
...I wrote code that can extract the main content of an HTML file (usually weblogs). This content usually appears in a <body><p> tag, but there is no guarantee. There might also be multiple tags with ...
http://issues.apache.org/jira/browse/NUTCH-1375    Author: behnam nikbakht, 2013-01-12, 19:16
[NUTCH-1278] Fetch Improvement in threads per host - Nutch - [issue]
...the value of maxThreads is equal to fetcher.threads.per.host and is constant for every host. It would be possible to use dynamic values for each host, influenced by the number of bl...
http://issues.apache.org/jira/browse/NUTCH-1278    Author: behnam nikbakht, 2013-01-12, 19:03
[NUTCH-1281] tika parser not work properly with unwanted file types that passed from filters in nutch - Nutch - [issue]
...when this property is set in parse-plugins.xml: <mimeType name="*"> <plugin id="parse-tika" /> </mimeType> all unwanted files that pass through all ...
http://issues.apache.org/jira/browse/NUTCH-1281    Author: behnam nikbakht, 2013-01-12, 18:59
[NUTCH-1270] some of Deflate encoded pages not fetched - Nutch - [issue]
...there is a problem with some web pages that are fetched but whose content cannot be retrieved. After this change, the error is fixed. We change lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBa...
http://issues.apache.org/jira/browse/NUTCH-1270    Author: behnam nikbakht, 2013-01-12, 18:57
[NUTCH-1269] Generate main problems - Nutch - [issue]
...there are some problems with the current Generate method, with the maxNumSegments and maxHostCount options: 1. First, the sizes of the generated segments differ. 2. With the maxHostCount option, it is unclea...
http://issues.apache.org/jira/browse/NUTCH-1269    Author: behnam nikbakht, 2013-01-12, 18:57
[NUTCH-1282] linkdb scalability - Nutch - [issue]
...as described in NUTCH-1054, the linkdb is optional in solrindex; it is used only for anchors and has no impact on scoring. It seems that the size of the linkdb in an incremental crawl grows very fast and...
http://issues.apache.org/jira/browse/NUTCH-1282    Author: behnam nikbakht, 2013-01-12, 18:56
[NUTCH-1303] Fetcher to skip queues for URLS getting repeated exceptions, based on percentage - Nutch - [issue]
...as described in https://issues.apache.org/jira/browse/NUTCH-769, skipping queues with a high exception value is a good solution, but it is not easy to set the value of fetcher.max.exceptions.per...
http://issues.apache.org/jira/browse/NUTCH-1303    Author: behnam nikbakht, 2013-01-12, 18:56
[NUTCH-1297] it is better for fetchItemQueues to select items from greater queues first - Nutch - [issue]
...when a fetch covers multiple hosts of different sizes, large hosts face a long delay until getFetchItem() in the FetchItemQueues class selects a URL f...
http://issues.apache.org/jira/browse/NUTCH-1297    Author: behnam nikbakht, 2013-01-12, 18:54