[NUTCH-1329] parser not extract outlinks to external web sites - Nutch - [issue]
...found a bug in /src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java: outlinks like www.example2.com from www.example1.com are inserted as www.example1.com/ww...
http://issues.apache.org/jira/browse/NUTCH-1329
Author: behnam nikbakht, 2013-01-20, 11:10

[NUTCH-1309] fetch queue management - Nutch - [issue]
...when running fetch in Hadoop with multiple concurrent mappers, there are multiple independent fetchQueues, which makes them hard to manage. I suggest constructing the fetchQueues before the beginning of the run w...
http://issues.apache.org/jira/browse/NUTCH-1309
Author: behnam nikbakht, 2013-01-12, 19:19

[NUTCH-1375] extract main content of a html file - Nutch - [issue]
...I wrote code that can extract the main content of an HTML page (usually weblogs). This content usually appears in a <body><p> tag, but there is no guarantee. There might also be multiple tags with ...
http://issues.apache.org/jira/browse/NUTCH-1375
Author: behnam nikbakht, 2013-01-12, 19:16

[NUTCH-1278] Fetch Improvement in threads per host - Nutch - [issue]
...the value of maxThreads is equal to fetcher.threads.per.host and is constant for every host. It is possible to use dynamic values for each host, influenced by the number of bl...
http://issues.apache.org/jira/browse/NUTCH-1278
Author: behnam nikbakht, 2013-01-12, 19:03

[NUTCH-1281] tika parser not work properly with unwanted file types that passed from filters in nutch - Nutch - [issue]
...when this property is set in parse-plugins.xml: <mimeType name="*"> <plugin id="parse-tika" /></mimeType> all unwanted files that pass from all ...
http://issues.apache.org/jira/browse/NUTCH-1281
Author: behnam nikbakht, 2013-01-12, 18:59

[NUTCH-1270] some of Deflate encoded pages not fetched - Nutch - [issue]
...there is a problem with some web pages that are fetched but whose content cannot be retrieved. After this change, the error is fixed. We change lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBa...
http://issues.apache.org/jira/browse/NUTCH-1270
Author: behnam nikbakht, 2013-01-12, 18:57

[NUTCH-1269] Generate main problems - Nutch - [issue]
...there are some problems with the current Generate method, with the maxNumSegments and maxHostCount options: 1. First, the sizes of the generated segments are different. 2. With the maxHostCount option, it is unclea...
http://issues.apache.org/jira/browse/NUTCH-1269
Author: behnam nikbakht, 2013-01-12, 18:57

[NUTCH-1282] linkdb scalability - Nutch - [issue]
...as described in NUTCH-1054, the linkdb is optional in solrindex; it is used only for anchors and has no impact on scoring. It seems that the size of the linkdb in an incremental crawl grows very fast and...
http://issues.apache.org/jira/browse/NUTCH-1282
Author: behnam nikbakht, 2013-01-12, 18:56

[NUTCH-1303] Fetcher to skip queues for URLS getting repeated exceptions, based on percentage - Nutch - [issue]
...as described in https://issues.apache.org/jira/browse/NUTCH-769, skipping queues with a high exception value is a good solution, but it is not easy to set the value of fetcher.max.exceptions.per...
http://issues.apache.org/jira/browse/NUTCH-1303
Author: behnam nikbakht, 2013-01-12, 18:56

[NUTCH-1297] it is better for fetchItemQueues to select items from greater queues first - Nutch - [issue]
...there is a situation where, if we have multiple hosts in a fetch and the hosts differ in size, large hosts have a long delay until getFetchItem() in the FetchItemQueues class selects a URL f...
http://issues.apache.org/jira/browse/NUTCH-1297
Author: behnam nikbakht, 2013-01-12, 18:54