Results 1 to 10 of 16631 (0.138s).
Re: Explanation of RegexURLFIlterTestBase benchmark's - Nutch - [mail # user]
...Standard micro-benchmark issues with Java, run the 50 last and it'll run faster.  JVM warmup, and JIT compilation, yadda, yadda, yadda.   On Thu, May 23, 2013 at 1:57 PM, Lewis Joh...
   Author: Kirby Bohling, 2013-05-24, 00:06
Nutch 2.1: extension point ParseFilter: doc is null - Nutch - [mail # user]
...Dear nutchers,  I extended the ParseFilter extension point  public Parse filter(String url, WebPage page, Parse parse,     HTMLMetaTags metaTags, DocumentFragment doc) { ...
   Author: Martin Aesch, 2013-05-23, 21:28
[NOTICE] Nutch 2.X RC#1 Imminent - Nutch - [mail # dev]
...Hi All, A short notice to say that I will push the RC for 2.X once NUTCH-1575 is pushed. This will mean that there are no more issues remaining for 2.2. I pushed on all issues with patches t...
   Author: Lewis John Mcgibbney, 2013-05-23, 20:47
Re: Nutch 2.1 pdf parsing - Nutch - [mail # user]
...Hi Lewis,  thank you very much. I will try your solution.   2013/5/23 Lewis John Mcgibbney      Adriana Farina...
   Author: Adriana Farina, 2013-05-23, 20:31
Re: error crawling - Nutch - [mail # user]
...I do not think that script works in nutch-2.x. For example I see this $bin/nutch generate $commonOptions $CRAWL_ID/crawldb $CRAWL_ID/segments -topN $sizeFetchlist -numFetchers $numSlaves -no...
   Author: alxsss@..., 2013-05-23, 20:16
Explanation of RegexURLFIlterTestBase benchmark's - Nutch - [mail # user]
...Hi All, A really nice aspect of the regex (urlfilter-automaton and urlfilter-regex) plugin implementations in Nutch is that there is a small but very useful RegexURLFilterBaseTest [0] which ...
   Author: Lewis John Mcgibbney, 2013-05-23, 19:57
Re: Nutch 2.1 pdf parsing - Nutch - [mail # user]
...Hi Adriana, If I were you I would switch your logging to DEBUG for the ParserJob  - log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout + log4j.logger.org.apache.nutch.parse.Pa...
   Author: Lewis John Mcgibbney, 2013-05-23, 18:09
Nutch 2.1 pdf parsing - Nutch - [mail # user]
...Hi,  I'm using Nutch 2.1 in distributed mode on top of Hadoop 1.0.4, with HBase 0.90.4 as database.  I wrote a Java class from which I run the crawling cycle, the code that impleme...
   Author: Adriana Farina, 2013-05-23, 15:14
Re: Nutch 2.1 - Unauthorized - Nutch - [mail # user]
...Hi,  I think he is referring to this issue:  https://issues.apache.org/jira/browse/NUTCH-1575  BR, Tobias  Am 22.05.2013 um 18:14 schrieb Lewis John Mcgibbney:   Tob...
   Author: Tobias Marx, 2013-05-23, 09:56
OutOfMemoryError for bin/nutch elasticindex ocpnutch -all - Nutch - [mail # user]
...Dear List,  I have been following the instructions at http://wiki.apache.org/nutch/Nutch2Tutorial to see if I can get a nutch installation running with ElasticSearch. I have successfull...
   Author: Nicholas W, 2013-05-23, 08:47