Results 1 to 10 of 155 (0.12s).
Re: Unable to parse flv and epub file contents using nutch - Nutch - [mail # dev]
...No, you don't have to: the plugin parse-tika can parse .epub and .flv - see http://tika.apache.org/1.2/formats.html - test it, eg:   % bin/nutch parsechecker http://.../book.epub  ...
   Author: Sebastian Nagel, 2013-05-13, 22:08
Re: Nutch to index filesystem meta data? - Nutch - [mail # user]
...That's possible but not out-of-the-box. The available plugin protocol-file does the opposite: it gets the file's raw content to be passed to a parser to extract plain-text content a...
   Author: Sebastian Nagel, 2013-05-13, 21:36
Re: [DISCUSS] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...+1  Agreed, testing will definitely take its time. But we should port it soon after 2.2 is released: one of the biggest advantages of the pluggable indexing is that porting indexer back...
   Author: Sebastian Nagel, 2013-05-09, 10:03
Re: version for apache nutch giraph integration and irc - Nutch - [mail # dev]
...Hi Mike   Trunk is for the 1.x releases (last release is 1.6), while branches/2.x is 2.1, etc.   Webgraph hasn't been ported to 2.x yet, see NUTCH-875.  I would say that depen...
   Author: Sebastian Nagel, 2013-04-27, 18:57
Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Yes, you are right. The threads are still alive, see NUTCH-1182. And the fetcher job is not finished after fetcher threads have finished: fetched data has to be written to disk/hdfs/storage....
   Author: Sebastian Nagel, 2013-04-24, 21:17
Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Hi,  if fetcher.parse is the default (=false) the OOM is caused by fetcher itself (not while parsing). Because document content is buffered as byte[] (almost no memory overhead): - eith...
   Author: Sebastian Nagel, 2013-04-23, 19:52
Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...There could be a couple of reasons why the timeout happens on the server but not on the local machine.  Can you try to limit http.content.limit and try again?  On 04/22/2013 09:17 ...
   Author: Sebastian Nagel, 2013-04-22, 19:39
Re: java.lang.RuntimeException: Filter org.apache.nutch.urlfilter.prefix.PrefixURLFilter not found. - Nutch - [mail # dev]
...Does the property plugin.includes include "urlfilter-prefix"? Default is only "urlfilter-regex".  On 04/22/2013 06:28 PM, naveen shukla wrote:...
   Author: Sebastian Nagel, 2013-04-22, 19:27
Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Hi,  more information would be useful: - exact Nutch version (2.?) - how Nutch is called (eg, via bin/crawl) - details of the configuration, esp.   -depth   -topN   http....
   Author: Sebastian Nagel, 2013-04-22, 18:58
Re: Next Release Cycle - Nutch - [mail # dev]
...Hi Lewis,  +1  it's time: May for 2.2 and beginning of June for 1.7 to adhere to the 6-month release cycle.  After sorting major/critical issues for 1.7 with patches available...
   Author: Sebastian Nagel, 2013-04-15, 20:53