Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...That would mean you need 200 rounds and also 200 segments for 400k documents. That's a work-around, not a solution! If you find the time, you should trace the process. Seems to be either a...
   Author: Sebastian Nagel, 2013-03-03, 20:56
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...Hi Kiran, there are many possible reasons for the problem. Besides the limit on the number of processes, there is the stack size in the Java VM and in the system (see java -Xss and ulimit -s)...
   Author: Sebastian Nagel, 2013-03-03, 20:41
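A minimal Java sketch for narrowing down the thread limit, assuming you can run a small test program on the same machine and under the same account as the crawl. The JVM option -Xss sets the default stack size of new Java threads. The four-argument Thread constructor used below requests a per-thread size instead; its Javadoc notes the value may be ignored on some platforms. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadStackDemo {
    /**
     * Tries to start n idle threads, each requesting stackSize bytes of stack,
     * and returns how many started before the JVM threw
     * "OutOfMemoryError: unable to create new native thread".
     * The stackSize argument is only a hint; some platforms ignore it.
     */
    public static int startIdleThreads(int n, long stackSize) {
        CountDownLatch release = new CountDownLatch(1);
        int started = 0;
        try {
            for (int i = 0; i < n; i++) {
                Runnable idle = () -> {
                    try {
                        release.await(); // keep the thread (and its native stack) alive
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                };
                Thread t = new Thread(null, idle, "idle-" + i, stackSize);
                t.setDaemon(true); // never block JVM exit
                t.start();
                started++;
            }
        } catch (OutOfMemoryError e) {
            // Native thread limit hit, e.g. process limit (ulimit -u) or memory for stacks
            System.err.println("stopped after " + started + " threads: " + e.getMessage());
        } finally {
            release.countDown(); // let all started threads finish
        }
        return started;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        long stack = 256L * 1024; // hypothetical 256 KiB request, compare with -Xss
        System.out.println(startIdleThreads(n, stack) + " of " + n + " threads started");
    }
}
```

Running it with a few different stack sizes, and checking ulimit -u and ulimit -s in the same shell, gives a rough idea of which limit is reached first.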
Re: Nutch 1.6 with Java - not loading correct configuration file - Nutch - [mail # user]
...Hi, – configuration files are found via Java’s classpath – only the first instance of each file found in one of the directories of the classpath is used – settings ...
   Author: Sebastian Nagel, 2013-02-21, 20:23
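To see which copy of a configuration file wins, a small Java sketch can list every match on the classpath in lookup order. It assumes the file is loaded as a named classpath resource, which is how Hadoop's Configuration (used by Nutch) resolves resources by name. The class name WhichConfig is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class WhichConfig {
    /** Every copy of the named resource visible to the loader, in lookup order. */
    public static List<URL> findAll(ClassLoader loader, String name) {
        try {
            return Collections.list(loader.getResources(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "nutch-site.xml";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<URL> hits = findAll(loader, name);
        if (hits.isEmpty()) {
            System.out.println(name + ": not found on the classpath");
            return;
        }
        // getResource(name) normally returns this first match; later copies are shadowed.
        System.out.println("used:     " + hits.get(0));
        for (URL u : hits.subList(1, hits.size())) {
            System.out.println("shadowed: " + u);
        }
    }
}
```

Run it with the same classpath as the Nutch job. Any copy listed as shadowed is never read.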
Re: Is there a bug in the crawl script coming with nutch 1.6 ? - Nutch - [mail # user]
...Hi Amit, hi Lewis, see NUTCH-1500 for details. You can take http://svn.apache.org/repos/asf/nutch/trunk/src/bin/crawl and replace (runtime/local/)bin/crawl of 1.6. It shoul...
   Author: Sebastian Nagel, 2013-02-19, 19:44
Re: mime type text/plain - Nutch - [mail # user]
...No, I didn't try to follow the redirects. If you follow them (nytimes sends you around through 10 or more redirects), the page is finally fetched and parsed successfully. Can you try t...
   Author: Sebastian Nagel, 2013-02-04, 21:30
Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David, the first steps are right, but maybe it's easier to run the Java classes via bin/nutch: bin/nutch freegen urls2/ freegen_segments/ # generated: freegen_segme...
   Author: Sebastian Nagel, 2013-02-04, 21:00
Re: mime type text/plain - Nutch - [mail # user]
...Hi, the given URL is a redirect (HTTP 303, at least when I try it) with no content (only the HTTP header). Tried with curl and Nutch's parsechecker tool: % bin/nutch parsechecker "...
   Author: Sebastian Nagel, 2013-02-02, 15:13
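To inspect a redirect chain like this outside Nutch, you can follow the hops by hand with java.net.HttpURLConnection and print each status code. This is a minimal sketch with a few limitations: the class name and the 20-hop cap are arbitrary choices, and it sends no cookies. If the cap is reached, the last URL in the chain has not been requested.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RedirectTrace {
    /** Resolves a (possibly relative) Location header against the URL that sent it. */
    public static URL resolve(String base, String location) {
        try {
            return new URL(new URL(base), location);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad URL: " + base + " -> " + location, e);
        }
    }

    /** Follows at most maxHops redirects by hand and returns every URL visited. */
    public static List<URL> trace(String start, int maxHops) throws IOException {
        List<URL> chain = new ArrayList<>();
        URL url = new URL(start);
        chain.add(url);
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // report each 3xx instead of following it
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            System.out.println(status + " " + url);
            conn.disconnect();
            if (status < 300 || status > 399 || location == null) {
                break; // final response, or a 3xx without a Location header
            }
            url = resolve(url.toString(), location);
            chain.add(url);
        }
        return chain;
    }

    public static void main(String[] args) throws IOException {
        List<URL> chain = trace(args[0], 20);
        System.out.println((chain.size() - 1) + " redirect(s), last URL: " + chain.get(chain.size() - 1));
    }
}
```

Usage: `java RedirectTrace <url>` prints one status line per hop, then the number of redirects.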
Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David, Yes. That's correct. Yes, provided that you know which documents have been changed, of course. Have a look at o.a.n.tools.FreeGenerator (Nutch 1.x). Start a segment for...
   Author: Sebastian Nagel, 2013-02-01, 23:57
Re: Outlinks in parse filter - Nutch - [mail # dev]
...Hi Markus, Yes, even better: FeedParser only contains URLNormalizers and URLFilters objects which get the references to plugin instances themselves via ObjectCache in the constructor. ...
   Author: Sebastian Nagel, 2013-02-01, 23:01
Re: Outlinks in parse filter - Nutch - [mail # dev]
...Hi Markus, this would mean that urlfilter and urlnormalizer plugins are accessed from parse plugins. At first glance, that sounds somewhat odd. But it's already the case for the feed p...
   Author: Sebastian Nagel, 2013-01-29, 21:14