Results 21 to 30 of 156 (0.539s).
Re: DiskChecker$DiskErrorException - Nutch - [mail # user]
...Hi Alexei, in principle, in local mode you cannot run more than one Hadoop job concurrently unless you use disjoint hadoop.tmp.dir properties. There have been a few posts on this l...
   Author: Sebastian Nagel, 2013-03-04, 20:53
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...After all documents are fetched (and possibly parsed), the segment has to be written: finish sorting the data and copy it from the local temp dir (hadoop.tmp.dir) to the segment directory. If IO is a ...
   Author: Sebastian Nagel, 2013-03-04, 20:33
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...That would mean: you need 200 rounds and also 200 segments for 400k documents. That's a work-around, not a solution! If you find the time, you should trace the process. Seems to be either a...
   Author: Sebastian Nagel, 2013-03-03, 20:56
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...Hi Kiran, there are many possible reasons for the problem. Besides the limits on the number of processes the stack size in the Java VM and the system (see java -Xss and ulimit -s). ...
   Author: Sebastian Nagel, 2013-03-03, 20:41
Re: Nutch 1.6 with Java - not loading correct configuration file - Nutch - [mail # user]
...Hi, – configuration files are found via Java’s classpath – only the first instance of each file found in one of the directories of the classpath is used – settings ...
   Author: Sebastian Nagel, 2013-02-21, 20:23
Re: Is there a bug in the crawl script coming with nutch 1.6 ? - Nutch - [mail # user]
...Hi Amit, hi Lewis,  see NUTCH-1500 for details.  You can take  http://svn.apache.org/repos/asf/nutch/trunk/src/bin/crawl and replace (runtime/local/)bin/crawl of 1.6. It shoul...
   Author: Sebastian Nagel, 2013-02-19, 19:44
Re: mime type text/plain - Nutch - [mail # user]
...No, I didn't try to follow the redirects. If you follow them (nytimes is sending you around, 10 redirects or more), finally the page gets fetched and parsed successfully. Can you try t...
   Author: Sebastian Nagel, 2013-02-04, 21:30
Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David, the first steps are right but maybe it's easier to run the Java classes via bin/nutch: bin/nutch freegen urls2/ freegen_segments/ # generated: freegen_segme...
   Author: Sebastian Nagel, 2013-02-04, 21:00
Re: mime type text/plain - Nutch - [mail # user]
...Hi,  the given URL is a redirect (HTTP 303, at least, when I try) with no content (only the HTTP header). Tried with curl and Nutch's parsechecker tool:  % bin/nutch parsechecker "...
   Author: Sebastian Nagel, 2013-02-02, 15:13
Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David,  Yes. That's correct.  Yes, provided that you know which documents have been changed, of course. Have a look at o.a.n.tools.FreeGenerator (Nutch 1.x). Start a segment for...
   Author: Sebastian Nagel, 2013-02-01, 23:57