Results 21 to 30 of 156 (0.539 s).
Re: DiskChecker$DiskErrorException - Nutch - [mail # user]

...Hi Alexei, in principle, in local mode you cannot run more than one Hadoop job concurrently unless the jobs use disjoint hadoop.tmp.dir properties. There have been a few posts on this l...

Author: Sebastian Nagel, 2013-03-04, 20:53
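The requirement for disjoint hadoop.tmp.dir values can be illustrated with a small Python sketch that gives every concurrent local-mode job its own fresh directory and checks that none of them is nested inside another. The function and job names are hypothetical, and the sketch does not cover how the value reaches each job (for example, a separate configuration directory per job).

```python
import tempfile
from pathlib import Path


def allocate_tmp_dirs(job_names, base=None):
    """Create a fresh, separate directory per job to use as its hadoop.tmp.dir."""
    # mkdtemp returns a unique root, so two runs of this script never collide either.
    root = Path(tempfile.mkdtemp(prefix="nutch-jobs-", dir=base))
    dirs = {}
    for name in job_names:
        job_dir = root / name
        job_dir.mkdir()
        dirs[name] = job_dir
    return dirs


def are_disjoint(paths):
    """True if no directory is equal to, or nested inside, another one."""
    resolved = [Path(p).resolve() for p in paths]
    for i, a in enumerate(resolved):
        for j, b in enumerate(resolved):
            if i != j and (a == b or a in b.parents):
                return False
    return True


if __name__ == "__main__":
    dirs = allocate_tmp_dirs(["crawl-a", "crawl-b"])
    for job, path in dirs.items():
        print(f"{job}: hadoop.tmp.dir={path}")
    print("disjoint:", are_disjoint(dirs.values()))
```

Sharing a parent root is fine; what must not happen is two jobs writing into the same directory tree.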
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]

...After all documents are fetched (and possibly parsed), the segment has to be written: finish sorting the data and copy it from the local temp dir (hadoop.tmp.dir) to the segment directory. If IO is a ...

Author: Sebastian Nagel, 2013-03-04, 20:33
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]

...That would mean you need 200 rounds, and also 200 segments, for 400k documents. That's a work-around, not a solution! If you find the time, you should trace the process. Seems to be either a...

Author: Sebastian Nagel, 2013-03-03, 20:56
Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]

...Hi Kiran, there are many possible reasons for the problem. Besides the limits on the number of processes, check the stack size in the Java VM and in the system (see java -Xss and ulimit -s)...

Author: Sebastian Nagel, 2013-03-03, 20:41
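To see the operating-system limits mentioned above from a script, a small Python sketch using the standard resource module (Unix only) can print the values behind ulimit -s and, in bash, ulimit -u. The function names are hypothetical, the Java -Xss value is not read from the JVM, and the thread-stack estimate is only a rough figure for reserved address space, not a JVM measurement.

```python
import resource


def describe_limit(value):
    """Render a limit like `ulimit` does: RLIM_INFINITY becomes 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_stack_bytes(threads, xss_bytes):
    """Rough stack reservation for `threads` JVM threads started with -Xss set to xss_bytes."""
    return threads * xss_bytes


def report_limits():
    """Print soft/hard limits for the stack size and the process count."""
    # RLIMIT_STACK is the limit behind `ulimit -s`; getrlimit reports bytes,
    # while the shell shows KiB.
    limits = {"stack size in bytes (ulimit -s)": resource.RLIMIT_STACK}
    # RLIMIT_NPROC (`ulimit -u` in bash) is not available on every platform.
    # On Linux it counts threads as well as processes.
    if hasattr(resource, "RLIMIT_NPROC"):
        limits["max user processes (ulimit -u)"] = resource.RLIMIT_NPROC
    for label, which in limits.items():
        soft, hard = resource.getrlimit(which)
        print(f"{label}: soft={describe_limit(soft)} hard={describe_limit(hard)}")


if __name__ == "__main__":
    report_limits()
    # Hypothetical example: 1000 threads at -Xss1m, roughly 1 GB of stack reservation.
    print("stack bytes for 1000 threads:", thread_stack_bytes(1000, 1 << 20))
```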
Re: Nutch 1.6 with Java - not loading correct configuration file - Nutch - [mail # user]

...Hi, configuration files are found via Java's classpath; only the first instance of each file found in one of the directories of the classpath is used; settings ...

Author: Sebastian Nagel, 2013-02-21, 20:23
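The lookup rule above (the first match on the classpath wins) can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: only plain directories are searched, in classpath order, whereas a real Java class loader also searches JAR files and delegates to its parent loader first. The helper names and the conf-a/conf-b directories are hypothetical; nutch-site.xml is used only as an example file name.

```python
import tempfile
from pathlib import Path


def resolve_config(name, classpath_dirs):
    """Return the first copy of `name` found along the classpath directories, or None."""
    for entry in classpath_dirs:
        candidate = Path(entry) / name
        if candidate.is_file():
            return candidate
    return None


def shadowed_copies(name, classpath_dirs):
    """List copies of `name` that exist on the classpath but are never read."""
    found = [Path(d) / name for d in classpath_dirs if (Path(d) / name).is_file()]
    return found[1:]


def demo():
    """Build two hypothetical conf dirs that both contain nutch-site.xml."""
    root = Path(tempfile.mkdtemp())
    first, second = root / "conf-a", root / "conf-b"
    for d in (first, second):
        d.mkdir()
        (d / "nutch-site.xml").write_text(f"<configuration><!-- {d.name} --></configuration>\n")
    classpath = [first, second]
    winner = resolve_config("nutch-site.xml", classpath)
    return winner, shadowed_copies("nutch-site.xml", classpath), first, second


if __name__ == "__main__":
    winner, shadowed, _, _ = demo()
    print("used:", winner)
    print("ignored:", shadowed)
```

Listing the shadowed copies is often the quickest way to find out why an edited file seems to have no effect: the edit went into a copy that comes later on the classpath.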
Re: Is there a bug in the crawl script coming with nutch 1.6 ? - Nutch - [mail # user]

...Hi Amit, hi Lewis, see NUTCH-1500 for details. You can take http://svn.apache.org/repos/asf/nutch/trunk/src/bin/crawl and replace (runtime/local/)bin/crawl of 1.6. It shoul...

Author: Sebastian Nagel, 2013-02-19, 19:44
Re: mime type text/plain - Nutch - [mail # user]

...No, I didn't try to follow the redirects. If you follow them (nytimes is sending you around, 10 redirects or more), finally the page gets fetched and parsed successfully. Can you try t...

Author: Sebastian Nagel, 2013-02-04, 21:30
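Following a long redirect chain like the one described above needs a hop limit, so that a misbehaving site cannot send a crawler around forever. A minimal Python sketch over a table of canned responses (the URLs are made up and no network access happens); the function name and the default limit of 10 are illustrative assumptions. Nutch's fetcher has its own setting for this, the http.redirect.max property.

```python
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(start, responses, max_redirects=10):
    """Follow a redirect chain through a table of canned responses.

    `responses` maps URL -> (status, location); location is None for final pages.
    Returns (final_url, hops). Raises RuntimeError if the limit is exceeded
    or a redirect loop is detected.
    """
    url, hops, seen = start, 0, {start}
    while True:
        status, location = responses[url]
        if status not in REDIRECT_STATUSES:
            return url, hops
        if hops == max_redirects:
            raise RuntimeError(f"more than {max_redirects} redirects from {start}")
        url = location
        hops += 1
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)


if __name__ == "__main__":
    table = {
        "http://a.example/": (303, "http://b.example/"),
        "http://b.example/": (302, "http://c.example/"),
        "http://c.example/": (200, None),
    }
    print(follow_redirects("http://a.example/", table))
```

A 303 response with no body, as in the parsechecker test mentioned in the other message, is exactly the case where the first hop alone yields nothing to parse.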
Re: Nutch Incremental Crawl - Nutch - [mail # user]

...Hi David, the first steps are right but maybe it's easier to run the Java classes via bin/nutch: bin/nutch freegen urls2/ freegen_segments/ # generated: freegen_segme...

Author: Sebastian Nagel, 2013-02-04, 21:00
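FreeGenerator builds a segment from plain-text files containing one URL per line, so the changed documents first have to be written into the input directory (urls2/ in the command above). A minimal Python sketch, assuming you already have the list of changed URLs; the function name and the file name changed-urls.txt are hypothetical.

```python
import tempfile
from pathlib import Path


def write_freegen_input(urls, input_dir, filename="changed-urls.txt"):
    """Write URLs, one per line, stripped and de-duplicated in order, into input_dir/filename."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    out_dir = Path(input_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / filename
    out_file.write_text("".join(u + "\n" for u in lines), encoding="utf-8")
    return out_file


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "urls2"
    changed = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
    print(write_freegen_input(changed, target).read_text(encoding="utf-8"))
```

The resulting directory is then passed to bin/nutch freegen urls2/ freegen_segments/ as in the message; the steps after that are cut off in the excerpt and are not reproduced here.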
Re: mime type text/plain - Nutch - [mail # user]

...Hi, the given URL is a redirect (HTTP 303, at least, when I try) with no content (only the HTTP header). Tried with curl and Nutch's parsechecker tool: % bin/nutch parsechecker "...

Author: Sebastian Nagel, 2013-02-02, 15:13
Re: Nutch Incremental Crawl - Nutch - [mail # user]

...Hi David, Yes. That's correct. Yes, provided that you know which documents have been changed, of course. Have a look at o.a.n.tools.FreeGenerator (Nutch 1.x). Start a segment for...

Author: Sebastian Nagel, 2013-02-01, 23:57