Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...That would mean: you need 200 rounds and also 200 segments for 400k documents. That's a workaround, not a solution! If you find the time you should trace the process. Seems to be either a...
Author: Sebastian Nagel, 2013-03-03, 20:56

Re: Nutch 1.6 : java.lang.OutOfMemoryError: unable to create new native thread - Nutch - [mail # user]
...Hi Kiran, there are many possible reasons for the problem. Besides the limits on the number of processes, there is the stack size in the Java VM and the system (see java -Xss and ulimit -s). ...
Author: Sebastian Nagel, 2013-03-03, 20:41

Re: Nutch 1.6 with Java - not loading correct configuration file - Nutch - [mail # user]
...Hi,
- configuration files are found via Java’s classpath
- only the first instance of each file found in one of the directories of the classpath is used
- settings ...
Author: Sebastian Nagel, 2013-02-21, 20:23

Re: Is there a bug in the crawl script coming with nutch 1.6 ? - Nutch - [mail # user]
...Hi Amit, hi Lewis, see NUTCH-1500 for details. You can take http://svn.apache.org/repos/asf/nutch/trunk/src/bin/crawl and replace (runtime/local/)bin/crawl of 1.6. It shoul...
Author: Sebastian Nagel, 2013-02-19, 19:44

Re: mime type text/plain - Nutch - [mail # user]
...No, I didn't try to follow the redirects. If you follow them (nytimes is sending you around, 10 redirects or more), finally the page gets fetched and parsed successfully. Can you try t...
Author: Sebastian Nagel, 2013-02-04, 21:30

Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David, the first steps are right but maybe it's easier to run the Java classes via bin/nutch: bin/nutch freegen urls2/ freegen_segments/ # generated: freegen_segme...
Author: Sebastian Nagel, 2013-02-04, 21:00

Re: mime type text/plain - Nutch - [mail # user]
...Hi, the given URL is a redirect (HTTP 303, at least when I try) with no content (only the HTTP header). Tried with curl and Nutch's parsechecker tool: % bin/nutch parsechecker "...
Author: Sebastian Nagel, 2013-02-02, 15:13

Re: Nutch Incremental Crawl - Nutch - [mail # user]
...Hi David, Yes. That's correct. Yes, provided that you know which documents have been changed, of course. Have a look at o.a.n.tools.FreeGenerator (Nutch 1.x). Start a segment for...
Author: Sebastian Nagel, 2013-02-01, 23:57

Re: Outlinks in parse filter - Nutch - [mail # dev]
...Hi Markus, Yes, even better: FeedParser only contains URLNormalizers and URLFilters objects which get the references to plugin instances themselves via ObjectCache in the constructor. ...
Author: Sebastian Nagel, 2013-02-01, 23:01

Re: Outlinks in parse filter - Nutch - [mail # dev]
...Hi Markus, this would mean that urlfilter and urlnormalizer plugins are accessed from parse plugins. At first glance, that sounds somewhat odd. But it's already the case for the feed p...
Author: Sebastian Nagel, 2013-01-29, 21:14