Results 1 to 10 of 155 (0.12s).
Re: Unable to parse flv and epub file contents using nutch - Nutch - [mail # dev]
...No, you don't have to: the plugin parse-tika can parse .epub and .flv - see http://tika.apache.org/1.2/formats.html - test it, e.g.: % bin/nutch parsechecker http://.../book.epub ...
Author: Sebastian Nagel, 2013-05-13, 22:08

Re: Nutch to index filesystem meta data? - Nutch - [mail # user]
...That's possible but not out-of-the-box. The available plugin protocol-file does the opposite - get the file's raw content to be passed to a parser to extract plain-text content a...
Author: Sebastian Nagel, 2013-05-13, 21:36

Re: [DISCUSS] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...+1 Agreed, testing will definitely take its time. But we should port it soon after 2.2 is released: one of the biggest advantages of the pluggable indexing is that porting indexer back...
Author: Sebastian Nagel, 2013-05-09, 10:03

Re: version for apache nutch giraph integration and irc - Nutch - [mail # dev]
...Hi Mike. Trunk is for the 1.x releases (last release is 1.6), while branches/2.x is 2.1, etc. Webgraph hasn't been ported to 2.x yet, see NUTCH-875. I would say that depen...
Author: Sebastian Nagel, 2013-04-27, 18:57

Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Yes, you are right. The threads are still alive, see NUTCH-1182. And the fetcher job is not finished after fetcher threads have finished: fetched data has to be written to disk/hdfs/storage....
Author: Sebastian Nagel, 2013-04-24, 21:17

Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Hi, if fetcher.parse is the default (=false) the OOM is caused by fetcher itself (not while parsing). Because document content is buffered as byte[] (almost no memory overhead): - eith...
Author: Sebastian Nagel, 2013-04-23, 19:52

Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...There could be a couple of reasons why the timeout happens on the server but not on the local machine. Can you try to limit http.content.limit and try again? On 04/22/2013 09:17 ...
Author: Sebastian Nagel, 2013-04-22, 19:39

Re: java.lang.RuntimeException: Filter org.apache.nutch.urlfilter.prefix.PrefixURLFilter not found. - Nutch - [mail # dev]
...Does the property plugin.includes include "urlfilter-prefix"? Default is only "urlfilter-regex". On 04/22/2013 06:28 PM, naveen shukla wrote:...
Author: Sebastian Nagel, 2013-04-22, 19:27

Re: Nutch 2 hanging after aborting hung threads - Nutch - [mail # user]
...Hi, more information would be useful: - exact Nutch version (2.?) - how Nutch is called (e.g., via bin/crawl) - details of the configuration, esp. -depth -topN http....
Author: Sebastian Nagel, 2013-04-22, 18:58

Re: Next Release Cycle - Nutch - [mail # dev]
...Hi Lewis, +1 it's time: May for 2.2 and beginning of June for 1.7 to adhere to the 6-month release cycle. After sorting major/critical issues for 1.7 with patches available...
Author: Sebastian Nagel, 2013-04-15, 20:53