Results 41 to 50 of 16591 (0.081s).
how to parse epub files using plugin parse-tika - Nutch - [mail # dev]
...Hi, my requirement is to extract the contents of epub files using Apache Nutch and Solr. In my nutch-site.xml file I have included the "epub" format in the plugin.includes property and in regex...
   Author: mahodaya, 2013-05-14, 05:14
Re: NUTCH1.2 ,the specific format of the dump text file? - Nutch - [mail # user]
...Thanks for your reply, Lewis. I think I didn't make my question easy to understand. In detail, I want to get the body text of the Japanese webpage, but there are so many kinds of code...
   Author: suzhaolong, 2013-05-14, 02:51
Re: NUTCH1.2 ,the specific format of the dump text file? - Nutch - [mail # user]
...there is a good chance that the dump is in a foreign language; however, this depends on which language you consider foreign and what language it actually is. AFAIK the encoding should be i...
   Author: Lewis John Mcgibbney, 2013-05-13, 23:24
Re: Unable to parse flv and epub file contents using nutch - Nutch - [mail # dev]
...No, you don't have to: the plugin parse-tika can parse .epub and .flv (see http://tika.apache.org/1.2/formats.html). Test it, e.g.: % bin/nutch parsechecker http://.../book.epub ...
   Author: Sebastian Nagel, 2013-05-13, 22:08
Re: Nutch to index filesystem meta data? - Nutch - [mail # user]
...That's possible but not out-of-the-box. The available plugin protocol-file does the opposite: it gets the file's raw content to be passed to a parser to extract plain-text content a...
   Author: Sebastian Nagel, 2013-05-13, 21:36
Re: Unable to parse flv and epub file contents using nutch - Nutch - [mail # dev]
...I think you are doing well so far. Nutch usually crawls the data and fetches the URLs of all the files, such as HTML, PDF, etc., in the specified directory in binary format. Now, in order to ge...
   Author: Pankaj Kumar, 2013-05-13, 21:17
Re: Solrindex -all not working correctly - Nutch - [mail # user]
...I'm using 2.x HEAD now and I'm still seeing the same problem. When I call solrindex -all it still indexes everything, not just the newly parsed items. On Wed, May 1, 2013 at 2:1...
   Author: Bai Shen, 2013-05-13, 18:06
Unable to parse flv and epub file contents using nutch - Nutch - [mail # dev]
...Hi, I am working with Apache Nutch and Solr. My requirement is to parse the contents of flv and epub files. I am using the command below to parse the files: bin/nutch crawl urls -solr...
   Author: vicky4751, 2013-05-13, 13:35
Re: HBase dependency removed from HEAD? - Nutch - [mail # user]
...NM, I grabbed trunk instead of 2.x. On Mon, May 13, 2013 at 7:25 AM, Bai Shen wrote: ...
   Author: Bai Shen, 2013-05-13, 11:36
HBase dependency removed from HEAD? - Nutch - [mail # user]
...I'm trying to set up Nutch using HEAD instead of 2.1. I went to change ivy.xml to uncomment the HBase dependency before calling ant and it's not there. Has this been removed?...
   Author: Bai Shen, 2013-05-13, 11:25