Re: Keeping History/Archive with Nutch 2.x - Nutch - [mail # user]
...Hi James  You could have a custom map reduce job to copy the documents with a custom ID as you just described. Another option would be to use Nutch 2 + HBase and set a large value of ve...
   Author: Julien Nioche, 2012-10-09, 13:30
Re: language profile in Nutch 1.5 - Nutch - [mail # user]
...Hello Patricio  The language identification is delegated to Tika since 1.4 (https://issues.apache.org/jira/browse/NUTCH-1075) so you should create your own models with Tika instead. As...
   Author: Julien Nioche, 2012-10-08, 10:09
Re: [ANNOUNCE] Apache Nutch 2.1 Released - Nutch - [mail # dev]
...Thanks Lewis and well done everyone! Enjoy your weekend  Julien  On 5 October 2012 16:12, lewis john mcgibbney wrote: ...
   Author: Julien Nioche, 2012-10-05, 16:29
Re: Index HTML raw content - Nutch - [mail # user]
...Hi Matteo  It is not so much the Tika plugin; it is simply that the raw content is not indexed. This can be done however by writing a custom parser that will generate a new field for the raw ...
   Author: Julien Nioche, 2012-10-05, 10:05
Re: [PING] [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # user]
...Only the Apache distribution of Hadoop version 1.0.3 is officially supported by Nutch. Obviously if we can get it to work on other distributions then so much the better, but this can't be consid...
   Author: Julien Nioche, 2012-10-03, 13:37
Re: nutch-2.0 generate in deploy mode - Nutch - [mail # user]
...Guys,  There are certainly overheads in using the distributed mode (communication with servers etc...) and moving the job file around, unpacking it etc... but before we start taking abo...
   Author: Julien Nioche, 2012-10-02, 09:11
Re: priorised/scored fetching - Nutch - [mail # user]
...You should be able to do that with a custom scoring filter that assigns a score based on the MIME type.  On 2 October 2012 08:28, Markus Jelsma wrote: ...
   Author: Julien Nioche, 2012-10-02, 08:34
Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Would be good to get a thumbs-up from people who've tested crawls on other backends (Cassandra, HBase) before pushing the release.  I can't really give a +1 as I've just checked the most ...
   Author: Julien Nioche, 2012-10-01, 14:34
Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Ok, thanks. Was trying to get a minimalistic crawl of http://nutch.apache.org/ with MySQL but no success so far (the URL is not being fetched). Unfortunately won't have the time to investiga...
   Author: Julien Nioche, 2012-10-01, 13:18
Re: [VOTE] Apache Nutch 2.1 Release Candidate Available - Nutch - [mail # dev]
...Shouldn't the dependency for gora-sql point to v 0.2.1?  On 21 September 2012 16:07, Lewis John Mcgibbney wrote: ...
   Author: Julien Nioche, 2012-10-01, 12:36