Results 231 to 240 of 817.
Re: Common Crawl dataset - Nutch - [mail # user]
...I think Common Crawl which uses a slightly different definition of ARCs, not sure though. Anyway they have released a library to read/write to their format https://github.com/commoncrawl/com...
   Author: Julien Nioche, 2012-05-23, 08:23
Re: Apache Nutch release 1.5 RC2 - Nutch - [mail # dev]
...Read http://people.apache.org/~lewismc/nutch-1.5-rc2/ :-)  On 22 May 2012 20:59, Lewis John Mcgibbney wrote:  Open Source Solutions for Text Engineering  http://dig...
   Author: Julien Nioche, 2012-05-22, 20:18
Re: 1.5 RC2 - Nutch - [mail # dev]
...Brilliant! Thanks Lewis  On 22 May 2012 20:30, Lewis John Mcgibbney wrote: ...
   Author: Julien Nioche, 2012-05-22, 20:16
Re: 1.5 RC2 - Nutch - [mail # dev]
...we'd need to duplicate the tar and zip tasks so that they operate on what package-bin produces, and rename the output of the standard package to nutch-X-src. The modification I made to build.xml doe...
   Author: Julien Nioche, 2012-05-22, 19:17
Re: Get Parent of URLs fetched by nutch - Nutch - [mail # user]
...Implement your own scoring filter and add the URL of the source to the targets' metadata. See https://issues.apache.org/jira/browse/NUTCH-1331 for something (vaguely) related.  On 22 May...
   Author: Julien Nioche, 2012-05-22, 11:03
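A minimal, self-contained sketch of the idea in the reply above: when a page's outlinks are handed on, stamp each target with the URL it was found on. In Nutch 1.x the natural hook appears to be `ScoringFilter.distributeScoreToOutlinks`, which receives the source URL together with the outlink targets; the code below does not use Nutch classes at all. `Target` and the metadata key `_parent_url_` are hypothetical stand-ins for a crawl entry and its metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: records the source (parent) URL in each outlink target's metadata.
 * Target and PARENT_KEY are hypothetical stand-ins, not Nutch classes or keys.
 */
public class ParentUrlSketch {

    /** Hypothetical metadata key under which the parent URL is stored. */
    static final String PARENT_KEY = "_parent_url_";

    /** Minimal stand-in for a crawl entry: a URL plus a string metadata map. */
    static final class Target {
        final String url;
        final Map<String, String> metadata = new HashMap<>();

        Target(String url) {
            this.url = url;
        }
    }

    /** Tags every outlink target with the URL of the page it was discovered on. */
    static void tagWithParent(String fromUrl, List<Target> targets) {
        for (Target t : targets) {
            t.metadata.put(PARENT_KEY, fromUrl);
        }
    }

    public static void main(String[] args) {
        List<Target> outlinks = new ArrayList<>();
        outlinks.add(new Target("http://example.com/a"));
        outlinks.add(new Target("http://example.com/b"));
        tagWithParent("http://example.com/", outlinks);
        for (Target t : outlinks) {
            // prints "http://example.com/a <- http://example.com/", then the same for /b
            System.out.println(t.url + " <- " + t.metadata.get(PARENT_KEY));
        }
    }
}
```

A URL linked from several pages would need a policy for which parent to keep when entries are merged; the sketch does not address that.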
Re: svn commit: r1341365 - /nutch/trunk/ivy/mvn.template - Nutch - [mail # dev]
...cut and paste :-) Ferdy wasn't there at all etc... Fixed! Thanks  On 22 May 2012 10:33, Lewis John Mcgibbney wrote: ...
   Author: Julien Nioche, 2012-05-22, 09:38
1.5 RC2 - Nutch - [mail # dev]
...Hi Lewis,  I am sure that Chris will have no problem with you doing the RC2. Chris? It would be a good thing to have more than one person who knows how to do it anyway :-) Note that to ...
   Author: Julien Nioche, 2012-05-22, 09:15
Re: Bug in Trunk Generator mapper? - Nutch - [mail # dev]
...Hi Lewis  [Moved to dev@]  We could normalise before filtering in the mapper indeed. Whether this is accidental or on purpose is not clear. Please open a JIRA for this. On a differ...
   Author: Julien Nioche, 2012-05-21, 19:32
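Why the order of normalising and filtering matters can be shown with a small sketch. It assumes a hypothetical normaliser (lower-cases scheme and authority, drops the fragment) and a hypothetical single-regex filter that only accepts `http://example.com/`; these stand in for Nutch's configurable URLNormalizers and URLFilters and are not their API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.regex.Pattern;

/** Sketch only: hypothetical normaliser and filter, not Nutch's URLNormalizers/URLFilters. */
public class NormalizeThenFilterSketch {

    /** Hypothetical filter rule: accept only URLs under http://example.com/ (case-sensitive). */
    static final Pattern ALLOWED = Pattern.compile("^http://example\\.com/");

    /** Hypothetical normaliser: lower-cases scheme and authority, drops the fragment. */
    static String normalize(String url) {
        try {
            URI u = new URI(url);
            String scheme = u.getScheme() == null ? null : u.getScheme().toLowerCase(Locale.ROOT);
            String authority = u.getAuthority() == null ? null : u.getAuthority().toLowerCase(Locale.ROOT);
            return new URI(scheme, authority, u.getPath(), u.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are treated as rejected
        }
    }

    static boolean accept(String url) {
        return url != null && ALLOWED.matcher(url).find();
    }

    /** Order suggested in the thread: normalise first, then filter. */
    static String normalizeThenFilter(String url) {
        String n = normalize(url);
        return accept(n) ? n : null;
    }

    /** Opposite order, for comparison: filter the raw URL, then normalise. */
    static String filterThenNormalize(String url) {
        return accept(url) ? normalize(url) : null;
    }

    public static void main(String[] args) {
        String raw = "HTTP://Example.COM/page#top";
        System.out.println(filterThenNormalize(raw)); // null: the raw form fails the filter
        System.out.println(normalizeThenFilter(raw)); // http://example.com/page
    }
}
```

With the filter applied to the raw URL, a URL that would have been acceptable after normalisation is dropped; normalising first lets the filter see the canonical form.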
Re: [VOTE] Apache Nutch 1.5 release rc #1 - Nutch - [mail # user]
...can't remember the name of the task right now but should be easy to find out by looking at the build.xml. You'll need to make sure that the maven tasks jars are in the lib dir. Don't think ...
   Author: Julien Nioche, 2012-05-19, 18:33
Re: Tika parser exception IndexOutOfBoundsException - Nutch - [mail # user]
...Try setting http.content.limit to a very large value or -1. The parser sometimes chokes on truncated content.  On 15 May 2012 15:17, LEVILLAIN Olivier wrote: ...
   Author: Julien Nioche, 2012-05-15, 14:43
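The advice above rests on how `http.content.limit` behaves: a non-negative value truncates fetched content to that many bytes, and a negative value such as -1 disables truncation, so the parser receives the whole document. In Nutch the property would normally be overridden in conf/nutch-site.xml. The sketch below only models that rule in plain Java to show when a parser would be handed partial content; `applyContentLimit` and `isTruncated` are hypothetical helpers, not Nutch code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch only: models the http.content.limit truncation rule; not Nutch's fetcher code. */
public class ContentLimitSketch {

    /** Returns the bytes kept under the given limit (negative limit = keep everything). */
    static byte[] applyContentLimit(byte[] content, int limit) {
        if (limit < 0 || content.length <= limit) {
            return content; // limit disabled, or content already fits
        }
        return Arrays.copyOf(content, limit); // keep only the first `limit` bytes
    }

    /** True when the limit would cut the document short, i.e. the parser sees partial content. */
    static boolean isTruncated(byte[] content, int limit) {
        return limit >= 0 && content.length > limit;
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>...</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(isTruncated(page, 10));              // true
        System.out.println(isTruncated(page, -1));              // false
        System.out.println(applyContentLimit(page, 10).length); // 10
    }
}
```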