Results 11 to 20 of 17022 (0.184s).
Re: Nutch not passing latest CrawlDatum to IndexingFilter plugin - Nutch - [mail # user]
...Hi Liaokz,  No, or only partially: - multiple CrawlDatums are merged: determine new status, fetch time, etc. It is not that the last datum is just written into CrawlDb. ...
   Author: Sebastian Nagel, 2013-06-18, 21:41
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
...Hi Tony,  On Tue, Jun 18, 2013 at 11:49 AM, Tony Mullins wrote:   I suspect that this should not be happening at all!    This does not make sense Tony. When would a call ...
   Author: Lewis John Mcgibbney, 2013-06-18, 20:51
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
...Lewis, I am also doing the same  but in my ParseFilter plugin. And instead of returning html of the current page it is returning me the url of all the pages in seed.txt  Could you ...
   Author: Tony Mullins, 2013-06-18, 18:49
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
...Please take a look at the WebTableReader [0] Tony at around lines 408 - 420. This works perfectly for dumps of my webdb in Cassandra and should work well for you. hth  [0] http://svn.ap...
   Author: Lewis John Mcgibbney, 2013-06-18, 18:18
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
...Guyz, Its a serious issue, could any one plz help me here that why its doing so ?  I have just crawled www.google.nl and www.bing.com  and in my log file ( I am logging the html in...
   Author: Tony Mullins, 2013-06-18, 17:50
Re: Nutch 2.1 / Hbase / Gora / Solr - Nutch - [mail # user]
...Unfortunately you need to downgrade your hbase distribution. We currently support 0.90.X   On Tuesday, June 18, 2013, dima  wrote: org.apache.hadoop.mapreduce.lib.input.FileInputFo...
   Author: Lewis John Mcgibbney, 2013-06-18, 16:08
Re: Nutch Site - Nutch - [mail # dev]
...Woot you da man Lewis  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 911...
   Author: Mattmann, Chris A, 2013-06-18, 15:17
Re: what is stored in the hbase after inject job - Nutch - [mail # user]
...Hi Tejas,  What about the scores? Where are they stored? I could not find.  On 06/13/2013 09:00 PM, Tejas Patil wrote:...
   Author: Ahmet Emre Aladağ, 2013-06-18, 14:52
Re: Nutch 2.1 / Hbase / Gora / Solr - Nutch - [mail # user]
...Hello, did anyone have this exception when running Nutch 2.2 with Gora and Hbase 0.94.8 Hbase is up and running(tested the shell) but when running nutch inject url I get this error: Injector...
   Author: dima, 2013-06-18, 14:34
Re: DBUpdateJob failed - Exception job failed: name=update-table, - Nutch - [mail # user]
...Lewis ,  I am getting the same error on some other url as well. So there is some issue with such urls which is causing exception in dbupdate job.  Any idea how what could be the re...
   Author: Tony Mullins, 2013-06-18, 11:32