Results 11 to 20 of 17022 (0.184s).
Re: Nutch not passing latest CrawlDatum to IndexingFilter plugin - Nutch - [mail # user]
|
|
...Hi Liaokz, No, or only partially: - multiple CrawlDatums are merged: determine new status, fetch time, etc. It is not that the last datum is just written into CrawlDb. ...
|
|
|
Author: Sebastian Nagel,
2013-06-18, 21:41
|
|
|
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
|
|
...Hi Tony, On Tue, Jun 18, 2013 at 11:49 AM, Tony Mullins wrote: I suspect that this should not be happening at all! This does not make sense Tony. When would a call ...
|
|
|
Author: Lewis John Mcgibbney,
2013-06-18, 20:51
|
|
|
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
|
|
...Lewis, I am also doing the same but in my ParseFilter plugin. And instead of returning html of the current page it is returning me the url of all the pages in seed.txt Could you ...
|
|
|
Author: Tony Mullins,
2013-06-18, 18:49
|
|
|
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
|
|
...Please take a look at the WebTableReader [0] Tony at around lines 408 - 420. This works perfectly for dumps of my webdb in Cassandra and should work well for you. hth [0] http://svn.ap...
|
|
|
Author: Lewis John Mcgibbney,
2013-06-18, 18:18
|
|
|
Re: Why webPage.getContent().array() is returning html of all pages in seed.txt ? - Nutch - [mail # user]
|
|
...Guyz, Its a serious issue, could any one plz help me here that why its doing so ? I have just crawled www.google.nl and www.bing.com and in my log file ( I am logging the html in...
|
|
|
Author: Tony Mullins,
2013-06-18, 17:50
|
|
|
Re: Nutch 2.1 / Hbase / Gora / Solr - Nutch - [mail # user]
|
|
...Unfortunately you need to downgrade your hbase distribution. We currently support 0.90.X On Tuesday, June 18, 2013, dima wrote: org.apache.hadoop.mapreduce.lib.input.FileInputFo...
|
|
|
Author: Lewis John Mcgibbney,
2013-06-18, 16:08
|
|
|
Re: Nutch Site - Nutch - [mail # dev]
|
|
...Woot you da man Lewis ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 911...
|
|
|
Author: Mattmann, Chris A,
2013-06-18, 15:17
|
|
|
Re: what is stored in the hbase after inject job - Nutch - [mail # user]
|
|
...Hi Tejas, What about the scores? Where are they stored? I could not find. On 06/13/2013 09:00 PM, Tejas Patil wrote:...
|
|
|
Author: Ahmet Emre Aladağ,
2013-06-18, 14:52
|
|
|
Re: Nutch 2.1 / Hbase / Gora / Solr - Nutch - [mail # user]
|
|
...Hello, did anyone have this exception when running Nutch 2.2 with Gora and Hbase 0.94.8 Hbase is up and running(tested the shell) but when running nutch inject url I get this error: Injector...
|
|
|
Author: dima,
2013-06-18, 14:34
|
|
|
Re: DBUpdateJob failed - Exception job failed: name=update-table, - Nutch - [mail # user]
|
|
...Lewis , I am getting the same error on some other url as well. So there is some issue with such urls which is causing exception in dbupdate job. Any idea how what could be the re...
|
|
|
Author: Tony Mullins,
2013-06-18, 11:32
|
|
|
|