Search results 141 to 150 of 302.
Re: Tika HTML parsing - Nutch - [mail # dev]
...On 2010-08-15 20:01, Ken Krugler wrote:   Cool.   Great, that was one example of invalid HTML from our parse-html tests.   Sounds great.   We have a set of torture tests ...
   Author: Andrzej Bialecki, 2010-08-15, 21:12
Re: Tika HTML parsing - Nutch - [mail # dev]
...On 2010-08-15 06:54, Ken Krugler wrote:  Thanks Ken for pushing forward this work! A few questions:  * does this include image maps as well ()?  * how does the code treat inva...
   Author: Andrzej Bialecki, 2010-08-15, 07:04
Benchmark: max fetcher speed - Nutch - [mail # dev]
...Hi,  Here's a status line from a Benchmark job that I ran recently:  0/0 threads spinwaiting 38996 pages, 1 errors, 557.1 pages/s, 16995  kb/s, 0 URLs in 2 queues > reduce ...
   Author: Andrzej Bialecki, 2010-08-13, 15:27
Re: TikaParser - Nutch - [mail # user]
...On 2010-08-13 09:15, reinhard schwab wrote:  This is a known issue. For now, use parse-html for HTML parsing...
   Author: Andrzej Bialecki, 2010-08-13, 07:57
[NUTCH-876] Remove remaining robots/IP blocking code in lib-http - Nutch - [issue]
...There are remains of the (very old) blocking code in lib-http/.../HttpBase.java. This code was used with the OldFetcher to manage politeness limits. New trunk doesn't have OldFetcher anymore...
http://issues.apache.org/jira/browse/NUTCH-876    Author: Andrzej Bialecki, 2010-08-11, 09:34
Hsqldb 2.0 conflicts with Hsqldb 1.8 in Hadoop - Nutch - [mail # dev]
...Hi,  I was trying to run Benchmark in trunk using MySQL, on a standalone  Hadoop cluster. My conf/gora.properties has this:  gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver go...
   Author: Andrzej Bialecki, 2010-08-10, 17:03
Re: [VOTE] Apache Nutch 1.2 Release Candidate #1 - Nutch - [mail # dev]
...On 2010-08-08 03:04, Mattmann, Chris A (388J) wrote:  +1 - all tests pass, a sample crawl works without problems, both in  local and in distributed mode.   Best regards, Andrz...
   Author: Andrzej Bialecki, 2010-08-09, 13:44
Re: crawldb - DatanodeRegistration - EOFException - Nutch - [mail # user]
...On 2010-08-06 22:58, Emmanuel de Castro Santana wrote:  Hadoop network usage patterns are sometimes taxing for the network  equipment - I've seen strange errors pop up in situation...
   Author: Andrzej Bialecki, 2010-08-07, 07:57
Re: Embed the Crawl API in my application - Nutch - [mail # user]
...On 2010-08-06 21:01, Roger Marin wrote:  This is currently difficult... The dependency on POSIX utilities can be  cut out from Hadoop but not easily - Java API doesn't give access ...
   Author: Andrzej Bialecki, 2010-08-07, 07:33
[NUTCH-867] Port Nutch benchmark to Nutchbase - Nutch - [issue]
...Bring tools from NUTCH-863 to Nutchbase, and measure the performance of the Nutchbase branch vs. trunk....
http://issues.apache.org/jira/browse/NUTCH-867    Author: Andrzej Bialecki, 2010-08-05, 13:53