Results 141 to 150 of 302 (0.356s).
Re: Tika HTML parsing - Nutch - [mail # dev]
|
|
...On 2010-08-15 20:01, Ken Krugler wrote: Cool. Great, that was one example of invalid HTML from our parse-html tests. Sounds great. We have a set of torture tests ...
|
|
|
Author: Andrzej Bialecki,
2010-08-15, 21:12
|
|
|
Re: Tika HTML parsing - Nutch - [mail # dev]
|
|
...On 2010-08-15 06:54, Ken Krugler wrote: Thanks Ken for pushing forward this work! A few questions: * does this include image maps as well ()? * how does the code treat inva...
|
|
|
Author: Andrzej Bialecki,
2010-08-15, 07:04
|
|
|
Benchmark: max fetcher speed - Nutch - [mail # dev]
|
|
...Hi, Here's a status line from a Benchmark job that I ran recently: 0/0 threads spinwaiting 38996 pages, 1 errors, 557.1 pages/s, 16995 kb/s, 0 URLs in 2 queues > reduce ...
|
|
|
Author: Andrzej Bialecki,
2010-08-13, 15:27
|
|
|
Re: TikaParser - Nutch - [mail # user]
|
|
...On 2010-08-13 09:15, reinhard schwab wrote: This is a known issue. For now, use parse-html for HTML parsing. Best regards, Andrzej Bialecki ...
|
|
|
Author: Andrzej Bialecki,
2010-08-13, 07:57
|
|
|
[NUTCH-876] Remove remaining robots/IP blocking code in lib-http - Nutch - [issue]
|
|
...There are remains of the (very old) blocking code in lib-http/.../HttpBase.java. This code was used with the OldFetcher to manage politeness limits. New trunk doesn't have OldFetcher anymore...
|
|
|
http://issues.apache.org/jira/browse/NUTCH-876
Author: Andrzej Bialecki,
2010-08-11, 09:34
|
|
|
Hsqldb 2.0 conflicts with Hsqldb 1.8 in Hadoop - Nutch - [mail # dev]
|
|
...Hi, I was trying to run Benchmark in trunk using MySQL, on a standalone Hadoop cluster. My conf/gora.properties has this: gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver go...
|
|
|
Author: Andrzej Bialecki,
2010-08-10, 17:03
|
|
|
Re: [VOTE] Apache Nutch 1.2 Release Candidate #1 - Nutch - [mail # dev]
|
|
...On 2010-08-08 03:04, Mattmann, Chris A (388J) wrote: +1 - all tests pass, a sample crawl works without problems, both in local and in distributed mode. Best regards, Andrz...
|
|
|
Author: Andrzej Bialecki,
2010-08-09, 13:44
|
|
|
Re: crawldb - DatanodeRegistration - EOFException - Nutch - [mail # user]
|
|
...On 2010-08-06 22:58, Emmanuel de Castro Santana wrote: Hadoop network usage patterns are sometimes taxing for the network equipment - I've seen strange errors pop up in situation...
|
|
|
Author: Andrzej Bialecki,
2010-08-07, 07:57
|
|
|
Re: Embed the Crawl API in my application - Nutch - [mail # user]
|
|
...On 2010-08-06 21:01, Roger Marin wrote: This is currently difficult... The dependency on POSIX utilities can be cut out from Hadoop but not easily - Java API doesn't give access ...
|
|
|
Author: Andrzej Bialecki,
2010-08-07, 07:33
|
|
|
[NUTCH-867] Port Nutch benchmark to Nutchbase - Nutch - [issue]
|
|
...Bring tools from NUTCH-863 to Nutchbase, and measure the performance of the Nutchbase branch vs. trunk....
|
|
|
http://issues.apache.org/jira/browse/NUTCH-867
Author: Andrzej Bialecki,
2010-08-05, 13:53