Re: HTMLParseFilter equivalent in Nutch 2.2 ??? - Nutch - [mail # user]
...They are called ParseFilters in 2.x: http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html as they are not limited to processing HTML documents since Tika generates SA...
Author: Julien Nioche, 2013-06-12, 12:46

Re: Data Extraction from 100+ different sites... - Nutch - [mail # user]
...What I usually do in cases like these is to propagate an identifier from the seeds and use that in the HTMLParsers to determine whether they should process a page. See the url-meta plugin for th...
Author: Julien Nioche, 2013-06-12, 08:01

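The Data Extraction reply above suggests tagging each seed with an identifier and letting the parsers decide, from that identifier, whether a page is theirs. As far as I know, Nutch seed lines can carry tab-separated key=value metadata after the URL, and the url-meta plugin propagates the keys listed in its urlmeta.tags property to outlinks. The standalone sketch below only illustrates the idea under those assumptions: it parses such a seed line and applies a per-site check. SeedMetadataSketch, parseSeedLine, shouldProcess and the "site" key are hypothetical names, not Nutch APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch code: parse a seed line of the form
//   url<TAB>key=value<TAB>key=value ...
// and use one metadata key to decide whether a site-specific parser should run.
public class SeedMetadataSketch {

    /** A seed URL plus its key=value metadata, in the order it appeared. */
    record Seed(String url, Map<String, String> metadata) {}

    /** Splits on tabs: the first field is the URL, the remaining fields are key=value pairs. */
    static Seed parseSeedLine(String line) {
        String[] fields = line.trim().split("\t");
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq > 0) { // ignore fields with no '=' or an empty key
                meta.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
            }
        }
        return new Seed(fields[0], meta);
    }

    /** A parser handles a page only if the propagated (hypothetical) "site" value matches its own. */
    static boolean shouldProcess(Map<String, String> pageMetadata, String parserSite) {
        return parserSite.equals(pageMetadata.get("site"));
    }

    public static void main(String[] args) {
        Seed seed = parseSeedLine("http://www.example.com/\tsite=example");
        System.out.println(seed.url() + " " + seed.metadata());
        System.out.println(shouldProcess(seed.metadata(), "example"));
        System.out.println(shouldProcess(seed.metadata(), "other"));
    }
}
```

Keeping the decision in one small predicate lets each site-specific parser bail out early on pages that were reached from another site's seeds.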
Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...Have just committed NUTCH-1522 for both 2.x and trunk. On 10 June 2013 12:07, Julien Nioche wrote: Open Source Solutions for Text Engineering http://digita...
Author: Julien Nioche, 2013-06-10, 11:27

Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...Have added the upgrade to Tika 1.3 to v1.7 https://issues.apache.org/jira/browse/NUTCH-1522. It should be quite straightforward to include and would be a shame not to do it for this release....
Author: Julien Nioche, 2013-06-10, 11:07

Re: [DISCUSS] Nutch 1.7 ready for release? - Nutch - [mail # dev]
...+1 to release now, but it would have been nice to do https://issues.apache.org/jira/browse/NUTCH-1527 as part of the same release. The main change introduced in this version is the pluggable ...
Author: Julien Nioche, 2013-06-10, 07:48

Re: [RESULT] WAS: Re: [VOTE] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...Hi Lewis, the md5, asc and sha are now correct. Thanks for fixing it. Have a nice weekend. Julien. On 7 June 2013 21:16, Lewis John Mcgibbney wrote: ...
Author: Julien Nioche, 2013-06-08, 10:09

Job opening in Bristol, UK - Nutch - [mail # dev]
...[Apologies for cross-posting] We are looking for a candidate with the following skills and expertise: * experience in web crawling, ideally with Apache Nutch ...
Author: Julien Nioche, 2013-06-06, 18:59

Re: Unable to crawl google search results - Nutch - [mail # user]
...Check your URL filters, e.g. check that you have removed the lines below, which are there by default:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
Julien ...
Author: Julien Nioche, 2013-06-04, 22:23

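The URL filter advice above works because, as far as I know, Nutch's urlfilter-regex plugin reads regex-urlfilter.txt top to bottom: each rule is a "+" (accept) or "-" (reject) prefix followed by a Java regular expression, the first rule whose pattern is found anywhere in the URL decides, and a URL matching no rule is rejected. A Google results URL contains "?" and "=", so the default -[?*!@=] line rejects it before the catch-all accept rule is reached. The standalone sketch below reproduces only that first-match logic under those assumptions; RegexUrlFilterSketch and its methods are hypothetical names, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of first-match +/- regex URL filtering, modelled on the
// behaviour described above; it is not Nutch's urlfilter-regex implementation.
public class RegexUrlFilterSketch {

    private record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines: blank lines and '#' comments are skipped, '+' accepts, '-' rejects. */
    public RegexUrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if the first matching rule accepts it; null if it rejects or nothing matches. */
    public String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) { // pattern may match anywhere in the URL
                return rule.accept() ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexUrlFilterSketch filter = new RegexUrlFilterSketch(List.of(
            "# skip URLs containing certain characters as probable queries, etc.",
            "-[?*!@=]",
            "# accept anything else",
            "+."));
        System.out.println(filter.filter("https://www.google.com/search?q=nutch")); // prints null
        System.out.println(filter.filter("https://nutch.apache.org/")); // prints https://nutch.apache.org/
    }
}
```

Because the first matching rule wins, URLs with query strings can only get through if the -[?*!@=] line is removed, as the reply suggests, or a more specific "+" rule is placed above it.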
Re: [VOTE] Apache Nutch 2.2 Release Candidate - Nutch - [mail # dev]
...Hi, the keys live at http://www.apache.org/dist/nutch/KEYS (http://nutch.apache.org/dist/KEYS gives a 404). Aren't the .asc files supposed to be there as well? (http://www.apach...
Author: Julien Nioche, 2013-06-04, 06:34

Re: Generic LinkRank plugin for Nutch - Nutch - [mail # dev]
...Hi Ahmet, you don't need to use the ScoringFilters at all. The nutch.scoring.webgraph package can be taken as an example of how to do it. It works fine as far as I know, but what we ...
Author: Julien Nioche, 2013-05-30, 08:41