Results 1 to 8 of 8.

Re: Nutch2 + Cassandra - Nutch - [mail # user]
...When running with Gora 0.2.1, the outlinks field was not filled in. I spent quite a bit of time trying to figure out what was wrong, but without success. On Fri, Sep 21, 2012 at 2:41 AM,...
Author: Žygimantas Medelis, 2012-09-21, 05:40

Re: Nutch2 + Cassandra - Nutch - [mail # user]
...It's a problem with Gora v0.2.1, which does not work with the current Nutch 2. I have also tested with the SQL store, and it fails too. Changing the dependency to Gora v0.2 and rebuilding solves the problem...
Author: Žygimantas Medelis, 2012-09-19, 12:54

Re: Nutch2 + Cassandra - Nutch - [mail # user]
...After inject:
[default@webpage] list f;
Using default limit of 100
RowKey: 6c742e62616c7361732e7777773a687474702f
=> (column=6669, value=00278d00, timestamp=1348032953800000)
=> (...
Author: Žygimantas Medelis, 2012-09-19, 06:07

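The cassandra-cli listing in the post above prints row keys, column names and values as hex-encoded bytes. The following minimal Java sketch decodes them, assuming the row key and column name are UTF-8 text and the 4-byte value is a big-endian integer; the class and method names are made up for illustration. Decoded this way, the row key reads "lt.balsas.www:http/", which looks like Nutch 2's reversed-URL key, the column name reads "fi" (presumably the fetch-interval column of the Cassandra mapping), and the value is 2592000 seconds, i.e. 30 days, the same as Nutch's default db.fetch.interval.default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading hex output from cassandra-cli.
final class HexCell {

    // Convert a hex string such as "6c742e" into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Assumption: row keys and column names are UTF-8 text.
    static String hexToText(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    // Assumption: the value is exactly four bytes holding a big-endian int.
    static int hexToInt(String hex) {
        byte[] b = hexToBytes(hex);
        if (b.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getInt(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        System.out.println(hexToText("6c742e62616c7361732e7777773a687474702f")); // lt.balsas.www:http/
        System.out.println(hexToText("6669")); // fi
        int seconds = hexToInt("00278d00");
        System.out.println(seconds + " s = " + seconds / 86400 + " days"); // 2592000 s = 30 days
    }
}
```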
Nutch2 + Cassandra - Nutch - [mail # user]
...Hi, I have Nutch 2 configured with a Cassandra backend (as described at http://sujitpal.blogspot.com/2012/01/exploring-nutch-gora-with-cassandra.html), and it fails to fetch pages...
Author: Žygimantas Medelis, 2012-09-18, 13:34

URLFilter based on anchor text - Nutch - [mail # user]
...Hi, URLFilters allow filtering links based on the content of the URL. Is it possible to extend the filters to filter links based on their anchor text? URLFilter takes only the URL as its par...
Author: Žygimantas Medelis, 2011-01-14, 22:19

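Nutch's URLFilter contract is a single method, String filter(String urlString), which returns the URL to keep it or null to drop it, so the anchor text never reaches it. The anchor travels with the outlinks produced by the parser instead. The sketch below is self-contained and does not use Nutch classes: Link, AnchorAwareFilter and blockWords are hypothetical names that only mirror the return-null-to-reject convention while also passing the anchor text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnchorFilterSketch {

    // Hypothetical stand-in for an outlink: target URL plus its anchor text.
    static final class Link {
        final String url;
        final String anchor;
        Link(String url, String anchor) { this.url = url; this.anchor = anchor; }
    }

    // Hypothetical interface modelled on URLFilter: return null to reject,
    // but the anchor text is passed in alongside the URL.
    interface AnchorAwareFilter {
        String filter(String url, String anchor);
    }

    // Rejects links whose anchor text contains any blocked word.
    // Blocked words are expected in lower case; matching is case-insensitive.
    static AnchorAwareFilter blockWords(Set<String> blocked) {
        return (url, anchor) -> {
            String text = anchor == null ? "" : anchor.toLowerCase(Locale.ROOT);
            for (String word : blocked) {
                if (text.contains(word)) return null;
            }
            return url;
        };
    }

    // Keeps only the links the filter accepts.
    static List<Link> apply(AnchorAwareFilter f, List<Link> links) {
        List<Link> kept = new ArrayList<>();
        for (Link l : links) {
            String result = f.filter(l.url, l.anchor);
            if (result != null) kept.add(new Link(result, l.anchor));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("http://example.com/a", "Read more"),
            new Link("http://example.com/b", "Login"));
        for (Link l : apply(blockWords(Set.of("login")), links)) {
            System.out.println(l.url); // http://example.com/a
        }
    }
}
```

Keeping the anchor-aware check separate from URLFilter avoids changing an interface that every existing URL filter plugin implements.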
Changing html indexing content - Nutch - [mail # user]
...Hi, the web pages that I am indexing contain lots of links with anchor text. While the links are needed to crawl the pages, the anchor text pollutes my index, so I want to get rid of it. My...
Author: Žygimantas Medelis, 2010-11-17, 14:34

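One way to keep the links for crawling while keeping their anchor text out of the index is to strip link text only from the text that gets indexed, for example inside a custom parse or indexing filter. The sketch below shows just the text transformation, using a rough regex heuristic rather than a real HTML parser (nested or malformed markup is not handled); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class StripAnchorText {

    // Matches a whole <a ...>...</a> element, case-insensitively and across newlines.
    // Rough heuristic only: nested or malformed markup is not handled.
    private static final Pattern ANCHOR =
        Pattern.compile("<a\\b[^>]*>.*?</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Matches runs of whitespace, to be collapsed to one space.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Returns the visible text of an HTML fragment with link text removed.
    static String textWithoutAnchors(String html) {
        String noAnchors = ANCHOR.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noAnchors).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Useful text <a href=\"/next\">Next page</a> more text.</p>";
        System.out.println(textWithoutAnchors(html)); // Useful text more text.
    }
}
```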
Shared objects between plugin instances - Nutch - [mail # user]
...Hi, Nutch initializes plugins multiple times, so any expensive initialization procedure is executed more than once. For example, a plugin needs to make a connection to the data...
Author: Žygimantas Medelis, 2010-11-06, 20:51

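A common Java way to run expensive setup once, no matter how many plugin instances are created, is the initialization-on-demand holder idiom: the JVM initializes the nested Holder class lazily and thread-safely on first use. This is a minimal sketch with hypothetical names (ExpensiveResource, MyPlugin). It assumes all instances are loaded by the same class loader; Nutch loads plugins through their own class loaders, and separate Hadoop task JVMs each get their own copy, so the sharing is per class loader and per JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedResourceSketch {

    // Counts how many times the expensive setup actually runs (for illustration).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical stand-in for an expensive resource such as a database connection.
    static final class ExpensiveResource {
        ExpensiveResource() {
            INIT_COUNT.incrementAndGet(); // pretend a connection is opened here
        }
        String lookup(String key) { return "value-for-" + key; }
    }

    // Holder is initialized once, on the first read of Holder.INSTANCE;
    // the JVM's class-initialization rules make this thread-safe.
    private static final class Holder {
        static final ExpensiveResource INSTANCE = new ExpensiveResource();
    }

    static ExpensiveResource shared() { return Holder.INSTANCE; }

    // Hypothetical plugin class: every instance reuses the same resource.
    static final class MyPlugin {
        private final ExpensiveResource resource = shared();
        String process(String key) { return resource.lookup(key); }
    }

    public static void main(String[] args) {
        MyPlugin first = new MyPlugin();
        MyPlugin second = new MyPlugin();
        System.out.println(first.process("a") + " " + second.process("b"));
        System.out.println("initialized " + INIT_COUNT.get() + " time(s)"); // initialized 1 time(s)
    }
}
```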
Crawling sub-pages but not indexing parent page - Nutch - [mail # user]
...Hi, I am crawling pages that are organized as a parent category page, which in turn lists the items in that category. A category page can list subcategories, and only then you g...
Author: Žygimantas Medelis, 2010-10-09, 19:51