Nutch dev mailing list


Nutch 2.0 roadmap
Hi guys,

I gather that we'll jump straight to 2.0 after 1.1 and that 2.0 will be
based on what is currently referred to as NutchBase. Shall we create a
branch for 2.0 in the Nutch SVN repository and a corresponding label in
JIRA so that we can file issues and feature requests against 2.0? Do you
think the current NutchBase could be used as the basis for the 2.0 branch?

Talking about features, what else would we add apart from the following:

* support for HBase, via an ORM or not (see NUTCH-808:
https://issues.apache.org/jira/browse/NUTCH-808)
* plugin cleanup: use Tika only for parsing and get rid of everything else?
* remove indexing / search and delegate it to Solr
* new functionality, e.g. sitemap support, canonical tags, etc.

I suppose that http://wiki.apache.org/nutch/Nutch2Architecture needs an
update?

I look forward to hearing your thoughts on this.

Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com