Nutch, mail # dev - Nutch 2.0 roadmap


Nutch 2.0 roadmap
Julien Nioche 2010-04-06, 13:43
Hi guys,

I gather that we'll jump straight to 2.0 after 1.1 and that 2.0 will be
based on what is currently referred to as NutchBase. Shall we create a
branch for 2.0 in the Nutch SVN repository and have a label accordingly for
JIRA so that we can file issues / feature requests on 2.0? Do you think that
the current NutchBase could be used as a basis for the 2.0 branch?

Talking about features, what else would we add apart from :

* support for HBase : via ORM or not (see NUTCH-808
  <https://issues.apache.org/jira/browse/NUTCH-808>)
* plugin cleanup : Tika only for parsing - get rid of everything else?
* remove index / search and delegate to SOLR
* new functionalities e.g. sitemap support, canonical tag etc...

I suppose that http://wiki.apache.org/nutch/Nutch2Architecture needs an
update?

I look forward to hearing your thoughts on this.

Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com