Re: svn commit: r965815 - in /nutch/branches/nutchbase/src: java/org/apache/nutch/parse/ParseStatus.java java/org/apache/nutch/parse/ParseText.java test/org/apache/nutch/parse/TestParseText.java
Andrzej Bialecki 2010-07-20, 19:01

On 2010-07-20 20:29, Julien Nioche wrote:
> I meant putting the migration code and 1.x Nutch jars in the contrib
> directory of the trunk - that shouldn't require a different committers
> list or should it?

I don't feel strongly about contrib... there is a different precedent:
for a while there were migration tools in the main tree for conversion
between 0.8 and 0.9+.

> A. branch cleaned up, SVN commits, etc., stable working
> B. at some point, branch ready to be merged (assumption: branch
> devel stops)
> C. define branch merge into 3-5 patches

Due to a total API incompatibility (CrawlDatum is replaced by a WebPage,
content and link storage is different, the way we run jobs in nutchbase
is also different) I don't expect more than 2 patches, of which the
first one will contain 90% of API changes...

> D. foreach patch in C:
>      create JIRA issue for patch
>      call for review of patch
>      if no objections, then commit in 24-48 hours
>
> E. trunk now ready for 2.0 development
> F. schedule current open issues for 2.0, grab any low hanging fruit (1-2
> days)
> G. all other issues pushed out to 2.1
> H. release 2.0
>
>
> Andrzej and myself are in the process of porting the last missing tests
> in NutchBase and debugging Gora along the way. There is just a handful
> of plugins which have not been ported and I should have finished that
> pretty quickly. Hopefully we'll get to (A) soonish and can then follow
> the plan above.
>
> However we still need to address the issue raised by Dogacan i.e shall we
> provide tools to convert from 1.x structures to 2.0 and if so how shall
> we organise it. Again - some things have been removed from NutchBase for
> the sake of clarity but since they are in the trunk they are not lost
> and we can decide what to do with them later.

IMO it would take enormous effort to implement a runtime compatibility
between 1.x and 2.x, so users will have to either convert or recrawl.

I think that at a minimum we should provide a clear procedure on how to
export the old crawldb and import into a new db. If there's a strong
desire to have a tool to convert 1.x segments into the new crawl job
data format we could also implement this - but I don't expect there
would be ... after all, segments are a throwaway property with a limited
time to live...

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com