Mahout, mail # dev - Helping out with the .7 release

RE: Helping out with the .7 release
Saikat Kanjilal 2012-02-23, 08:08

Thank you, I'll get started on this over the weekend.

> Date: Thu, 23 Feb 2012 13:33:42 +0530
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Helping out with the .7 release
>
> Saikat,
>
> I have created https://issues.apache.org/jira/browse/MAHOUT-981 for
> refactoring KMeansDriver to use the new ClusterClassificationDriver.
>
> You can provide your patches on this issue. See the following page for
> how to provide a patch:
> https://cwiki.apache.org/MAHOUT/how-to-contribute.html#HowToContribute-Generatingapatch.
>
> Before the KMeans refactoring, we are expecting the
> ClusterClassificationMapperTest from you (for MAHOUT-929). That test
> case will complete the development of ClusterClassificationDriver, after
> which the refactoring can start.
>
> Paritosh
>
> On 23-02-2012 04:55, Jeff Eastman wrote:
> > Hi Saikat,
> >
> > Glad you're excited. Paritosh offered one suggestion below. You could
> > look at TestKmeansClustering for patterns you could use to test the
> > ClusterClassificationMapper and Driver in MR mode. That should be
> > straightforward, but please coordinate with Paritosh so you don't
> > duplicate efforts.
> >
> > Another place you might look into would be the KMeansDriver and
> > MAHOUT-930. You could work on refactoring KMeansDriver to use the new
> > ClusterClassificationDriver in MAHOUT-929. That would exercise both
> > its sequential and MR options. It will be interesting to see how much
> > code can be removed.
> >
> > Finally, you could see if you can wrap your mind around the
> > ClusterIterator and how it could be used for further refactoring of
> > the KMeansDriver. See TestClusterClassifier for insight.
> >
> > Is that enough reading and doing for now?
> > Jeff
> >
> > On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
> >> Jeff, I'm pretty excited to help out with this. As a starter, can
> >> you point me to where I should begin reading the code? I haven't
> >> looked too closely yet, but are there certain classes in the
> >> clustering area that this refactoring effort is centered around?
> >> Regards
> >>
> >>> Date: Wed, 22 Feb 2012 08:56:23 -0700
> >>> From: [EMAIL PROTECTED]
> >>> To: [EMAIL PROTECTED]
> >>> Subject: Re: Helping out with the .7 release
> >>>
> >>> Hi Saikat,
> >>>
> >>> I agree with Paritosh that a great place to begin would be to write
> >>> some unit tests. This will familiarize you with the code base and help
> >>> us a lot with our 0.7 housekeeping release. The new clustering
> >>> classification components are going to unify many, but not all, of
> >>> the existing clustering algorithms, reducing their complexity by
> >>> factoring out duplication and streamlining their integration into
> >>> semi-supervised classification engines.
> >>>
> >>> Please feel free to post any questions you may have in reading through
> >>> this code. This is a major refactoring effort and we will need all the
> >>> help we can get. Thanks for the offer,
> >>>
> >>> Jeff
> >>>
> >>> On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
> >>>> Hi Paritosh, Yes, creating the test case would be a great first
> >>>> start. However, are there other tasks you need help with that I
> >>>> can do before the test creation? I will sync trunk and start
> >>>> reading through the code in the meantime. Regards
> >>>>
> >>>>> Date: Wed, 22 Feb 2012 10:57:51 +0530
> >>>>> From: [EMAIL PROTECTED]
> >>>>> To: [EMAIL PROTECTED]
> >>>>> Subject: Re: Helping out with the .7 release
> >>>>>
> >>>>> We are creating clustering-as-classification components, which
> >>>>> will help in moving clustering out. Once the component is ready,
> >>>>> the clustering algorithms will need refactoring.
> >>>>> The clustering-as-classification component and the outlier removal
> >>>>> component have been created.
> >>>>>
> >>>>> Most of it is committed, and the rest is available as a patch. See
> >>>>> https://issues.apache.org/jira/browse/MAHOUT-929
> >>>>> If you apply the latest patch available on MAHOUT-929, you can