Mahout >> mail # dev >> Helping out with the .7 release


RE: Helping out with the .7 release

Thank you, I'll get started on this over the weekend.

> Date: Thu, 23 Feb 2012 13:33:42 +0530
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Helping out with the .7 release
>
> Saikat,
>
> I have created https://issues.apache.org/jira/browse/MAHOUT-981 for
> refactoring KMeansDriver to use the new ClusterClassificationDriver.
>
> You can provide your patches on this issue. For instructions on
> generating a patch, see
> https://cwiki.apache.org/MAHOUT/how-to-contribute.html#HowToContribute-Generatingapatch.
>
> Before the KMeans refactoring, we are expecting the
> ClusterClassificationMapperTest from you (for MAHOUT-929). That test
> case would complete the development of ClusterClassificationDriver,
> after which the refactoring can start.
>
> Paritosh
>
> On 23-02-2012 04:55, Jeff Eastman wrote:
> > Hi Saikat,
> >
> > Glad you're excited. Paritosh offered one suggestion below. You could
> > look at TestKmeansClustering for patterns you could use to test the
> > ClusterClassificationMapper and Driver in MR mode. That should be
> > straightforward, but please coordinate with Paritosh so you don't
> > duplicate efforts.
> >
> > Another place you might look into would be the KMeansDriver and
> > MAHOUT-930. You could work on refactoring KMeansDriver to use the new
> > ClusterClassificationDriver in MAHOUT-929. That would exercise both
> > its sequential and MR options. It will be interesting to see how much
> > code can be removed.
> >
> > Finally, you could see if you can wrap your mind around the
> > ClusterIterator and how it could be used for further refactoring of
> > the KMeansDriver. See TestClusterClassifier for insight.
> >
> > That enough reading and doing for now?
> > Jeff
> >
> > On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
> >> Jeff, I'm pretty excited to help out with this. As a starter, can
> >> you point me to where I should begin reading the code? I
> >> haven't looked too closely, but are there certain classes in the
> >> clustering area where this refactoring effort is centered?
> >> Regards
> >>
> >>> Date: Wed, 22 Feb 2012 08:56:23 -0700
> >>> From: [EMAIL PROTECTED]
> >>> To: [EMAIL PROTECTED]
> >>> Subject: Re: Helping out with the .7 release
> >>>
> >>> Hi Saikat,
> >>>
> >>> I agree with Paritosh, that a great place to begin would be to write
> >>> some unit tests. This will familiarize you with the code base and help
> >>> us a lot with our 0.7 housekeeping release. The new clustering
> >>> classification components are going to unify many - but not all - of
> >>> the existing clustering algorithms to reduce their complexity by
> >>> factoring out duplication and streamlining their integration into
> >>> semi-supervised classification engines.
> >>>
> >>> Please feel free to post any questions you may have in reading through
> >>> this code. This is a major refactoring effort and we will need all the
> >>> help we can get. Thanks for the offer,
> >>>
> >>> Jeff
> >>>
> >>> On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
> >>>> Hi Paritosh, Yes, creating the test case would be a great first
> >>>> start. However, are there other tasks you guys need help with that
> >>>> I can do before the test creation? I will sync trunk and start
> >>>> reading through the code in the meantime. Regards
> >>>>
> >>>>> Date: Wed, 22 Feb 2012 10:57:51 +0530
> >>>>> From: [EMAIL PROTECTED]
> >>>>> To: [EMAIL PROTECTED]
> >>>>> Subject: Re: Helping out with the .7 release
> >>>>>
> >>>>> We are creating clustering-as-classification components, which will
> >>>>> help in moving clustering out. Once the components are ready, the
> >>>>> clustering algorithms will need refactoring.
> >>>>> The clustering-as-classification component and the outlier removal
> >>>>> component have been created.
> >>>>>
> >>>>> Most of it is committed, and the rest is available as a patch. See
> >>>>> https://issues.apache.org/jira/browse/MAHOUT-929
> >>>>> If you will apply the latest patch available on Mahout-929 you can
     