
Re: Helping out with the .7 release

I have created https://issues.apache.org/jira/browse/MAHOUT-981 for
refactoring KMeansDriver to use the new ClusterClassificationDriver.

You can attach your patches to this issue. See this to know how to
provide a patch.

Before the KMeans refactoring, we are expecting the
ClusterClassificationMapperTest from you (for MAHOUT-929). That test
case would complete the development of ClusterClassificationDriver, and
then the refactoring can start.


On 23-02-2012 04:55, Jeff Eastman wrote:
> Hi Saikat,
> Glad you're excited. Paritosh offered one suggestion below. You could
> look at TestKmeansClustering for patterns you could use to test the
> ClusterClassificationMapper and Driver in MR mode. That should be
> straightforward, but please coordinate with Paritosh so you don't
> duplicate efforts.
> Another place you might look into would be the KMeansDriver and
> MAHOUT-930. You could work on refactoring KMeansDriver to use the new
> ClusterClassificationDriver in MAHOUT-929. That would exercise both
> its sequential and MR options. It will be interesting to see how much
> code can be removed.
> Finally, you could see if you can wrap your mind around the
> ClusterIterator and how it could be used for further refactoring of
> the KMeansDriver. See TestClusterClassifier for insight.
> That enough reading and doing for now?
> Jeff
> On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
>> Jeff, I'm pretty excited to help out with this. As a starter, can
>> you point me to where I should begin reading the code? I
>> haven't looked too closely, but are there certain classes in the
>> clustering area that this refactoring effort is centered on?
>> Regards
>>> Date: Wed, 22 Feb 2012 08:56:23 -0700
>>> Subject: Re: Helping out with the .7 release
>>> Hi Saikat,
>>> I agree with Paritosh, that a great place to begin would be to write
>>> some unit tests. This will familiarize you with the code base and help
>>> us a lot with our 0.7 housekeeping release. The new clustering
>>> classification components are going to unify many - but not all - of the
>>> existing clustering algorithms to reduce their complexity by factoring
>>> out duplication and streamlining their integration into semi-supervised
>>> classification engines.
>>> Please feel free to post any questions you may have in reading through
>>> this code. This is a major refactoring effort and we will need all the
>>> help we can get. Thanks for the offer,
>>> Jeff
>>> On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
>>>> Hi Paritosh, Yes, creating the test case would be a great
>>>> start. However, are there other tasks you guys need help with that
>>>> I can do before the test creation? I will sync trunk and start
>>>> reading through the code in the meantime. Regards
>>>>> Date: Wed, 22 Feb 2012 10:57:51 +0530
>>>>> Subject: Re: Helping out with the .7 release
>>>>> We are creating clustering as classification components which will help
>>>>> in moving clustering out. Once the component is ready, then the
>>>>> clustering algorithms would need refactoring.
>>>>> The clustering as classification component and the outlier removal
>>>>> component have been created.
>>>>> Most of it is committed, and the rest is available as a patch. See
>>>>> https://issues.apache.org/jira/browse/MAHOUT-929
>>>>> If you apply the latest patch available on MAHOUT-929, you can see
>>>>> all that is available now.
>>>>> If you want, you can help with the test case of
>>>>> ClusterClassificationMapper available in the patch.
>>>>> On 22-02-2012 10:27, Saikat Kanjilal wrote:
>>>>>> Hi Guys, I was interested in helping out with the clustering
>>>>>> component of Mahout. I looked through the JIRA items below and