Mahout, mail # dev - Re: [jira] [Commented] (MAHOUT-929) Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning


Re: [jira] [Commented] (MAHOUT-929) Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
Jeff Eastman 2012-02-23, 12:29
Just +1 <grin>

On 2/22/12 10:35 PM, Paritosh Ranjan (Commented) (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214329#comment-13214329 ]
>
> Paritosh Ranjan commented on MAHOUT-929:
> ----------------------------------------
>
> Assigned to myself.
>
> I think the cluster classification driver is developed now. I will wait a while for the ClusterClassificationMapper test case (patch) that we asked for on dev.
>
> Otherwise I will write it and commit it myself. I might need help when committing for the first time.
>
> Since ClusterClassificationDriver development is done, we need to refactor the KMeans, FuzzyK, Dirichlet, and Canopy drivers.
> I will create separate child issues for refactoring these algorithms so that different people can pick them up in parallel if they want. That will help avoid duplicated effort.
>
> Jeff, any comments/suggestions?
>
>> Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
>> --------------------------------------------------------------------------------------------
>>
>>                  Key: MAHOUT-929
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-929
>>              Project: Mahout
>>           Issue Type: Improvement
>>           Components: Classification, Clustering
>>     Affects Versions: 0.6
>>             Reporter: Jeff Eastman
>>             Assignee: Paritosh Ranjan
>>              Fix For: 0.7
>>
>>          Attachments: Mahout-929, Mahout-929, Mahout-929, Mahout-929
>>
>>
>> The current clustering drivers have a -cp option to produce a clusteredPoints directory containing the input vectors classified by the final clusters produced by the algorithm. This option is implemented redundantly in each of those drivers.
>> - Factor out & implement an independent post processor to perform the classification step independently of the various clustering implementations.
>> - Implement a pluggable outlier removal capability for this classifier.
>> - Consider building off of the ClusterClassifier & ClusterIterator ideas.
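
The proposal in the issue description can be illustrated with a minimal sketch: a classification step that runs after clustering, assigns each vector to its nearest final cluster, and consults a pluggable policy to prune outliers. Everything here is an assumption made for illustration. The class `ClassificationPostProcessSketch`, the `OutlierPolicy` interface, the Euclidean distance, and the threshold of 3.0 are hypothetical and are not Mahout's ClusterClassificationDriver or any other Mahout API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch only; none of these names are Mahout classes. */
public class ClassificationPostProcessSketch {

  /** Pluggable outlier policy: decides whether a point lies too far from its nearest cluster. */
  interface OutlierPolicy {
    boolean isOutlier(double distanceToNearestCenter);
  }

  /** Returns the index of the nearest center, or empty if the policy prunes the point. */
  static Optional<Integer> classify(double[] point, List<double[]> centers, OutlierPolicy policy) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < centers.size(); i++) {
      double d = euclidean(point, centers.get(i));
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    if (best < 0 || policy.isOutlier(bestDistance)) {
      return Optional.empty();
    }
    return Optional.of(best);
  }

  /** Euclidean distance, assumed here; a real post processor would take a distance measure as input. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    // Final cluster centers, as if produced by any clustering algorithm.
    List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
    // Assumed policy: prune anything farther than 3.0 from its nearest center.
    OutlierPolicy policy = distance -> distance > 3.0;

    double[][] points = {{1, 1}, {9, 10}, {5, 5}};
    for (double[] p : points) {
      String label = classify(p, centers, policy)
          .map(i -> "cluster " + i)
          .orElse("pruned as outlier");
      System.out.println(Arrays.toString(p) + " -> " + label);
    }
  }
}
```

Because the classification step only needs the final centers and a policy, the same code can follow any of the clustering algorithms, which is the redundancy the issue aims to remove.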