Mahout, mail # dev - Re: [jira] [Commented] (MAHOUT-988) Convert K-means buildClusters to use new ClusterIterator


Jeff Eastman 2012-03-15, 13:54
Yes, that was my point below. It may, in fact, be impossible to
implement and commit them independently since so much of Mahout
clustering depends upon the Cluster sequenceFile. You may be able to get
part way by moving the Canopy mods into the kmeans issue, but then the
cluster dumper and evaluator will not work with kmeans.

Ideas?

On 3/14/12 10:15 PM, Paritosh Ranjan wrote:
> Thanks Jeff. One question: are the "Use ClusterIterator" tasks dependent
> on the "Modify Canopy etc to use ClusterWritable" task?
> I am assuming that all subtasks in MAHOUT-933
> <https://issues.apache.org/jira/browse/MAHOUT-933> are independent of
> each other and the order to pick them does not matter. Am I correct?
>
> On 15-03-2012 09:23, Jeff Eastman wrote:
>> Sure Paritosh, go ahead and take a crack at it. I am moving from CO
>> to PA for the next few weeks and won't be able to do much coding
>> during that period. I suspect you will also need to modify both Canopy
>> and the RandomSeedGenerator to emit ClusterWritable.
>>
>> Smooth sailing,
>> Jeff
>>
>> On 3/14/12 8:28 PM, Paritosh Ranjan (Commented) (JIRA) wrote:
>>>      [
>>> https://issues.apache.org/jira/browse/MAHOUT-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229840#comment-13229840
>>> ]
>>>
>>> Paritosh Ranjan commented on MAHOUT-988:
>>> ----------------------------------------
>>>
>>> Jeff, I would like to work on this issue (or MAHOUT-989, or
>>> MAHOUT-990). Can I? I might also need some help (at least the first
>>> patch review).
>>>
>>>
>>>> Convert K-means buildClusters to use new ClusterIterator
>>>> --------------------------------------------------------
>>>>
>>>>                  Key: MAHOUT-988
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-988
>>>>              Project: Mahout
>>>>           Issue Type: Sub-task
>>>>           Components: Clustering
>>>>     Affects Versions: 0.6
>>>>             Reporter: Jeff Eastman
>>>>             Assignee: Jeff Eastman
>>>>              Fix For: 0.7
>>>>
>>>>
>>>> Refactor the current K-means implementation to use the
>>>> ClusterIterator/Classifier implementation. This will replace the
>>>> mapper, combiner, reducer, clusterer and many unit tests but will
>>>> not modify the other driver APIs, thus retaining compatibility with
>>>> existing CLI.