Results 121 to 130 of 1050 (0.451s).

Re: Judging the quality of clustering - Mahout - [mail # user]
...Yes, that is the paper I used to implement CDbw. I've tried it a few times along with the simpler ClusterEvaluator metrics I took from Mahout In Action and they look to be reason...
Author: Jeff Eastman, 2012-05-17, 12:58

Re: choosing appropriate t1,t2 for canopy clustering - Mahout - [mail # user]
...You can use the RepresentativePointsDriver to pick a set of n representative points from each cluster to speed these calculations, but it requires the clusters and clustered poin...
Author: Jeff Eastman, 2012-05-16, 14:34

Re: Judging the quality of clustering - Mahout - [mail # user]
...Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some quality metrics (inter-cluster distance, intra-cluster-distance, ...) that you may find useful. Both calculate...
Author: Jeff Eastman, 2012-05-16, 14:32

Re: choosing appropriate t1,t2 for canopy clustering - Mahout - [mail # user]
...Hi Bob, Cosine distance will return distances on 0.0...1.0 as you suggest. While there is no absolutely foolproof technique for priming canopy T1 & T2 values I recommend yo...
Author: Jeff Eastman, 2012-05-15, 15:16

Re: online clustering with mahout - Mahout - [mail # user]
...+1 you've got it with the iterator and classifier. Mahout really doesn't have good support yet for online clustering. The problem you note will occur if new documents introduce n...
Author: Jeff Eastman, 2012-05-15, 13:00

Mahout 0.7 Code Freeze - Mahout - [mail # dev]
...Tomorrow is our target for 0.7 code freeze but we still have 24 open issues and 6 with patch available. There are 10 unassigned issues that I will move to 0.8 tomorrow unless som...
Author: Jeff Eastman, 2012-05-14, 16:26

Re: online clustering with mahout - Mahout - [mail # user]
...Look at ClusterIterator.iterate(). This will do clustering in memory without any Hadoop. ClusterIterator.iterateSeq will do clustering in a single process from/to Hadoop sequence...
Author: Jeff Eastman, 2012-05-14, 13:20

Re: Canopy estimator - Mahout - [mail # user]
...The reason I use T1==T2 is that T2 is the only threshold that determines the number of clusters. T1 affects how many adjacent points are considered in the centroid calculations. ...
Author: Jeff Eastman, 2012-05-11, 14:58

Re: Canopy estimator - Mahout - [mail # user]
...No, the issue was discussed but never reached critical mass. I typically do a binary search to find the best value setting T1==T2 and then tweak T1 up a bit. For feeding k-means,...
Author: Jeff Eastman, 2012-05-10, 13:12

Re: kmeans not returning k clusters - Mahout - [mail # user]
...Does this cluster reduction happen when you prime k-means with canopy? Can you first adjust T1==T2 to get about 200 canopies and feed that to k-means? How wide are your term vect...
Author: Jeff Eastman, 2012-05-09, 19:24