Results 121 to 130 of 1050 (0.451s).
Re: Judging the quality of clustering - Mahout - [mail # user]
...Yes, that is the paper I used to implement CDbw. I've tried it a few times along with the simpler ClusterEvaluator metrics I took from Mahout In Action and they look to be reason...
   Author: Jeff Eastman, 2012-05-17, 12:58
Re: choosing appropriate t1,t2 for canopy clustering - Mahout - [mail # user]
...You can use the RepresentativePointsDriver to pick a set of n representative points from each cluster to speed these calculations, but it requires the clusters and clustered poin...
   Author: Jeff Eastman, 2012-05-16, 14:34
Re: Judging the quality of clustering - Mahout - [mail # user]
...Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some quality metrics (inter-cluster distance, intra-cluster distance, ...) that you may find useful. Both calculate...
   Author: Jeff Eastman, 2012-05-16, 14:32
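
The two metrics named in the snippet above can be illustrated with a small sketch. This is not Mahout's ClusterEvaluator or CDbwEvaluator; the function names, the use of Euclidean distance, and the choice of averaging over pairs are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def intra_cluster_distance(cluster):
    """Mean pairwise distance between points of one cluster (hypothetical definition)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


def inter_cluster_distance(clusters):
    """Mean pairwise distance between cluster centroids (hypothetical definition)."""
    centers = [centroid(c) for c in clusters]
    pairs = list(combinations(centers, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
print(intra_cluster_distance(clusters[0]))
print(inter_cluster_distance(clusters))
```

Under these definitions, a clustering is usually considered better when the intra-cluster value is small relative to the inter-cluster value.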
Re: choosing appropriate t1,t2 for canopy clustering - Mahout - [mail # user]
...Hi Bob, Cosine distance will return distances on 0.0...1.0 as you suggest. While there is no absolutely foolproof technique for priming canopy T1 & T2 values I recommend yo...
   Author: Jeff Eastman, 2012-05-15, 15:16
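
As a rough illustration of the range mentioned in the snippet above, the sketch below computes cosine distance as one minus cosine similarity. The function name is an assumption and this is not Mahout's distance measure class. The 0.0 to 1.0 range holds for vectors with non-negative components, such as term-frequency vectors; vectors with negative components can produce values up to 2.0.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Orthogonal non-negative vectors are as far apart as they can be.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
# Vectors pointing in the same direction have a distance close to zero.
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
```

Because the distances are bounded this way, candidate T1 and T2 values for such vectors only need to be searched within that interval.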
Re: online clustering with mahout - Mahout - [mail # user]
...+1 you've got it with the iterator and classifier. Mahout really doesn't have good support yet for online clustering. The problem you note will occur if new documents introduce n...
   Author: Jeff Eastman, 2012-05-15, 13:00
Mahout 0.7 Code Freeze - Mahout - [mail # dev]
...Tomorrow is our target for 0.7 code freeze but we still have 24 open issues and 6 with patch available. There are 10 unassigned issues that I will move to 0.8 tomorrow unless som...
   Author: Jeff Eastman, 2012-05-14, 16:26
Re: online clustering with mahout - Mahout - [mail # user]
...Look at ClusterIterator.iterate(). This will do clustering in memory without any Hadoop. ClusterIterator.iterateSeq will do clustering in a single process from/to Hadoop sequence...
   Author: Jeff Eastman, 2012-05-14, 13:20
Re: Canopy estimator - Mahout - [mail # user]
...The reason I use T1==T2 is that T2 is the only threshold that determines the number of clusters. T1 affects how many adjacent points are considered in the centroid calculations. ...
   Author: Jeff Eastman, 2012-05-11, 14:58
Re: Canopy estimator - Mahout - [mail # user]
...No, the issue was discussed but never reached critical mass. I typically do a binary search to find the best value setting T1==T2 and then tweak T1 up a bit. For feeding k-means,...
   Author: Jeff Eastman, 2012-05-10, 13:12
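
The binary search described in the snippet above might look like the following sketch. It is a simplified, in-memory stand-in rather than Mahout's canopy implementation: `count_canopies` and `find_threshold` are hypothetical names, Euclidean distance is assumed, and the search treats the canopy count as non-increasing in the threshold, which is a heuristic for a greedy single pass rather than a guarantee.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_canopies(points, t, distance=euclidean):
    """Count canopies from one greedy pass with T1 == T2 == t."""
    centers = []
    for p in points:
        # A point within t of an existing center is absorbed by that canopy;
        # otherwise it becomes the center of a new canopy.
        if not any(distance(p, c) < t for c in centers):
            centers.append(p)
    return len(centers)


def find_threshold(points, target, lo, hi, iterations=30):
    """Bisect for a threshold giving at most `target` canopies.

    Assumes `hi` already yields at most `target` canopies.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if count_canopies(points, mid) > target:
            lo = mid  # too many canopies: the threshold is too small
        else:
            hi = mid  # few enough canopies: try a smaller threshold
    return hi


points = [
    (0.0, 0.0), (0.1, 0.0),
    (5.0, 5.0), (5.1, 5.0),
    (10.0, 0.0), (10.1, 0.0),
]
t = find_threshold(points, target=3, lo=0.01, hi=20.0)
print(t, count_canopies(points, t))
```

With a threshold found this way, T1 can then be raised slightly above T2, as the snippet suggests, so that more neighboring points contribute to each centroid without changing the canopy count.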
Re: kmeans not returning k clusters - Mahout - [mail # user]
...Does this cluster reduction happen when you prime k-means with canopy? Can you first adjust T1==T2 to get about 200 canopies and feed that to k-means? How wide are your term vect...
   Author: Jeff Eastman, 2012-05-09, 19:24