Mahout user mailing list: Canopy estimator


Re: Canopy estimator
Jeff Eastman 2012-05-10, 13:12
No, the issue was discussed but never reached critical mass. I typically
do a binary search to find the best value, setting T1 == T2, and then tweak
T1 up a bit. For feeding k-means, this latter step is not so important.
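A minimal sketch of that binary search, written in Python rather than against Mahout's own classes. It assumes Euclidean distance, a simplified sequential canopy pass (not Mahout's implementation), and that "best value" means the smallest T giving at most a target number of canopies; it also assumes the canopy count shrinks as T grows, which usually but not always holds. `canopies` and `find_threshold` are hypothetical helper names.

```python
import math

def canopies(points, t1, t2):
    """Simplified sequential canopy pass with Euclidean distance (hypothetical helper).

    Each new center collects the remaining points within t1 as members;
    points within t2 are dropped from the candidate list, so t2 alone
    decides how many canopies are produced in this version.
    """
    if t1 < t2:
        raise ValueError("expected t1 >= t2")
    remaining = list(points)
    result = []
    while remaining:
        center = remaining.pop(0)
        members = [center] + [p for p in remaining if math.dist(center, p) < t1]
        result.append(members)
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return result

def find_threshold(points, target, lo, hi, iters=40):
    """Binary-search T (with T1 == T2 == T) for the smallest value giving at
    most `target` canopies. Assumes lo yields more than `target` canopies,
    hi yields at most `target`, and the count shrinks as T grows."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if len(canopies(points, mid, mid)) > target:
            lo = mid  # too many canopies: T must grow
        else:
            hi = mid  # few enough canopies: try a smaller T
    return hi
```

Because T2 alone decides which points can still become centers here, raising T1 afterwards (the tweak mentioned above) only widens the overlap between canopies and leaves their number unchanged.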

If you could figure out a way to automate this we would be interested.
Conceptually, using the RandomSeedGenerator to sample a few vectors and
comparing them with your chosen DistanceMeasure would give you a hint at
the T-value to begin the search. A utility to do that would be a useful
contribution.
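One way the sampling idea might look, as a sketch: draw a few vectors at random (standing in for RandomSeedGenerator) and summarize their pairwise distances, using Euclidean distance as a stand-in for the chosen DistanceMeasure. The function name `sample_distance_stats` and the min/median/max summary are assumptions, not an existing Mahout utility.

```python
import math
import random
import statistics

def sample_distance_stats(points, k=20, seed=0):
    """Sample up to k points and return (min, median, max) of their pairwise
    Euclidean distances, as a hint for where to start the T search.
    Hypothetical helper; not part of Mahout."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(k, len(points)))
    if len(sample) < 2:
        raise ValueError("need at least two points to measure distances")
    # All distinct pairs within the sample.
    dists = [math.dist(a, b) for i, a in enumerate(sample) for b in sample[i + 1:]]
    return min(dists), statistics.median(dists), max(dists)
```

The maximum sampled distance is a natural upper bound for the search and the median a reasonable first probe. It is only a hint: the true extremes over the full data set can lie outside the sample.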

On 5/9/12 8:36 PM, Pat Ferrel wrote:
> Some thoughts on https://issues.apache.org/jira/browse/MAHOUT-563
>
> Did anything ever get done with this? Ted mentions limited usefulness.
> This may be true, but the cases he mentions as counterexamples are
> also not very good for using canopy ahead of k-means, no? That info
> would be a useful result. To use canopies I find myself running it
> over and over trying to see some inflection in the number of clusters.
> Why not automate this? Even if the data shows nothing, that is itself
> an answer of value and it would save a lot of hand work to find out
> the same thing.
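The automation asked for above could start as a plain sweep: run canopy over a range of T values (T1 == T2), record the counts, and report the count that persists over the widest run of thresholds. This is a hypothetical heuristic with hypothetical helper names (`canopy_count`, `sweep`, `longest_plateau`), using Euclidean distance and a simplified canopy pass rather than Mahout's; a curve with no clear plateau would be the "data shows nothing" answer.

```python
import math

def canopy_count(points, t):
    """Number of canopies from a simplified sequential pass with T1 == T2 == t."""
    remaining = list(points)
    count = 0
    while remaining:
        center = remaining.pop(0)
        count += 1
        # Points within t of the center can no longer become centers.
        remaining = [p for p in remaining if math.dist(center, p) >= t]
    return count

def sweep(points, thresholds):
    """(threshold, canopy count) pairs, in the order the thresholds are given."""
    return [(t, canopy_count(points, t)) for t in thresholds]

def longest_plateau(curve, n_points):
    """Return (count, run_length) for the canopy count that stays constant over
    the most consecutive thresholds, ignoring the trivial counts 1 and n_points.
    Returns (None, 0) if every count is trivial."""
    best_count, best_len = None, 0
    run_count, run_len = None, 0
    for _, c in curve:
        if c == run_count:
            run_len += 1
        else:
            run_count, run_len = c, 1
        if c not in (1, n_points) and run_len > best_len:
            best_count, best_len = c, run_len
    return best_count, best_len
```

For example, three well-separated pairs of points produce a long run of 3 canopies between the within-pair and between-pair distances, which is the kind of plateau one currently looks for by rerunning canopy by hand.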