Search criteria: ClusterEvaluator.
Results 91 to 100 of 546 (3.34s).
|
some new clustering code - Mahout - [mail # dev]
|
|
...I have some new clustering code that I have been working. It will probably be targeted back at Mahout at some point, but for reasons of agility, I have been running it out of github...
|
|
.... The salient point is that there are essentially no knobs that need turning other than specifying a distance measure and possibly a large minimum number of clusters. The output is a clustering...
|
... mode, this code is able to cluster 1,000,000 points in 20 dimensions into 1000 clusters in about a minute. See the StreamingKmeans class at https://github.com/tdunning/knn for more info...
.... The algorithm is based loosely on http://web.engr.oregonstate.edu/~shindler/papers/FastKMeans_nips11.pdf This code does not yet use the Mahout clustering API conventions, but is based entirely...
|
|
Author: Ted Dunning,
2012-04-04, 22:25
|
|
|
Re: Streaming KMeans 20newsgroups clustering - Mahout - [mail # dev]
|
|
...Hmm... I will have to take a look. Is your CSV file on EC2 as before? On Thu, Nov 29, 2012 at 1:26 PM, Dan Filimon wrote: ...
|
|
|
Author: Ted Dunning,
2012-11-29, 22:46
|
|
|
Re: Streaming KMeans 20newsgroups clustering - Mahout - [mail # dev]
|
|
...Wrong in the sense of clustering is hard to define. Certainly a wide range of cluster sizes looks dubious, but not definitive. Next easy steps include cosine normalizing the vectors and doing semi-supervised clustering. Clustering the 50d data in R might also be useful. Normalizing is a single method call in the normal flow. It can be done on the projected vectors without loss of generality. After cosine normalization, semi-supervised clustering can be done by adding an additional 20 dimensions with a 1 of n encoding of the correct newsgroup. In the test data, these can be set to all zeros. This gives the clustering algorithm a strong hint about what you are thinking about. It is also worth checking the sum of squared distances to make sure...
|
|
Author: Ted Dunning,
2012-11-27, 14:29
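The 1 of n hint described in this message could be sketched roughly as follows. This is only an illustration on plain double arrays, not Mahout code: the class name LabelAugmenter, the method augment, and the convention of passing -1 for unlabeled test rows are all assumptions made for the example. The input is assumed to be already cosine normalized.

```java
import java.util.Arrays;

/** Sketch: append a 1-of-n label encoding to an already normalized feature vector. */
final class LabelAugmenter {
    /**
     * Returns the features followed by numClasses extra dimensions.
     * A label in [0, numClasses) sets exactly one extra dimension to 1.0 (training data).
     * A label of -1 (hypothetical convention) leaves the extra dimensions all zero (test data).
     */
    static double[] augment(double[] features, int label, int numClasses) {
        if (label < -1 || label >= numClasses) {
            throw new IllegalArgumentException("label out of range: " + label);
        }
        // copyOf pads the new trailing positions with zeros.
        double[] out = Arrays.copyOf(features, features.length + numClasses);
        if (label >= 0) {
            out[features.length + label] = 1.0;
        }
        return out;
    }
}
```

For the 20 newsgroups case, numClasses would be 20, so a 50-dimensional projected vector becomes a 70-dimensional one.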
|
|
|
Re: Streaming KMeans 20newsgroups clustering - Mahout - [mail # dev]
|
|
...Dan, Cool results. The headers can be useful. This is a problem where clustering doesn't actually necessarily work. We need to assess what alternative clustering algorithms would...
|
|
|
Author: Ted Dunning,
2012-11-27, 13:40
|
|
|
Re: Clustering or classification? - Mahout - [mail # user]
|
|
...If you have supervised training data (and it sounds that way), then classification is likely to be more effective. On Tue, Jan 24, 2012 at 7:44 PM, Vikas Pandya wrote: ...
|
|
|
Author: Ted Dunning,
2012-01-25, 03:58
|
|
|
Re: Clustering user profiles - Mahout - [mail # user]
|
|
...On Sun, Jan 15, 2012 at 2:13 PM, Raviv Pavel wrote: Yes. I mean normalized so that their squared magnitude is 1. With a Mahout Vector v, you can use v.norm(2)...
|
|
|
Author: Ted Dunning,
2012-01-15, 19:21
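As a rough illustration of what "normalized so that their squared magnitude is 1" means, here is a minimal sketch on plain double arrays. It does not use Mahout's Vector; the class name UnitLength and both method names are made up for the example, and returning a copy for the zero vector is an assumption about how to handle that edge case.

```java
/** Sketch: scale a vector to unit Euclidean (L2) length. */
final class UnitLength {
    /** Euclidean norm: the square root of the sum of squared components. */
    static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Returns a new array equal to v divided by its Euclidean norm. */
    static double[] normalize(double[] v) {
        double n = norm2(v);
        if (n == 0.0) {
            return v.clone(); // assumption: leave the zero vector unchanged
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }
}
```

After this step the squared magnitude of every nonzero vector is 1 up to rounding, so Euclidean distance between two such vectors depends only on the angle between them.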
|
|
|
Re: Clustering user profiles - Mahout - [mail # user]
|
|
...I usually prefer to represent location as an xyz triple on a unit sphere. That allows Euclidean distance to be useful. On the 1 of n encoded values, Euclidean works as well. ...
|
|
|
Author: Ted Dunning,
2012-01-13, 20:49
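One way to produce such an xyz triple, assuming the location starts out as latitude and longitude in degrees (the message does not say what the input format is), is sketched below. The class and method names are hypothetical.

```java
/** Sketch: map latitude/longitude in degrees to a point on the unit sphere. */
final class UnitSphere {
    /** Returns {x, y, z} with x*x + y*y + z*z == 1 up to rounding. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    /** Straight-line (chord) distance between two xyz points. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

The benefit is that nearby places stay nearby: longitudes 179 and -179 on the equator differ by 358 as raw numbers, but their xyz points are only a short chord apart.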
|
|
|
Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator - Mahout - [mail # user]
|
|
...Is that the complete stack trace? Threaded code like this usually has two or three levels of "Caused by" sections. The last is the critical one. On Fri, Sep 24, 2010 at 1:07...
|
|
|
Author: Ted Dunning,
2010-09-25, 00:01
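The last "Caused by" entry in a stack trace corresponds to the innermost exception in the cause chain. A small helper for finding it programmatically might look like the sketch below. The class name RootCause is made up for the example, and it relies only on the standard Throwable.getCause() method.

```java
/** Sketch: follow the getCause() chain to the innermost exception. */
final class RootCause {
    static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; the self-check guards against a cause that points to itself.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

With executor-based code the outer layer is typically a java.util.concurrent.ExecutionException thrown by Future.get(), and the exception raised inside the task sits further down the chain.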
|
|
|
Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator - Mahout - [mail # user]
|
|
...I think not. See http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html for a definition of invokeAll. By definition, invokeAll returns only when *...
|
|
|
Author: Ted Dunning,
2010-09-24, 06:43
|
|
|
Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator - Mahout - [mail # user]
|
|
...I don't think that the future.get() will ever be done. Testing for !future.isDone() will always return false after invokeAll because invokeAll waits for all tasks to complete. On T...
|
|
|
Author: Ted Dunning,
2010-09-24, 04:34
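The point about invokeAll can be checked with a small experiment using only java.util.concurrent. This is a sketch unrelated to the Mahout evaluator code under discussion: the class name, the pool size of 4, and the 10 ms sleep are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: after invokeAll returns, every returned Future reports isDone(). */
final class InvokeAllDemo {
    /** Runs n short tasks and counts futures that are still not done afterwards. */
    static int countNotDone(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                tasks.add(() -> {
                    Thread.sleep(10); // simulate a little work
                    return id * id;
                });
            }
            // Blocks until all tasks have completed, normally or by throwing.
            List<Future<Integer>> futures = pool.invokeAll(tasks);
            int notDone = 0;
            for (Future<Integer> f : futures) {
                if (!f.isDone()) {
                    notDone++;
                }
            }
            return notDone;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because invokeAll waits for completion, a branch guarded by !future.isDone() never runs for these futures, and a failure inside a task only surfaces when get() is called on its Future.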
|
|
|
|