Search criteria: clusterer.
Results 1 to 10 of 611 (1.315s).
Canopy Clustering - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... Clustering
Canopy Clustering is a very simple, fast and surprisingly accurate method for grouping objects into clusters. Each object is represented as a point in a multidimensional feature space...
... a Canopy containing this point and iterate through the remainder of the point set. At each point, if its distance from the first point is < T1, then add the point to the cluster. If, in addition...
..., accumulating a set of Canopies, each containing one or more points. A given point may occur in more than one Canopy.
Canopy Clustering is often used as an initial step in more rigorous clustering...
... techniques, such as K-Means Clustering. By starting with an initial clustering, the number of more expensive distance measurements can be significantly reduced by ignoring points outside...
... of the initial canopies.
Strategy for parallelization
Looking at the sample Hadoop implementation in http://code.google.com/p/canopy-clustering/ the processing is done in 3 M/R steps:
The data...
https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
Author: Jeff Eastman,
2012-06-29, 00:00
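The canopy-building loop quoted in this result can be sketched in a few lines of Python. This is only an illustrative sketch, not Mahout's implementation: the snippet is truncated right after "If, in addition...", so the T2 step below (dropping points closer than T2 from the candidate set) is taken from the commonly described form of the algorithm and is an assumption here, as are the function names and the Euclidean distance.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_clusters(points, t1, t2, distance=euclidean):
    """Build canopies with a loose threshold t1 and a tight threshold t2.

    Assumption (the quoted text is cut off here): points closer than t2
    to a canopy center are removed from the candidate set, while points
    between t2 and t1 stay and may join later canopies too.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)      # first remaining point starts a canopy
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:                 # within the loose threshold: join
                members.append(p)
            if d >= t2:                # not within the tight threshold: keep
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    for center, members in canopy_clusters(
            [(0, 0), (1, 0), (3, 0), (10, 0)], t1=4, t2=2):
        print(center, members)
```

In the small example, the point (3, 0) lies between T2 and T1 of the first center, so it joins the first canopy and later seeds its own, which matches the remark that a point may occur in more than one canopy.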
Dirichlet Process Clustering - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
...
The Dirichlet Process Clustering algorithm performs Bayesian mixture modeling.
The idea is that we use a probabilistic mixture of a number of models that we use to explain some observed data. Each...
... model each data point came from.
In addition, since this is a Bayesian clustering algorithm, we don't want to actually commit to any single explanation, but rather to sample from...
https://cwiki.apache.org/confluence/display/MAH.../Dirichlet+Process+Clustering
Author: Jeff Eastman,
2012-06-29, 00:00
Top Down Clustering - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... Clustering
Top Down clustering is a type of Hierarchical Clustering. It tries to find bigger clusters first and then does fine-grained clustering on these clusters. Hence the name Top Down.
Any...
... clustering algorithm can be used to perform the Top Level Clustering (finding bigger clusters) and the Bottom Level Clustering (fine-grained clustering on each of the top level clusters). So, all...
... clustering algorithms available in Mahout, other than the MinHash Clustering algorithm (which is a "Bottom Up" Clustering Algorithm), are suitable to be used for Top Down Clustering, on both Top...
... Level and Bottom Level.
The top level clustering output needs to be post-processed in order to identify all top level clusters and to group vectors into their respective top level clusters...
... so that the bottom level clustering can execute on each of them.
The first step to execute the top down clustering would be to run any clustering algorithm of your choice, preferably...
https://cwiki.apache.org/confluence/display/MAHOUT/Top+Down+Clustering
Author: Jeff Eastman,
2011-12-10, 00:00
Mean Shift Clustering - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... to work in pattern recognition in 1975. The paper contains a detailed derivation and several examples of the use of mean shift for image smoothing and segmentation. "Mean Shift Clustering" (http...
...://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf) presents an overview of the algorithm with a summary of the derivation. An attractive feature of mean shift clustering...
... is that it does not require a-priori knowledge of the number of clusters (as required in k-means) and it will produce arbitrarily-shaped clusters that depend upon the topology of the data (unlike canopy...
... in the density function and the vector becomes negligible.
Reference Implementation
The implementation introduced by MAHOUT-15 uses modified Canopy Clustering canopies to represent the mean shift...
..., the remaining canopies contain sets of points which are the members of their cluster.
Map/Reduce Implementation
Each mapper receives a subset of the canopies for each iteration. It compares each...
https://cwiki.apache.org/confluence/display/MAHOUT/Mean+Shift+Clustering
Author: Jeff Eastman,
2010-10-10, 00:00
Fuzzy K-Means - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
...-Means (also called Fuzzy C-Means) is an extension of K-Means, the popular simple clustering technique. While K-Means discovers hard clusters (a point belongs to only one cluster), Fuzzy K...
...-Means is a more statistically formalized method and discovers soft clusters where a particular point can belong to more than one cluster with certain probability.
Algorithm
Like K-Means, Fuzzy K...
...-Means works on those objects which can be represented in n-dimensional vector space and for which a distance measure is defined.
The algorithm is similar to k-means.
Initialize k clusters
Until converged...
...
Compute the probability of a point belonging to a cluster for every <point,cluster> pair
Recompute the cluster centers using the above probability membership values of points to clusters...
...
Design Implementation
The design is similar to K-Means present in Mahout. It accepts an input file containing vector points. The user can either provide the cluster centers as input or can allow...
https://cwiki.apache.org/confluence/display/MAHOUT/Fuzzy+K-Means
Author: Jeff Eastman,
2012-06-29, 00:00
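The loop outlined in this result (compute a membership for every <point,cluster> pair, then recompute the centers from those memberships) can be illustrated with a small pure-Python sketch. It uses the textbook fuzzy c-means update rules with a fuzziness exponent m; the exponent, the function names, and the convergence test are assumptions made for illustration and are not taken from the Mahout code.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def membership(point, centers, m):
    """Soft membership of one point in every cluster (sums to 1)."""
    dists = [dist(point, c) for c in centers]
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        # The point coincides with one or more centers: share it among them.
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]


def fuzzy_kmeans(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Textbook fuzzy c-means; m > 1 is the (assumed) fuzziness exponent."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    centers = [tuple(c) for c in centers]
    memberships = []
    dim = len(points[0])
    for _ in range(max_iter):
        # Step 1: membership of every point in every cluster.
        memberships = [membership(p, centers, m) for p in points]
        # Step 2: recompute each center as a membership-weighted mean.
        new_centers = []
        for j, old in enumerate(centers):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)   # no point supports this cluster
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:                   # centers stopped moving
            break
    return centers, memberships


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    final_centers, u = fuzzy_kmeans(pts, [(1.0, 0.0), (9.0, 0.0)])
    print(final_centers)
    print(u)
```

The returned memberships are the ones computed in the last iteration, just before the final center update, so every row still sums to 1.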
fuzzy-k-means-commandline - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... k-Means Clustering from the Command Line
Mahout's Fuzzy k-Means clustering can be launched from the same command line invocation whether you are running on a single machine in stand...
...-alone mode or on a larger Hadoop cluster. The difference is determined by the $HADOOP_HOME and $HADOOP_CONF_DIR environment variables. If both are set to an operating Hadoop cluster on the target...
... machine then the invocation will run FuzzyK on that cluster. If either of the environment variables is missing then the stand-alone Hadoop configuration will be invoked instead.
./bin...
... number. For example, when using the Mahout 0.3 release, the job will be mahout-core-0.3.job
Testing it on one single machine w/o cluster
Put the data: cp <PATH TO DATA> testdata
Run...
... the Job:
./bin/mahout fkmeans -i testdata <OPTIONS>
Running it on the cluster
(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh
Put the data: $HADOOP...
https://cwiki.apache.org/confluence/display/MAHOUT/fuzzy-k-means-commandline
Author: Jeff Eastman,
2011-07-21, 00:00
k-means-commandline - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
...
This quick start page describes how to run the kMeans clustering algorithm on a Hadoop cluster.
Steps
Mahout's k-Means clustering can be launched from the same command line invocation whether...
... you are running on a single machine in stand-alone mode or on a larger Hadoop cluster. The difference is determined by the $HADOOP_HOME and $HADOOP_CONF_DIR environment variables. If both...
... are set to an operating Hadoop cluster on the target machine then the invocation will run k-Means on that cluster. If either of the environment variables is missing then the stand-alone Hadoop...
..._HOME/core/target/ and its name will contain the Mahout version number. For example, when using the Mahout 0.3 release, the job will be mahout-core-0.3.job
Testing it on one single machine w/o cluster
Put...
... the data: cp <PATH TO DATA> testdata
Run the Job:
./bin/mahout kmeans -i testdata -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 25...
https://cwiki.apache.org/confluence/display/MAHOUT/k-means-commandline
Author: Jeff Eastman,
2010-06-04, 00:00
dirichlet-commandline - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... Dirichlet Process Clustering from the Command Line
Mahout's Dirichlet clustering can be launched from the same command line invocation whether you are running on a single machine in stand...
...-alone mode or on a larger Hadoop cluster. The difference is determined by the $HADOOP_HOME and $HADOOP_CONF_DIR environment variables. If both are set to an operating Hadoop cluster on the target...
... machine then the invocation will run Dirichlet on that cluster. If either of the environment variables is missing then the stand-alone Hadoop configuration will be invoked instead.
./bin...
... number. For example, when using the Mahout 0.3 release, the job will be mahout-core-0.3.job
Testing it on one single machine w/o cluster
Put the data: cp <PATH TO DATA> testdata
Run...
... the Job:
./bin/mahout dirichlet -i testdata <OTHER OPTIONS>
Running it on the cluster
(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh
Put the data: $HADOOP...
https://cwiki.apache.org/confluence/display/MAHOUT/dirichlet-commandline
Author: Jeff Eastman,
2010-06-04, 00:00
canopy-commandline - Apache Mahout - Apache Software Foundation - Mahout - [wiki]
... Clustering from the Command Line
Mahout's Canopy clustering can be launched from the same command line invocation whether you are running on a single machine in stand-alone mode or on a larger...
... Hadoop cluster. The difference is determined by the $HADOOP_HOME and $HADOOP_CONF_DIR environment variables. If both are set to an operating Hadoop cluster on the target machine...
... then the invocation will run Canopy on that cluster. If either of the environment variables is missing then the stand-alone Hadoop configuration will be invoked instead.
./bin/mahout canopy <...
... Mahout 0.3 release, the job will be mahout-core-0.3.job
Testing it on one single machine w/o cluster
Put the data: cp <PATH TO DATA> testdata
Run the Job:
./bin/mahout canopy -i...
... testdata -o output -dm org.apache.mahout.common.distance.CosineDistanceMeasure -ow -t1 5 -t2 2
Running it on the cluster
(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh
Put...
https://cwiki.apache.org/confluence/display/MAHOUT/canopy-commandline
Author: Jeff Eastman,
2010-06-04, 00:00
Re: Mahout Cluster attributes - Mahout - [mail # user]
...Another option would be to add a new command line option to the ClusterDumper to produce the abbreviated output you desire. Then you could submit it as a patch and everybody could benefit...
Author: Jeff Eastman,
2013-05-24, 13:33