Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Jeff Eastman 2012-02-15, 15:20
The error message describes what the algorithm can see: that there are
no initial clusters. The wiki documentation seems reasonably clear on the use of -k (https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) to obtain them by sampling the input dataset; otherwise, -c needs to contain clusters produced by the user.

On 2/14/12 8:04 PM, Lance Norskog wrote:
> Could the error message describe the user's mistake?
>
> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
> <[EMAIL PROTECTED]> wrote:
>> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
>> -c. If you want it to sample from your input data you need to add the -k
>> option to tell it how many you want. This has been a constant part of the
>> api for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>> you overlook this argument.
>>
>>
>>
>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>> You are not specifying the number of clusters that need to be generated,
>>> try running again by specifying a -k<number of clusters> option. You also
>>> need to specify that you need clustering to be done with -cl.
>>>
>>> For example:-
>>>
>>> ./bin/mahout kmeans -i
>>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
>>> 10 -ow -k 20 -cl
>>>
>>>
>>>
>>> ________________________________
>>> From: qiang xu (Issue Comment Edited) (JIRA)<[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Tuesday, February 14, 2012 10:48 AM
>>> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering
>>> error
>>>
>>>
>>> [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675
>>> ]
>>>
>>> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
>>> ----------------------------------------------------------
>>>
>>> This problem still exist in mahout 0.5 and 0.6
>>> ./bin/mahout kmeans -i
>>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10
>>> -ow
>>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments:
>>> {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5,
>>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>>> --endPhase=2147483647,
>>> --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/,
>>> --maxIter=10, --method=mapreduce,
>>> --output=./examples/bin/work/reuters-kmeans, --overwrite=null,
>>> --startPhase=0, --tempDir=temp}
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input:
>>> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
>>> examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance:
>>> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job:
>>> job_201202131515_0122
>>> 12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
>>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id :
>>> attempt_201202131515_0122_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>> at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
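The distinction discussed in this thread (initial centers are either supplied by the user in -c, or sampled from the input when -k says how many to draw) can be sketched in Java as below. This is a hypothetical illustration, not Mahout's actual KMeansDriver or RandomSeedGenerator code: the class name KMeansSeeding, the method initialCenters, the convention that k <= 0 means "-k not given", and the use of reservoir sampling are all assumptions made for the example. The exception message shows the kind of more descriptive error Lance asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (NOT Mahout's code) of where k-means initial centers come from:
 * user-supplied prior clusters (the role of -c) or k vectors sampled from the input
 * (the role of -k). Requires Java 9+ for List.of in main().
 */
public final class KMeansSeeding {

    private KMeansSeeding() {}

    /**
     * @param input         the input vectors
     * @param priorClusters user-supplied centers (stands in for the -c directory); may be empty
     * @param k             number of centers to sample from the input; k <= 0 means "-k not given"
     * @param rng           random source used for sampling
     */
    public static List<double[]> initialCenters(List<double[]> input,
                                                List<double[]> priorClusters,
                                                int k,
                                                Random rng) {
        if (k > 0) {
            // -k given: ignore any prior clusters and sample k seeds from the input.
            return sampleWithoutReplacement(input, k, rng);
        }
        if (priorClusters.isEmpty()) {
            // Hypothetical wording: names both remedies instead of only "Check your -c path."
            throw new IllegalStateException(
                "No initial clusters: the -c path contains no clusters. Either put prior "
                + "cluster centers in -c, or pass -k <n> to sample n centers from the input.");
        }
        return new ArrayList<>(priorClusters);
    }

    /** Reservoir sampling (Algorithm R): min(k, input.size()) distinct input vectors, uniformly. */
    static List<double[]> sampleWithoutReplacement(List<double[]> input, int k, Random rng) {
        List<double[]> reservoir = new ArrayList<>(k);
        for (int i = 0; i < input.size(); i++) {
            if (i < k) {
                reservoir.add(input.get(i));           // fill the reservoir first
            } else {
                int j = rng.nextInt(i + 1);            // uniform index in [0, i]
                if (j < k) {
                    reservoir.set(j, input.get(i));    // replace a slot with probability k/(i+1)
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<double[]> input = List.of(
            new double[] {0.0, 0.0}, new double[] {1.0, 1.0},
            new double[] {5.0, 5.0}, new double[] {6.0, 5.0});

        // Like adding "-k 2": two seeds are drawn from the input.
        List<double[]> seeds = initialCenters(input, List.of(), 2, new Random(42));
        System.out.println("sampled " + seeds.size() + " initial centers");

        // Like omitting -k with an empty -c directory: fails, but says what to do.
        try {
            initialCenters(input, List.of(), 0, new Random(42));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing k > 0 corresponds to adding -k to the command line; passing k = 0 with an empty prior list reproduces the failure in the log above, except that the message names both fixes. Reservoir sampling is used here only because it draws distinct seeds in a single pass over the input; it is an illustrative choice, not a claim about how Mahout samples.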