Mahout, mail # dev - Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error


Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Jeff Eastman 2012-02-14, 17:16
+1 bingo. K-Means expects you to provide the prior cluster centers
in -c. If you want it to sample them from your input data, you need to add
the -k option to tell it how many you want. This has been a constant part of
the API for some time, hence 0.4, 0.5 and 0.6 will all give the same
error if you overlook this argument.
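
To make the -c / -k distinction concrete, here is a rough Python sketch of the decision described above. It is not Mahout code: the function initial_centers and its parameters are hypothetical names chosen for illustration, and the real driver works on sequence files in HDFS rather than in-memory lists.

```python
import random


def initial_centers(points, provided_centers=None, k=None, seed=0):
    """Pick the starting centers for a k-means run (illustrative only).

    points           -- input vectors (stands in for the -i data)
    provided_centers -- prior centers (stands in for the contents of -c)
    k                -- number of centers to sample (stands in for -k)
    """
    if k is not None:
        # With k given, sample k input points and use them as the centers.
        if k > len(points):
            raise ValueError("k is larger than the number of input points")
        rng = random.Random(seed)  # fixed seed only to keep the sketch repeatable
        return rng.sample(points, k)
    if not provided_centers:
        # Without k, the centers must already exist; an empty set is an error.
        raise RuntimeError("No clusters found. Check your -c path.")
    return list(provided_centers)
```

Calling initial_centers(points) with neither prior centers nor k fails in the same way as the run quoted below, while passing k=20 mirrors adding -k 20 to the command line.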
On 2/14/12 8:56 AM, Suneel Marthi wrote:
> You are not specifying the number of clusters to generate. Try running again with the -k <number of clusters> option. You also need to add -cl to specify that the clustering should be run.
>
> For example:
>
> ./bin/mahout kmeans -i
> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
> 10  -ow -k 20 -cl
>
>
>
> ________________________________
>   From: qiang xu (Issue Comment Edited) (JIRA)<[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, February 14, 2012 10:48 AM
> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
>
>
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ]
>
> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
> ----------------------------------------------------------
>
> This problem still exists in Mahout 0.5 and 0.6:
> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10  -ow
> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
> 12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
> 12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>          at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>          at org.apache.hadoop.mapred.Child.main(Child.java:170)
> It is really weird, because the clusters directory is generated:
> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
> Found 4 items
> drwxr-xr-x   - root supergroup          0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
> drwxr-xr-x   - root supergroup          0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
> drwxr-xr-x   - root supergroup          0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
> drwxr-xr-x   - root supergroup          0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters