Re: can't get <point-id, cluster-id> thru "-p"
Baoqiang Cao 2012-03-14, 22:18
Thanks a lot. But I don't know whether I'm missing something in front of my teary eyes, maybe because it's Wednesday afternoon? My inputs are equivalent to yours:

mahout clusterdump -s /mahout/kmeans/clusters-15-final -d /mahout/sparse/dictionary.file-0 -dt sequencefile -p /mahout/points

The cluster files after 15 iterations are in /mahout/kmeans/clusters-15-final. /mahout/points is a directory I created beforehand. On screen, the output is something like "VL-1721020{n=186 c=[...". There are just no output files under that directory.

Any help, please?

On Wed, Mar 14, 2012 at 2:13 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:
> The -p parameter is an input. You should pass in the clusterPoints/
> directory that was generated by the cluster driver you used.
>
> My use of fkmeans might be an example:
>
> mahout fkmeans -i wikipedia-vectors/tfidf-vectors/ -c
> wikipedia-fkmeans-centroids -o wikipedia-fkmeans-clusters -k 100 -m
> 2 -ow -x 10 -dm org.apache.mahout.common.distance.CosineDistanceMeasure
>
> This will create wikipedia-clusters/clusters/clusteredPoints/part-m-00000
> which is the file with the clustered points. I then did a clusterdump
>
> mahout clusterdump -s
> wikipedia-fkmeans-clusters/clusters/clusters-1/part-r-00000 -p
> wikipedia-fkmeans-clusters/clusteredPoints/ -d
> wikipedia-fkmeans-clusters/dictionary.file-0 -dt sequencefile -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure
>
> This will output to the screen. Use -o to specify an output file.
>
> Good advice for any user of mahout is read the output of the help very
> carefully. IMHO it is very easy to misunderstand the parameters, inputs, and
> outputs. I think I only understand about 10%. Try:
>
> mahout fkmeans --help
>
> On 3/14/12 10:52 AM, Baoqiang Cao wrote:
>>
>> Hi,
>>
>> Very sorry for such a trivial question but ran out of luck. I'm trying
>> to see which points (thru point-ids) belong to which cluster center.
>> Here is what I did:
>>
>> mahout clusterdump -s /mahout/kmeans/clusters-15-final -d
>> /mahout/sparse/dictionary.file-0 -dt sequencefile -p /mahout/points
>>>
>>> out
>>
>> The onscreen output is:
>>
>> 12/03/14 12:39:52 INFO common.AbstractJob: Command line arguments:
>> {--dictionary=/mahout/sparse/dictionary.file-0,
>> --dictionaryType=sequencefile,
>>
>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>> --endPhase=2147483647, --outputFormat=TEXT,
>> --pointsDir=/mahout/points,
>> --seqFileDir=/mahout/kmeans/clusters-15-final, --startPhase=0,
>> --tempDir=temp}
>> 12/03/14 12:39:55 WARN snappy.LoadSnappy: Snappy native library is
>> available
>> 12/03/14 12:39:55 INFO util.NativeCodeLoader: Loaded the native-hadoop
>> library
>> 12/03/14 12:39:55 INFO snappy.LoadSnappy: Snappy native library loaded
>> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
>> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
>> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
>> 12/03/14 12:39:55 INFO compress.CodecPool: Got brand-new decompressor
>> 12/03/14 12:42:07 INFO clustering.ClusterDumper: Wrote 5188 clusters
>> 12/03/14 12:42:07 INFO driver.MahoutDriver: Program took 135276 ms
>> (Minutes: 2.2546)
>>
>>
>> There is nothing under "/mahout/points". Any help on why and how?
>>
>> Thanks in advance.
>> Baoqiang
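
For anyone reading this thread later, here is a rough sketch of what Pat's advice would look like applied to the k-means command above. It only uses flags that appear in this thread (-s, -p, -d, -dt, -o). Two things are assumptions, not taken from the thread: that the k-means driver was run with its clustering option (usually -cl; check mahout kmeans --help) so that a clusteredPoints directory exists next to the clusters-* directories, and the output file name clusterdump.txt, which is made up for illustration.

```shell
#!/bin/sh
# Sketch only: paths marked "assumed" are not confirmed anywhere in this thread.

CLUSTERS=/mahout/kmeans/clusters-15-final   # final clusters, from the original post
POINTS=/mahout/kmeans/clusteredPoints       # assumed: written by the k-means driver, read by -p
DICT=/mahout/sparse/dictionary.file-0       # dictionary, from the original post
OUT=clusterdump.txt                         # hypothetical output file name for -o

# -p is an INPUT directory (the clustered points); -o names the output file.
set -- mahout clusterdump -s "$CLUSTERS" -p "$POINTS" -d "$DICT" -dt sequencefile -o "$OUT"

# Print the command so it can be checked before it is run.
echo "$*"

# Run it only if mahout is actually on the PATH.
if command -v mahout >/dev/null 2>&1; then
    "$@"
fi
```

With -p pointing at an empty directory created by hand, clusterdump has no clustered points to read, which would explain why only the "VL-...{n=... c=[..." cluster summaries show up and nothing is written under /mahout/points.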