Mahout, mail # user - How to use clusterpp?


Re: How to use clusterpp?
Paritosh Ranjan 2012-02-17, 09:06
See https://cwiki.apache.org/MAHOUT/top-down-clustering.html, which
explains how to use clusterpp.

clusterpp does not produce human-readable output. It writes Hadoop
SequenceFiles, a binary key-value format. To inspect the results, you
have to iterate over the file, read each key-value pair, and print it
to a text file or the console.
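
As a side note on why the quoted `hadoop fs -cat` output starts with readable class names: a SequenceFile begins with the bytes `SEQ`, a version byte, and then the key and value class names, each preceded by its length. The sketch below is only an illustration of that header layout using plain Java, not how Mahout or Hadoop read the file. The class `SeqHeaderPeek` and its methods are hypothetical names, and it assumes a current format version where each class name is shorter than 128 bytes, so the length prefix fits in one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads only the first fields of a SequenceFile header. */
public class SeqHeaderPeek {

    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    public static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("Not a SequenceFile: missing SEQ magic bytes");
        }
        in.readByte(); // format version byte, ignored in this sketch
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] { keyClass, valueClass };
    }

    // Assumption: the string is shorter than 128 bytes, so its length prefix is one byte.
    private static String readShortString(DataInputStream in) throws IOException {
        int length = in.readByte();
        if (length < 0) {
            throw new IOException("Multi-byte length prefix is not handled in this sketch");
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Builds a synthetic header prefix so the sketch can run without a real file. */
    public static byte[] sampleHeader(String keyClass, String valueClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6); // version byte (assumed value, not checked by the reader above)
        for (String name : new String[] { keyClass, valueClass }) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);            // one-byte length prefix
            out.write(bytes, 0, bytes.length);  // the class name itself
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader(
                "org.apache.hadoop.io.Text",
                "org.apache.mahout.math.VectorWritable");
        String[] classes = readKeyValueClasses(new ByteArrayInputStream(header));
        System.out.println("key class:   " + classes[0]);
        System.out.println("value class: " + classes[1]);
    }
}
```

The value class name here is 37 characters long, and byte 37 is the ASCII code for `%`, which is why a `%` appears between the two class names in the raw output. Everything after the header is binary record data, so the rest of the file has to be read through the SequenceFile API.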

Look into the package org.apache.mahout.common.iterator.sequencefile.
It contains utility classes that help you iterate over SequenceFiles.

On 17-02-2012 14:18, Tharindu Mathew wrote:
> Hi,
>
> I'm trying to reproduce https://issues.apache.org/jira/browse/MAHOUT-966
>
> When executing clusterpp, I get output such as this:
>
> $bin/hadoop fs -cat /user/mackie/output/ppclusters/part-r-00999
> SEQorg.apache.hadoop.io.Text%org.apache.mahout.math.VectorWritable_䪖?g???8?-??
>
> Is this normal? I thought I would get some human-readable output when this
> was used. I tried searching around but couldn't find any documentation
> regarding clusterpp.
>