Re: Kmeans cluster mapping to actual document IDs
Baoqiang Cao 2012-04-11, 16:20
My very limited experience is that in the seq2sparse step you need to use the
"-nv" option so that the clusterdump output shows the document IDs.

Best,
Baoqiang

On Wed, Apr 11, 2012 at 5:15 AM, Hossein Kazemi <[EMAIL PROTECTED]> wrote:
> Hi,
> I have clustered a set of documents using Mahout's Kmeans (map-reduce). I
> used Sparse Vectors due to the large size of my corpus. In the book it says
> that the folder named ClusteredPoints contains the mapping between the
> clustered documents and the document IDs. However, all I can see is just a
> "1:0", a feature vector and a ClusterID. Where can I find the actual
> document names/IDs?
> thx