Kmeans cluster mapping to actual document IDs
Hossein Kazemi 2012-04-11, 10:15
Hi, I have clustered a set of documents using Mahout's k-means (MapReduce). I used sparse vectors because my corpus is large. The book says that the folder named ClusteredPoints contains the mapping between the clustered documents and the document IDs. However, all I can see is a "1:0", a feature vector, and a cluster ID. Where can I find the actual document names/IDs? Thanks
Re: Kmeans cluster mapping to actual document IDs
Baoqiang Cao 2012-04-11, 16:20
In my very limited experience, you need to pass the "-nv" option in the seq2sparse step so that the document IDs appear in the clusterdump output.
Best, Baoqiang
On Wed, Apr 11, 2012 at 5:15 AM, Hossein Kazemi <[EMAIL PROTECTED]> wrote:
> Hi,
> I have clustered a set of documents using the Mahout's Kmeans (map-reduce) I
> used Sparse Vectors due to the large size of my corpus. In the book it says
> that the folder named ClusteredPoints contains the mapping between the
> clustered documents and the document IDs. However, all I can see is just a
> "1:0" , a feature-vector and a ClusterID. where can I find the actual
> document names/ids ?
> thx
>
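To illustrate why the "-nv" suggestion helps: as far as I understand it, that option makes seq2sparse write named vectors, so each vector carries its document ID with it through clustering, whereas plain vectors carry only term weights. The Python sketch below is a toy model of that idea, not Mahout code. The names Vector, vectorize, assign and doc_ids_by_cluster are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Vector:
    """Toy stand-in for a sparse term vector: {term_index: weight}."""
    weights: Dict[int, float]
    name: Optional[str] = None  # filled in only when vectors are "named"


def vectorize(docs: Dict[str, Dict[int, float]], named: bool) -> List[Vector]:
    """Build one vector per document; keep the document ID only if named=True."""
    return [Vector(w, doc_id if named else None) for doc_id, w in docs.items()]


def assign(vectors: List[Vector],
           centroids: List[Dict[int, float]]) -> List[Tuple[int, Vector]]:
    """Pair each vector with the index of its nearest centroid."""
    def sq_dist(a: Dict[int, float], b: Dict[int, float]) -> float:
        # Squared Euclidean distance over the union of non-zero terms.
        return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

    return [
        (min(range(len(centroids)), key=lambda c: sq_dist(v.weights, centroids[c])), v)
        for v in vectors
    ]


def doc_ids_by_cluster(points: List[Tuple[int, Vector]]) -> Dict[int, List[Optional[str]]]:
    """Group the vector names by cluster ID; unnamed vectors show up as None."""
    grouped: Dict[int, List[Optional[str]]] = {}
    for cluster_id, vector in points:
        grouped.setdefault(cluster_id, []).append(vector.name)
    return grouped


if __name__ == "__main__":
    # Made-up documents and centroids, purely for illustration.
    docs = {"doc-a": {0: 1.0}, "doc-b": {1: 1.0}, "doc-c": {0: 0.9}}
    centroids = [{0: 1.0}, {1: 1.0}]
    print(doc_ids_by_cluster(assign(vectorize(docs, named=True), centroids)))
    print(doc_ids_by_cluster(assign(vectorize(docs, named=False), centroids)))
```

With named=False the grouping still works, but every entry is None, which mirrors the situation in the question: a cluster ID and a feature vector with nothing to tie them back to a document.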