Re: Kmeans cluster mapping to actual document IDs
Baoqiang Cao 2012-04-11, 16:20
In my very limited experience, you need to pass the "-nv" (named vectors)
option in the seq2sparse step so that the clusterdump output shows the
document IDs.
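
As a rough sketch of that step, the vectorization call might look like the one below. The directory names are hypothetical placeholders, and any other options you already pass to seq2sparse would stay as they are. Since the vectors are regenerated, Kmeans has to be re-run on the new output before the names show up.

```shell
# Hypothetical paths: replace with your own sequence-file and output directories.
SEQ_DIR="docs-seqfiles"   # output of the earlier seqdirectory step (assumed name)
VEC_DIR="docs-vectors"    # where the sparse vectors will be written (assumed name)

# -nv writes NamedVectors, so each vector carries the key of its input
# sequence-file entry (the document ID) along with its features.
mahout seq2sparse -i "$SEQ_DIR" -o "$VEC_DIR" -nv
```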
On Wed, Apr 11, 2012 at 5:15 AM, Hossein Kazemi <[EMAIL PROTECTED]> wrote:
> I have clustered a set of documents using Mahout's Kmeans (map-reduce). I
> used sparse vectors because of the large size of my corpus. The book says
> that the folder named ClusteredPoints contains the mapping between the
> clustered documents and the document IDs. However, all I can see is a
> "1:0", a feature vector, and a cluster ID. Where can I find the actual
> document names/IDs?