
Mahout, mail # user - Kmeans cluster mapping to actual document IDs


Re: Kmeans cluster mapping to actual document IDs
Baoqiang Cao 2012-04-11, 16:20
My very limited experience is that in the seq2sparse step you need to use
the "-nv" (named vectors) option, so that you will see the document IDs in
the clusterdump output.
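To illustrate why "-nv" matters, here is a minimal Python sketch. It does not use Mahout's API. It models each clusteredPoints record as a plain in-memory object (the `ClusteredPoint` class, `docs_by_cluster` function and the document paths are hypothetical) and assumes, as the reply above suggests, that with "-nv" each vector carries its document ID as a name, while without it the vector has no name at all:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusteredPoint:
    """Hypothetical stand-in for one clusteredPoints record (not a Mahout class)."""
    cluster_id: int
    weight: float
    vector: Dict[int, float]      # sparse vector: term index -> weight
    name: Optional[str] = None    # document ID; assumed present only with -nv


def docs_by_cluster(points: List[ClusteredPoint]) -> Dict[int, List[str]]:
    """Group document IDs by cluster ID, failing loudly on unnamed vectors."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for p in points:
        if p.name is None:
            # Without a name, only the cluster ID and the raw vector are
            # known, so there is nothing to map back to a document.
            raise ValueError(
                f"point in cluster {p.cluster_id} has no name; "
                "re-run seq2sparse with -nv to keep document IDs")
        groups[p.cluster_id].append(p.name)
    return dict(groups)


# Hypothetical records, as if the vectors had been created with -nv.
named = [
    ClusteredPoint(7, 1.0, {0: 0.5, 3: 1.2}, name="/corpus/doc-001"),
    ClusteredPoint(7, 1.0, {1: 0.9}, name="/corpus/doc-002"),
    ClusteredPoint(12, 1.0, {2: 0.4}, name="/corpus/doc-003"),
]
print(docs_by_cluster(named))
# {7: ['/corpus/doc-001', '/corpus/doc-002'], 12: ['/corpus/doc-003']}
```

The point of the sketch is that the cluster assignment alone cannot be turned back into document IDs; the name has to travel with the vector from the vectorization step onward.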

Best,
Baoqiang

On Wed, Apr 11, 2012 at 5:15 AM, Hossein Kazemi <[EMAIL PROTECTED]> wrote:
> Hi,
> I have clustered a set of documents using Mahout's k-means (MapReduce). I
> used sparse vectors because of the large size of my corpus. The book says
> that the folder named ClusteredPoints contains the mapping between the
> clustered documents and the document IDs. However, all I can see is a
> "1:0", a feature vector and a cluster ID. Where can I find the actual
> document names/IDs?
> thx
>