Mahout, mail # user - RowSimilarityJob


Re: RowSimilarityJob
Suneel Marthi 2012-03-20, 18:52
I should have elaborated more in my previous reply.
RowIdJob creates a matrix of type <IntWritable, VectorWritable> and a docIndex of type <IntWritable, Text>.

docIndex maps each rowId to the corresponding key generated by seq2sparse.

What you need to do is join the output of RowSimilarityJob with docIndex to get the format you are looking for.
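A minimal sketch of that join in plain Java, assuming both SequenceFiles have already been loaded into in-memory maps (for example by iterating over them with Hadoop's SequenceFile.Reader). The class and method names are hypothetical and not part of Mahout's API, and the sketch assumes the docIndex fits in memory; for a large corpus the join would more naturally run as a MapReduce job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: translates RowSimilarityJob output (rowId -> {rowId: similarity})
 * into document keys using the docIndex from RowIdJob (rowId -> seq2sparse key).
 * Assumes both inputs were already read into memory.
 */
final class RowSimilarityJoin {

    private RowSimilarityJoin() {}

    /** Returns docKey -> similar docKeys, most similar first, excluding the document itself. */
    static Map<String, List<String>> join(Map<Integer, String> docIndex,
                                          Map<Integer, Map<Integer, Double>> similarities) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : similarities.entrySet()) {
            String doc = lookup(docIndex, row.getKey());

            // Sort the neighbours by similarity score, highest first.
            List<Map.Entry<Integer, Double>> neighbours = new ArrayList<>(row.getValue().entrySet());
            neighbours.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());

            List<String> similarDocs = new ArrayList<>();
            for (Map.Entry<Integer, Double> neighbour : neighbours) {
                if (neighbour.getKey().equals(row.getKey())) {
                    continue; // skip the self-similarity entry (e.g. 0:1.0000000000000004)
                }
                similarDocs.add(lookup(docIndex, neighbour.getKey()));
            }
            result.put(doc, similarDocs);
        }
        return result;
    }

    /** Resolves a rowId through docIndex, failing loudly if the two outputs do not match. */
    private static String lookup(Map<Integer, String> docIndex, Integer rowId) {
        String key = docIndex.get(rowId);
        if (key == null) {
            throw new IllegalArgumentException("rowId " + rowId + " missing from docIndex");
        }
        return key;
    }
}
```

Dropping the self-entry and sorting by score are choices made for this sketch, not something RowSimilarityJob requires.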
Hope that helps.
Suneel
________________________________
 From: Suneel Marthi <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Tuesday, March 20, 2012 1:41 PM
Subject: Re: RowSimilarityJob
 
docIndex is your answer.


On Mar 20, 2012, at 12:28 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

> How do you map the output of RowSimilarity to documents? What I really need is to create an association like
>
>   doc1 --> docn, docm, doci, etc.
>
> The output of RowSimilarity looks like
>
>   rowid --> vector of rowids : similarity scores
>
> for example:
>
>   Key: 0: Value: {14458:0.2966480826934176,11399:0.30290014772966095,
>   12793:0.22009858979452146,3275:0.1871791030103281,
>   14613:0.3534278632679437,4411:0.2516380602790199,
>   17520:0.3139731583634198,13611:0.18968888212315968,
>   14354:0.17673965754661425,0:1.0000000000000004}
>
> It would be nice to use the same keys that seq2sparse outputs. In my case these are named vectors, so file names would appear in the output in place of rowids, and creating my association would be trivial.
>
> Have I missed a dictionary containing a rowid-to-docid (name) mapping?
>
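If all you have is a textual dump like the example above rather than the SequenceFile itself, each value can be turned back into a rowId-to-score map before joining with docIndex. This is a hypothetical helper, and the `{rowId:score,...}` format is inferred from that single example only; reading the VectorWritable values directly is more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a dumped value such as "{14458:0.29,0:1.0}" into rowId -> score. */
final class DumpedVectorParser {

    private DumpedVectorParser() {}

    static Map<Integer, Double> parse(String value) {
        String body = value.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("expected {rowId:score,...} but got: " + value);
        }
        body = body.substring(1, body.length() - 1).trim();

        Map<Integer, Double> scores = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return scores;
        }
        for (String pair : body.split(",")) {
            // trim() also strips the line breaks a dump may wrap entries with.
            String[] kv = pair.trim().split(":");
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad entry: " + pair);
            }
            scores.put(Integer.parseInt(kv[0].trim()), Double.parseDouble(kv[1].trim()));
        }
        return scores;
    }
}
```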