Mahout, user mailing list: Question on RowSimilarityJob


Re: Question on RowSimilarityJob
Sebastian Schelter 2012-01-20, 17:58
Hi,

'maxSimilaritiesPerRow' denotes the maximum number of similar rows
(documents in your use case) to keep per document.
'excludeSelfSimilarity' means that rows (documents) should not be
compared to themselves.
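To make the two options concrete, here is a small in-memory Java sketch of what they control. It is not how Mahout's distributed RowSimilarityJob computes similarities. It assumes dense rows and plain cosine similarity, and the class `TopKRowSimilarity`, the record `Neighbor` and the method `similarRows` are hypothetical names chosen for illustration. For each row it keeps at most `maxSimilaritiesPerRow` neighbours and, when `excludeSelfSimilarity` is true, skips comparing a row with itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Mahout's RowSimilarityJob implementation.
public class TopKRowSimilarity {

    // One candidate neighbour: the index of the other row and its similarity.
    record Neighbor(int row, double similarity) {}

    // Cosine similarity between two dense vectors (assumed measure for this sketch).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // treat all-zero rows as dissimilar
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // For every row, return at most maxSimilaritiesPerRow neighbours, most similar first.
    static List<List<Neighbor>> similarRows(double[][] rows,
                                            int maxSimilaritiesPerRow,
                                            boolean excludeSelfSimilarity) {
        List<List<Neighbor>> result = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            // Min-heap ordered by similarity: the weakest kept neighbour is on top.
            PriorityQueue<Neighbor> heap =
                new PriorityQueue<>(Comparator.comparingDouble(Neighbor::similarity));
            for (int j = 0; j < rows.length; j++) {
                if (excludeSelfSimilarity && i == j) continue; // never pair a row with itself
                heap.add(new Neighbor(j, cosine(rows[i], rows[j])));
                if (heap.size() > maxSimilaritiesPerRow) heap.poll(); // drop the weakest
            }
            List<Neighbor> top = new ArrayList<>(heap);
            top.sort((a, b) -> Double.compare(b.similarity(), a.similarity()));
            result.add(top);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up "documents" as term-weight vectors, purely for illustration.
        double[][] docs = {
            {1, 1, 0, 0},
            {1, 0.5, 0, 0},
            {0, 0, 1, 1},
            {0, 0, 1, 0.8}
        };
        List<List<Neighbor>> sims = similarRows(docs, 1, true);
        for (int i = 0; i < sims.size(); i++) {
            System.out.println("doc " + i + " -> " + sims.get(i));
        }
    }
}
```

Keeping a bounded min-heap per row means only `maxSimilaritiesPerRow` candidates are held at a time instead of the full row of the similarity matrix. With `excludeSelfSimilarity` set to false, each document would typically list itself as its own best match (cosine similarity 1.0), which is rarely useful when looking for similar documents.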

Sorry for the lack of documentation; RowSimilarityJob was originally only
an internal job for the recommendation code. I'll try to add something
to the wiki in the next few days.

--sebastian
On 20.01.2012 17:38, Suneel Marthi wrote:
> I am working on determining document similarity for a corpus using RowSimilarityJob.
>
> Questions:
>
> a) What do the parameters - 'maxSimilaritiesPerRow' and 'excludeSelfSimilarity' mean?
> b) Are there any docs available on RowSimilarityJob? This is the best I could find, on Sebastian's blog: http://ssc.io/rowsimilarityjob-on-steroids/
>
> c) Also, do we have any docs on RowIdJob?
>
> Thanks and Regards,
> Suneel
>