Mahout, mail # user - Re: How to find the k most similar docs


Re: How to find the k most similar docs
Suneel Marthi 2012-03-05, 19:48
Pat,

Your input to RowSimilarityJob seems to be the tfidf-vectors directory, which is <Text, VectorWritable>.

Before executing the RowSimilarityJob you need to run the RowIdJob, which creates a matrix of <IntWritable, VectorWritable>. This matrix should be the input to RowSimilarityJob.
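
To make the re-keying step concrete, here is a minimal sketch in plain Python of the idea behind it. It is not Mahout code and does not touch Hadoop: the function name assign_row_ids, the sample document names, and the dict-based "vectors" are all made up for illustration. It only shows text-keyed vectors being given sequential integer row IDs, with a separate index kept so the IDs can be mapped back to document names later.

```python
from typing import Dict, List, Tuple

# Illustrative stand-in for a sparse vector: term index -> tf-idf weight.
Vector = Dict[int, float]


def assign_row_ids(
    text_keyed: List[Tuple[str, Vector]],
) -> Tuple[Dict[int, Vector], Dict[int, str]]:
    """Re-key (document name, vector) pairs by sequential integer row IDs.

    Returns the integer-keyed matrix and an index from row ID back to
    the original document name. Both names are hypothetical.
    """
    matrix: Dict[int, Vector] = {}
    doc_index: Dict[int, str] = {}
    for row_id, (doc_name, vector) in enumerate(text_keyed):
        matrix[row_id] = vector        # analogous to <IntWritable, VectorWritable>
        doc_index[row_id] = doc_name   # lets you translate results back to names
    return matrix, doc_index


# Made-up sample data standing in for the tfidf-vectors directory.
tfidf_vectors = [
    ("/docs/a.txt", {0: 0.5, 3: 1.2}),
    ("/docs/b.txt", {1: 0.7}),
    ("/docs/c.txt", {0: 0.1, 2: 0.9}),
]

matrix, doc_index = assign_row_ids(tfidf_vectors)
```

The point of keeping the index alongside the matrix is that the similarity output will refer to rows by integer ID, so you need some way to get back to the original document names.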

Also, from your command, you seem to be missing the --tempDir argument. You would need that too.

Suneel
________________________________
 From: Sebastian Schelter <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, March 5, 2012 2:32 PM
Subject: Re: How to find the k most similar docs
 
That's the problem:

org.apache.hadoop.io.Text cannot be
   cast to org.apache.hadoop.io.IntWritable

RowSimilarityJob expects <IntWritable,VectorWritable> as input, but it seems
you supply <Text,VectorWritable>.

--sebastian

On 05.03.2012 20:29, Pat Ferrel wrote:
> org.apache.hadoop.io.Text cannot be
>    cast to org.apache.hadoop.io.IntWritable