Mahout, mail # user - KmeansDriver Question


Re: KmeansDriver Question
Paritosh Ranjan 2012-09-15, 08:48
I don't think this is a KMeansDriver error. The stack trace shows the
exception being thrown in SequenceFileTokenizerMapper, which is not used in
KMeansDriver. I think you are getting the error while transforming the data.
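
One way to narrow this down is to look at the key and value classes recorded in the header of the SequenceFile that goes into the transformation step. The exception says a LongWritable was cast to Text, so the file being tokenized probably has LongWritable keys. The sketch below is only an illustration: SeqFileHeader is a hypothetical helper, not part of Mahout or Hadoop. It assumes a local copy of the file, a SequenceFile version of 4 or later, and class names shorter than 128 bytes so that each length fits in one byte.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads only the start of a SequenceFile header.
public class SeqFileHeader {
    public final String keyClass;
    public final String valueClass;

    SeqFileHeader(String keyClass, String valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    // Assumption: the name is stored as a one-byte length followed by UTF-8 bytes.
    static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length is not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static SeqFileHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        int version = in.readByte();
        if (version < 4) {
            throw new IOException("older header layout is not handled in this sketch");
        }
        // The key class name comes first, then the value class name.
        String key = readShortString(in);
        String value = readShortString(in);
        return new SeqFileHeader(key, value);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a local copy of the SequenceFile.
        try (InputStream in = new FileInputStream(args[0])) {
            SeqFileHeader h = read(in);
            System.out.println("key class:   " + h.keyClass);
            System.out.println("value class: " + h.valueClass);
        }
    }
}
```

If the key class printed for the file given to the tokenizing step is org.apache.hadoop.io.LongWritable rather than org.apache.hadoop.io.Text, that would match the ClassCastException in the trace.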

On 15-09-2012 12:59, jung hoon sohn wrote:
> Hello, I am trying to cluster the input data using KmeansDriver.
> The input vectors are generated from Lucene using the
> "bin/mahout lucene.vector ..." command, and when I run
> KmeansDriver using its run method, I get
>
> 12/09/15 15:18:13 INFO mapred.JobClient: Task Id : attempt_201209121951_0067_m_000000_1, Status : FAILED
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>          at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.map(SequenceFileTokenizerMapper.java:37)
>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>          at java.security.AccessController.doPrivileged(Native Method)
>          at javax.security.auth.Subject.doAs(Subject.java:415)
>          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>          at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> for several attempts, but the process goes on and generates the output data.
> I can even run clusterdump on the output cluster data; however, I am
> concerned about the effect of the above errors.
>
> Please help me to get through the problem.
>
> Thanks.
>
> Jung Hoon
>