jung hoon sohn 2012-09-15, 07:29
Hello, I am trying to cluster my input data using KMeansDriver. The input vectors were generated from a Lucene index with the "bin/mahout lucene.vector ..." command, and when I run KMeansDriver through its run method, I get
12/09/15 15:18:13 INFO mapred.JobClient: Task Id : attempt_201209121951_0067_m_000000_1, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.map(SequenceFileTokenizerMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
for several task attempts, but the job continues and generates output data. I can even run clusterdump on the output clusters; however, I am concerned about the effect of the errors above.
Please help me resolve this problem.
Thanks.
Jung Hoon
-
Re: KmeansDriver Question
Paritosh Ranjan 2012-09-15, 08:48
I don't think this is a KMeansDriver error. SequenceFileTokenizerMapper is not used in KMeansDriver. I think you are getting the error while transforming the data.
-
Re: KmeansDriver Question
jung hoon sohn 2012-09-17, 07:09
Thank you for the reply. However, the error was thrown during the map phase (org.apache.hadoop.mapreduce.Mapper.run). Isn't the mapping function part of the KMeansDriver class?
Thank you.
Jung Hoon
-
Re: KmeansDriver Question
Paritosh Ranjan 2012-09-17, 07:59
AFAIK, SequenceFileTokenizerMapper is not called from KMeansDriver.
That mapper tokenizes sequence files, so the error is probably occurring during that step.
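To illustrate why Mapper.run in the trace does not by itself point at KMeansDriver: Mapper.run is the generic record loop shared by every new-API MapReduce job, and it hands each key/value pair to whichever mapper the job was configured with. The frame directly above it (SequenceFileTokenizerMapper.map) names the mapper that actually ran, and the ClassCastException means it received LongWritable keys where it was declared to take Text keys. Below is a simplified, Hadoop-free sketch of that dispatch. GenericMapper, TokenizingMapper and runMapper are hypothetical stand-ins, not Hadoop's API; String plays the role of Text and Long plays the role of LongWritable.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch: GenericMapper, TokenizingMapper and runMapper are
// hypothetical stand-ins mimicking the shape of
// org.apache.hadoop.mapreduce.Mapper; they are NOT Hadoop's API.
// String stands in for Text, Long stands in for LongWritable.
public class MapperCastDemo {

    /** Generic base class: after erasure the framework only sees Object. */
    public static abstract class GenericMapper<KIN, VIN> {
        public abstract void map(KIN key, VIN value, List<String> out);
    }

    /** Plays the role of a tokenizer mapper declared for String (Text) keys. */
    public static class TokenizingMapper extends GenericMapper<String, String> {
        @Override
        public void map(String key, String value, List<String> out) {
            for (String token : value.split("\\s+")) {
                out.add(key + ":" + token);
            }
        }
    }

    /**
     * Plays the role of Mapper.run(): feeds every record to whatever mapper
     * the job was configured with, without knowing its type parameters.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static List<String> runMapper(GenericMapper mapper, Object[][] records) {
        List<String> out = new ArrayList<>();
        for (Object[] record : records) {
            // The compiler-generated bridge method in TokenizingMapper casts
            // key and value to String here, so a Long key fails with
            // ClassCastException inside TokenizingMapper.map, not in this loop.
            mapper.map(record[0], record[1], out);
        }
        return out;
    }

    public static void main(String[] args) {
        GenericMapper<String, String> mapper = new TokenizingMapper();

        // Key type matches the mapper's declaration: tokenizing succeeds.
        System.out.println(runMapper(mapper, new Object[][] {{"doc1", "hello world"}}));

        // Long key fed to a mapper that expects String keys.
        try {
            runMapper(mapper, new Object[][] {{42L, "hello world"}});
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

In the real job the analogous check is to compare the key/value classes stored in the input sequence files with the types the failing mapper expects; which pipeline step produced LongWritable keys is not established in this thread.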