Re: error in itemsimilarity
Sebastian Schelter 2010-11-26, 07:54
ItemSimilarityJob cannot be used to compute the similarity between text
documents. It is intended for Collaborative Filtering, as described here:
https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering

On 26.11.2010 08:50, Divya wrote:
> Hi,
>
> I am getting the following exception when I try to run itemsimilarity from the command line.
> My input data is a text file which just has one line of text.
> Can anyone please help me in resolving the error?
>
> $ bin/mahout itemsimilarity -i D:/MahoutResult/ItemSimilarity/Input_Data -o D:/MahoutResult/ItemSimilarity/Output -s DistributedUncenteredCosineVectorSimilarity.class
> Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> 10/11/26 15:43:50 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=D:/MahoutResult/ItemSimilarity/Input_Data, --maxCooccurrencesPerItem=100, --maxSimilaritiesPerItem=100, --output=D:/MahoutResult/ItemSimilarity/Output, --similarityClassname=DistributedUncenteredCosineVectorSimilarity.class, --startPhase=0, --tempDir=temp}
> 10/11/26 15:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/11/26 15:43:52 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:53 INFO mapred.JobClient: Running job: job_local_0001
> 10/11/26 15:43:53 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:53 INFO mapred.MapTask: io.sort.mb = 100
> 10/11/26 15:43:53 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/11/26 15:43:53 INFO mapred.MapTask: record buffer = 262144/327680
> 10/11/26 15:43:53 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10/11/26 15:43:54 INFO mapred.JobClient: map 0% reduce 0%
> 10/11/26 15:43:54 INFO mapred.JobClient: Job complete: job_local_0001
> 10/11/26 15:43:54 INFO mapred.JobClient: Counters: 0
> 10/11/26 15:43:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:55 INFO mapred.JobClient: Running job: job_local_0002
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> 10/11/26 15:43:56 INFO mapred.MapTask: io.sort.mb = 100
> 10/11/26 15:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/11/26 15:43:56 INFO mapred.MapTask: record buffer = 262144/327680
> 10/11/26 15:43:56 WARN mapred.LocalJobRunner: job_local_0002
> java.lang.NumberFormatException: For input string: "For a young person who is years and above and below years he may be employed in an industrial undertaking His employer however is required to notify "
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:410)
>     at java.lang.Long.parseLong(Long.java:468)
>     at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:40)
>     at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:31)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
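
Both exceptions in the quoted log are consistent with the reply at the top of this message: the job tries to read each input line as collaborative filtering data, not as free text. As a rough illustration, the sketch below checks whether a line looks like a preference record before the file is handed to the job. It assumes the usual Mahout convention of `userID,itemID[,preferenceValue]` with a comma or tab as delimiter; the class and method names (`PreferenceLineCheck`, `isValidPreferenceLine`) are made up for this example and are not part of Mahout.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: a pre-flight check for input lines.
public class PreferenceLineCheck {

    // Assumption: fields are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[,\t]");

    // Returns true if the line looks like "userID,itemID" or
    // "userID,itemID,preferenceValue" with numeric IDs.
    public static boolean isValidPreferenceLine(String line) {
        String[] tokens = DELIMITER.split(line.trim());
        if (tokens.length < 2 || tokens.length > 3) {
            // A line of plain text without delimiters yields a single token,
            // so there is no element at index 1 to read an item ID from.
            return false;
        }
        try {
            Long.parseLong(tokens[0].trim());   // user ID
            Long.parseLong(tokens[1].trim());   // item ID
            if (tokens.length == 3) {
                Double.parseDouble(tokens[2].trim()); // optional preference
            }
            return true;
        } catch (NumberFormatException e) {
            // Same failure mode as the Long.parseLong frame in the log above.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "1,101,5.0",
            "2\t102",
            "For a young person who is years and above"
        };
        for (String sample : samples) {
            System.out.println(isValidPreferenceLine(sample) + " <- " + sample);
        }
    }
}
```

Running `main` reports the first two sample lines as valid and the sentence of text as invalid. A file that contains a single line of prose, like the one described in the quoted message, would fail this check on every line.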