Mahout, mail # user - error  in itemsimilarity


Re: error  in itemsimilarity
Sebastian Schelter 2010-11-26, 07:54
ItemSimilarityJob cannot be used to compute the similarity between text
documents. It is intended for collaborative filtering, as described
here:
https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
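
For collaborative filtering, the job reads preference data with one "userID,itemID[,preference]" record per line and numeric IDs. That is consistent with the two exceptions in the log below: a line of free text has no second field and cannot be parsed as a long. Here is a minimal sketch of a pre-flight check you could run over an input file. It is not Mahout code: the class name PreferenceLineCheck, the method problemWith and the comma-or-tab delimiter are assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class PreferenceLineCheck {

    // Assumption: fields are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[,\t]");

    /**
     * Returns null when the line looks like "userID,itemID[,preference]",
     * otherwise a short description of the first problem found.
     */
    public static String problemWith(String line) {
        String[] fields = DELIMITER.split(line);
        if (fields.length < 2) {
            return "expected at least userID and itemID, found " + fields.length + " field(s)";
        }
        try {
            Long.parseLong(fields[0].trim());   // user ID must be numeric
            Long.parseLong(fields[1].trim());   // item ID must be numeric
            if (fields.length > 2) {
                Double.parseDouble(fields[2].trim()); // optional preference value
            }
        } catch (NumberFormatException e) {
            return "non-numeric field (" + e.getMessage() + ")";
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines: two valid records and one line of free text.
        String[] samples = {
            "1,101,5.0",
            "2,101",
            "For a young person who is years and above"
        };
        for (String sample : samples) {
            String problem = problemWith(sample);
            System.out.println((problem == null ? "OK:  " : "BAD: ") + sample
                + (problem == null ? "" : " -> " + problem));
        }
    }
}
```

If every line of your input passes a check like this, the data is at least in the shape the job expects. A plain text document will fail on the first line.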

On 26.11.2010 08:50, Divya wrote:
> Hi,
>
> I am getting the following exception when I try to run itemsimilarity from the command line.
>
> My input data is a text file that contains just one line of text.
>
> Can anyone please help me resolve the error?
>
> $ bin/mahout itemsimilarity -i D:/MahoutResult/ItemSimilarity/Input_Data -o D:/MahoutResult/ItemSimilarity/Output -s DistributedUncenteredCosineVectorSimilarity.class
>
> Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>
> HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>
> 10/11/26 15:43:50 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=D:/MahoutResult/ItemSimilarity/Input_Data, --maxCooccurrencesPerItem=100, --maxSimilaritiesPerItem=100, --output=D:/MahoutResult/ItemSimilarity/Output, --similarityClassname=DistributedUncenteredCosineVectorSimilarity.class, --startPhase=0, --tempDir=temp}
>
> 10/11/26 15:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>
> 10/11/26 15:43:52 INFO input.FileInputFormat: Total input paths to process : 2
>
> 10/11/26 15:43:53 INFO mapred.JobClient: Running job: job_local_0001
>
> 10/11/26 15:43:53 INFO input.FileInputFormat: Total input paths to process : 2
>
> 10/11/26 15:43:53 INFO mapred.MapTask: io.sort.mb = 100
>
> 10/11/26 15:43:53 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 10/11/26 15:43:53 INFO mapred.MapTask: record buffer = 262144/327680
>
> 10/11/26 15:43:53 WARN mapred.LocalJobRunner: job_local_0001
>
> java.lang.ArrayIndexOutOfBoundsException: 1
>
>         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
>
>         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> 10/11/26 15:43:54 INFO mapred.JobClient:  map 0% reduce 0%
>
> 10/11/26 15:43:54 INFO mapred.JobClient: Job complete: job_local_0001
>
> 10/11/26 15:43:54 INFO mapred.JobClient: Counters: 0
>
> 10/11/26 15:43:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
>
> 10/11/26 15:43:55 INFO mapred.JobClient: Running job: job_local_0002
>
> 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
>
> 10/11/26 15:43:56 INFO mapred.MapTask: io.sort.mb = 100
>
> 10/11/26 15:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
>
> 10/11/26 15:43:56 INFO mapred.MapTask: record buffer = 262144/327680
>
> 10/11/26 15:43:56 WARN mapred.LocalJobRunner: job_local_0002
>
> java.lang.NumberFormatException: For input string: "For a young person who is years and above and below  years he may be employed in an industrial undertaking His employer however is required to notify "
>
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>
>         at java.lang.Long.parseLong(Long.java:410)
>
>         at java.lang.Long.parseLong(Long.java:468)
>
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:40)
>
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:31)
>
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)