Mahout, mail # user - is there some place to study Singular Value Decomposition algorithms


Re:Re:Re: Re: is there some place to study Singular Value Decomposition algorithms
myn 2011-08-29, 11:05
Thanks, everybody. (Sorry for my Chinese English.)
At 2011-08-29 19:03:59, myn <[EMAIL PROTECTED]> wrote:

The best way is to read the source code.
@_@
At 2011-08-29 16:02:57,"Lance Norskog" <[EMAIL PROTECTED]> wrote:
>R also has an SVD implementation, svd(), directly in the base package.
>
>There are a few answers to your question:
>1) What is SVD? The video lecture above will help. Also, searching for
>'singular value decomposition' on Baidu finds a lot of basic explanations.
>2) Why do you want it? In one pass, it gives you a few different, unique
>explanations of what is going on inside your dataset.
>3) Mahout's distributed matrix code (DistributedLanczos, etc.) is implemented
>specifically for large-scale problems. Some parts of a full SVD may not be
>needed for your problem, and these jobs skip some of that work.
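As a small, self-contained illustration of what an SVD computes, the sketch below estimates only the largest singular value of a dense matrix in plain Java (no Mahout or Hadoop needed). It uses power iteration on A^T A, a simpler textbook technique than the Lanczos method behind Mahout's DistributedLanczos; the class and method names are hypothetical, and the fixed iteration count is an assumption that only suits small examples whose top two singular values are well separated.

```java
import java.util.Arrays;

/** Hypothetical helper: estimates the largest singular value of a dense matrix. */
final class TopSingularValue {

    /** Returns y = A x for an m-by-n matrix A and a length-n vector x. */
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    /** Returns y = A^T x for an m-by-n matrix A and a length-m vector x. */
    static double[] multiplyTransposed(double[][] a, double[] x) {
        double[] y = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < y.length; j++) {
                y[j] += a[i][j] * x[i];
            }
        }
        return y;
    }

    /** Euclidean norm of v. */
    static double norm(double[] v) {
        double sum = 0.0;
        for (double d : v) {
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Power iteration on A^T A: v tends to the top right singular vector,
     * so ||A v|| tends to the largest singular value sigma_1.
     */
    static double largestSingularValue(double[][] a, int iterations) {
        double[] v = new double[a[0].length];
        Arrays.fill(v, 1.0);                                    // arbitrary non-zero start
        for (int k = 0; k < iterations; k++) {
            double[] w = multiplyTransposed(a, multiply(a, v)); // w = A^T A v
            double n = norm(w);
            if (n == 0.0) {
                // A v = 0: the start vector lies in A's null space.
                throw new IllegalStateException("degenerate start vector; try another one");
            }
            for (int j = 0; j < w.length; j++) {
                w[j] /= n;                                      // keep v at unit length
            }
            v = w;
        }
        return norm(multiply(a, v));                            // ||A v|| for unit v
    }

    public static void main(String[] args) {
        // Diagonal matrix with singular values 3 and 2, so the estimate should be close to 3.
        double[][] a = { {3.0, 0.0}, {0.0, 2.0} };
        System.out.println(largestSingularValue(a, 50));
    }
}
```

Power iteration converges at a rate governed by the ratio of the two largest singular values, so matrices whose top singular values are nearly equal need many more iterations. For the full decomposition (U, S, V), a library routine such as R's svd() or Mahout's SingularValueDecomposition class is the practical choice.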
>
>Until you have a solid grasp of what SVD can tell you, there is no point in
>trying the distributed Mahout jobs. The SingularValueDecomposition class in
>Mahout has served me well in my research.
>
>Lance
>
>On Mon, Aug 29, 2011 at 12:50 AM, Danny Bickson <[EMAIL PROTECTED]> wrote:
>
>> Mahout - SVD matrix factorization - formatting input matrix
>> Converting Input Format into Mahout's SVD Distributed Matrix Factorization Solver
>>
>> Purpose
>> The code below converts a matrix from CSV format:
>> <from row>,<to col>,<value>\n
>> into Mahout's SVD solver format.
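Each line of this input describes one non-zero entry. A minimal parsing sketch in plain Java, assuming 0-based indices, no header row, and well-formed numeric fields; CsvEntryParser and its Entry type are hypothetical names, not part of Mahout. It uses String.split, whereas the original code imports StringTokenizer; either works for this simple comma-separated format.

```java
/** Hypothetical parser for one "<from row>,<to col>,<value>" line. */
final class CsvEntryParser {

    /** One non-zero matrix entry. */
    static final class Entry {
        final int row;
        final int col;
        final double value;

        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** Splits on commas, trims whitespace around each field, and converts the three fields. */
    static Entry parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Entry(Integer.parseInt(fields[0].trim()),
                         Integer.parseInt(fields[1].trim()),
                         Double.parseDouble(fields[2].trim()));
    }

    public static void main(String[] args) {
        Entry e = parse("2,0,-5.0");                              // an entry from the example below
        System.out.println(e.row + " " + e.col + " " + e.value);  // prints "2 0 -5.0"
    }
}
```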
>>
>>
>> For example, the 3x3 matrix:
>>  0.0  1.0  2.1
>>  3.0  4.0  5.0
>> -5.0  6.2  0.0
>>
>> is given as input in a CSV file as follows (zero entries are omitted):
>> 1,0,3.0
>> 2,0,-5.0
>> 0,1,1.0
>> 1,1,4.0
>> 2,1,6.2
>> 0,2,2.1
>> 1,2,5.0
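The listing above can be reproduced mechanically by walking the dense matrix column by column and skipping zeros, which also shows why it comes out in column order. A plain-Java sketch; DenseToCsv is a hypothetical helper, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that writes a dense matrix as column-ordered CSV triples. */
final class DenseToCsv {

    /** Emits "<row>,<col>,<value>" for every non-zero entry, column by column. */
    static List<String> toCsv(double[][] m) {
        List<String> lines = new ArrayList<>();
        int cols = m[0].length;
        for (int c = 0; c < cols; c++) {          // outer loop over columns gives column order
            for (int r = 0; r < m.length; r++) {
                if (m[r][c] != 0.0) {             // sparse format: zeros are not stored
                    lines.add(r + "," + c + "," + m[r][c]);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        double[][] m = {
            { 0.0, 1.0, 2.1},
            { 3.0, 4.0, 5.0},
            {-5.0, 6.2, 0.0}
        };
        toCsv(m).forEach(System.out::println);
    }
}
```

Running main prints the same seven lines as the listing above, in the same order.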
>>
>> NOTE: I ASSUME THE MATRIX IS SORTED IN COLUMN ORDER.
>> This code is based on code by Danny Leshem, ContextIn.
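If an input file is not already in column order, it can be sorted before conversion so that the assumption above holds. A minimal in-memory sketch, assuming the whole file fits in memory (very large inputs would need an external sort instead); SortByColumn is a hypothetical name. Rows are used as a tie-breaker within each column, which the note above does not require but which makes the order deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that puts "<row>,<col>,<value>" lines into column order. */
final class SortByColumn {

    /** Column index: the second comma-separated field. */
    static int column(String line) {
        return Integer.parseInt(line.split(",")[1].trim());
    }

    /** Row index: the first comma-separated field. */
    static int row(String line) {
        return Integer.parseInt(line.split(",")[0].trim());
    }

    /** Returns a copy sorted by column, then by row within each column. */
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingInt(SortByColumn::column)
                              .thenComparingInt(SortByColumn::row));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> unsorted = List.of("0,2,2.1", "1,0,3.0", "2,1,6.2", "2,0,-5.0");
        // prints 1,0,3.0 then 2,0,-5.0 then 2,1,6.2 then 0,2,2.1, one per line
        sortLines(unsorted).forEach(System.out::println);
    }
}
```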
>>
>> Command line arguments:
>> args[0] - path to the CSV input file
>> args[1] - cardinality of the matrix (number of columns)
>> args[2] - path to the resulting Mahout SVD input file
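Validating these three arguments up front gives a clearer error than a bare exception from Integer.parseInt. A sketch with hypothetical names (ConvertArgs, parse), not the original code's structure; it also assumes the cardinality must be a positive integer.

```java
/** Hypothetical holder for the three command-line arguments described above. */
final class ConvertArgs {
    final String inputCsv;
    final int cardinality;
    final String outputPath;

    private ConvertArgs(String inputCsv, int cardinality, String outputPath) {
        this.inputCsv = inputCsv;
        this.cardinality = cardinality;
        this.outputPath = outputPath;
    }

    /** Checks args[0..2] and fails fast with a readable message. */
    static ConvertArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "usage: Convert2SVD <input csv> <cardinality> <output file>");
        }
        int cardinality;
        try {
            cardinality = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("cardinality must be an integer: " + args[1], e);
        }
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive: " + cardinality);
        }
        return new ConvertArgs(args[0], cardinality, args[2]);
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        ConvertArgs a = parse(new String[] {"matrix.csv", "3", "matrix.svd"});
        System.out.println(a.inputCsv + " " + a.cardinality + " " + a.outputPath);
    }
}
```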
>>
>> Method:
>> The code below goes over the CSV file and, for each matrix column, creates
>> a SequentialAccessSparseVector that contains all the non-zero row entries
>> of that column.
>> It then appends the column vector to the output file.
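The grouping step can be sketched in plain Java without Mahout or Hadoop. In this hypothetical version a TreeMap<Integer, Double> (row index to value) stands in for SequentialAccessSparseVector, and a LinkedHashMap keyed by column index stands in for the output file; judging by its imports, the original writes VectorWritable values to a Hadoop SequenceFile instead. Like the original, it relies on the input being sorted by column and starts a new vector whenever the column index changes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Mahout-free sketch of the per-column grouping step. */
final class GroupColumns {

    /**
     * Reads "<row>,<col>,<value>" lines already sorted by column and returns one
     * sparse column vector (row index -> value) per column, keyed by column index.
     */
    static Map<Integer, Map<Integer, Double>> group(List<String> sortedLines) {
        Map<Integer, Map<Integer, Double>> columns = new LinkedHashMap<>();
        Map<Integer, Double> current = null;
        int currentCol = -1;
        for (String line : sortedLines) {
            String[] f = line.trim().split(",");
            int row = Integer.parseInt(f[0].trim());
            int col = Integer.parseInt(f[1].trim());
            double value = Double.parseDouble(f[2].trim());
            if (row < 0 || col < 0) {
                throw new IllegalArgumentException("negative index: " + line);
            }
            if (col < currentCol) {
                throw new IllegalArgumentException("input not sorted by column: " + line);
            }
            if (col != currentCol) {        // column changed: the previous vector is complete
                current = new TreeMap<>();  // TreeMap keeps row indices in ascending order
                columns.put(col, current);  // stands in for appending the vector to the file
                currentCol = col;
            }
            current.put(row, value);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("1,0,3.0", "2,0,-5.0", "0,1,1.0", "1,1,4.0",
                                   "2,1,6.2", "0,2,2.1", "1,2,5.0");
        // prints {0={1=3.0, 2=-5.0}, 1={0=1.0, 1=4.0, 2=6.2}, 2={0=2.1, 1=5.0}}
        System.out.println(group(csv));
    }
}
```

Because only the current column's vector has to be open at any time, a streaming version can write each vector out as soon as the column index changes; this sketch keeps everything in memory only so the result can be inspected.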
>>
>> Compilation:
>> Copy the Java code below into a Java file named Convert2SVD.java.
>> Add both the Mahout and Hadoop jars to your IDE project's class path.
>> Alternatively, a command line option for compilation is given below.
>>
>>
>>
>> import java.io.BufferedReader;
>> import java.io.FileReader;
>> import java.util.StringTokenizer;
>>
>> import org.apache.mahout.math.SequentialAccessSparseVector;
>> import org.apache.mahout.math.Vector;
>> import org.apache.mahout.math.VectorWritable;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.SequenceFile;
>> import org.apache.hadoop.io.SequenceFile.CompressionType;
>>
>> /**
>>  * Code for converting CSV format to Mahout's SVD format.
>>  * Note: I ASSUME THE CSV FILE IS SORTED BY THE COLUMN (NAMELY THE
>>  * SECOND FIELD).
>>  *
>>  * @author Danny Bickson, CMU
>>  */
>> public class Convert2SVD {
>>
>>         // Length of each column vector, read from args[1].
>>         public static int Cardinality;
>>
>>         /**
>>          * Converts a column-sorted CSV matrix into Mahout's SVD input format.
>>          *
>>          * @param args args[0] - input csv file;
>>          *             args[1] - cardinality (length of vector);
>>          *             args[2] - output file for svd
>>          */
>>         public static void main(String[] args){
>>
>> try {
>>         Cardinality = Integer.parseInt(args[1]);