|
|
-
Re: Factorizing SVD (online SVDRecommender)Sebastian Schelter 2011-11-07, 07:58
I think its also important to note that in the recommendation world most
of the time something which only looks like an SVD is used. In most recommendation usecases, only a tiny fraction of the user-item-matrix is observed. Therefore "standard" algorithms for obtaining the SVD cannot be used (at least that's what the papers say). The usual approach is to decompose the rating matrix A (users x items) into the product of two other matrices X (users x features) and Y (items x features). A is approximated by XY'. This decomposition is obtained by selecting X and Y in a way that minimizes the overall squared error in regard to the observed ratings (plus some regularization to avoid overfitting). For all observed user-item pairs (u,i) the squared error of the prediction is obtained by multiplying the corresponding user and item vector from the latent feature space: f(X,Y) = sum_{u,i} (r_{u,i} - x_u' y_i)^2 + some regularization term There are several ways to find the X and Y that minimize this function. One approach is to use Stochastic Gradient Descent. This approach was made by popular by Simon Funk's famous article during the netflix prize: "Netflix Update: Try this at home" http://sifter.org/~simon/journal/20061211.html I think that's also what's implemented in our ExpectationMaximizationSVDFactorizer. Other approaches use a technique called "Alternating Least Squares" where you iteratively fix either X or Y and solve the equation for the other. This approach is described in: "Large-scale Parallel Collaborative Filtering for the Netflix Prize", http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf Our sequential ALSWRFactorizer is an implementation of this as well as the recently refactored ParallelALSFactorizationJob which takes this algorithm onto hadoop. There is a second very interesting paper that shows how to use the ALS approach with implicit feedback data: "Collaborative Filtering for Implicit Feedback Datasets", http://research.yahoo.com/pub/2433 I think Tamas's sequential factorizer in MAHOUT-737 is a direct implementation of this and I also plan to include it in ParallelALSFactorizationJob. --sebastian On 07.11.2011 07:10, Sean Owen wrote: > It's embedded in both as far as I can tell, though I don't know enough > about the implementation to say what the split is. Ted remarked once > that it's usual to split sqrt(S) across both. > > Dotting two vectors to get one rating is really just the process of > matrix multiplication to recover a value from the approximate > user-item matrix. A = U S V', and we have some truncated versions of > those right matrices. Uk Sk V'k gives Ak, which is some approximation > of the original A (the input) but with many new values filled in from > which to make recommendations. The Uk and V'k actually already have Sk > embedded. So to get one value in Ak is just a dot of a row of Uk with > a column of V'k, per usual matrix multiplication. > > On Sun, Nov 6, 2011 at 11:53 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: >> This thread is about the class SVDRecommender, which uses an externally >> created factorization to do recommendation. >> >> A: The Factorization classes do not extract the scaling diagonal matrix. Is >> this embedded in the left or right matrix? Or spread across both? >> B: Is there an explanation of why the dot product of two feature vectors >> creates the preference? A paper or blog post? Or a paragraph in a reply? >> >> -- >> Lance Norskog >> [EMAIL PROTECTED] >> |