Mahout, mail # user - How does SVDRecommender work in mahout?


Re: How does SVDRecommender work in mahout?
Daniel Quach 2012-04-25, 19:17
Regarding the factorization (I am using ALSWRFactorizer), is there a limit to how large a data set can be factorized?

I am trying to apply it to the 100K ratings data set from GroupLens (approximately 1000 users by 1600 movies).

It has been running for at least 10 minutes now, and I am getting the feeling it might not be wise to apply the factorizer to some of GroupLens's larger data sets...

On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:

> This paper doesn't address how to compute the SVD. There are two
> approaches implemented with SVDRecommender. One computes an SVD, one
> doesn't :) Really it ought to be called something like
> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
> simple expectation maximization approach. I don't know how well this
> scales. The other factorizer uses alternating-least-squares.
>
> What you come out with is not 3 matrices, as from an SVD, but 2. The "S"
> matrix of singular values is mashed into the left/right
> singular vectors.
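A quick numeric check of why two matrices are enough: because S is diagonal, U S V^T equals (U sqrt(S)) (V sqrt(S))^T. The class name `FoldSingularValues` and the example U, S, V below are made up for illustration (U and V are small orthonormal matrices); this is plain linear algebra, not Mahout code.

```java
/** Hypothetical check that U S V^T == (U sqrt(S)) (V sqrt(S))^T; example values are made up. */
public class FoldSingularValues {

    /** Returns a * diag(d): column c of a is multiplied by d[c]. */
    static double[][] scaleColumns(double[][] a, double[] d) {
        double[][] out = new double[a.length][d.length];
        for (int r = 0; r < a.length; r++)
            for (int c = 0; c < d.length; c++) out[r][c] = a[r][c] * d[c];
        return out;
    }

    /** Returns a * b^T: out[i][j] is the dot product of row i of a and row j of b. */
    static double[][] timesTranspose(double[][] a, double[][] b) {
        double[][] out = new double[a.length][b.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                for (int k = 0; k < a[i].length; k++) out[i][j] += a[i][k] * b[j][k];
        return out;
    }

    /** Three-factor form: U S V^T. */
    static double[][] threeFactors(double[][] u, double[] s, double[][] v) {
        return timesTranspose(scaleColumns(u, s), v);
    }

    /** Two-factor form: sqrt(S) folded into both U and V, then multiplied back together. */
    static double[][] twoFactors(double[][] u, double[] s, double[][] v) {
        double[] root = new double[s.length];
        for (int i = 0; i < s.length; i++) root[i] = Math.sqrt(s[i]);
        double[][] userFactors = scaleColumns(u, root); // plays the role of the left factor
        double[][] itemFactors = scaleColumns(v, root); // plays the role of the right factor
        return timesTranspose(userFactors, itemFactors);
    }

    public static void main(String[] args) {
        double[][] u = {{0.6, -0.8}, {0.8, 0.6}}; // orthonormal columns
        double[] s = {4.0, 1.0};                   // singular values
        double[][] v = {{0.0, 1.0}, {1.0, 0.0}};   // orthonormal (a permutation)
        double[][] a = threeFactors(u, s, v);
        double[][] b = twoFactors(u, s, v);
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                System.out.printf("(%d,%d): %.4f vs %.4f%n", i, j, a[i][j], b[i][j]);
    }
}
```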
>
> So to answer your question now, the prediction expression is
> essentially the same, with two caveats:
>
> 1. It shows it as the product of U, sqrt(S), sqrt(S), and V. What you
> get out of the factorizer are really more like the "U" and "V" with
> the two sqrt(S) bits already multiplied in. The product comes out the
> same; there is a conceptual difference, I suppose, but not a practical
> one. In both cases you're really just multiplying the matrix factors
> all back together to make the predictions.
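Once the two factor matrices exist, a predicted rating is just the dot product of a user's row and an item's row. The sketch below scores a user's unrated items that way and ranks them. The class `DotProductPredictions` and the factor values are hypothetical (not produced by Mahout), and SVDRecommender's actual candidate selection may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: score and rank unrated items by dot products of factor rows. */
public class DotProductPredictions {

    /** Predicted rating = dot product of the user's factor row and the item's factor row. */
    static double predict(double[] userRow, double[] itemRow) {
        double sum = 0;
        for (int f = 0; f < userRow.length; f++) sum += userRow[f] * itemRow[f];
        return sum;
    }

    /** Returns the indices of items the user has not rated, highest predicted score first. */
    static List<Integer> rankUnrated(double[] userRow, double[][] itemFactors, boolean[] alreadyRated) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < itemFactors.length; i++) if (!alreadyRated[i]) candidates.add(i);
        candidates.sort(Comparator.comparingDouble((Integer i) -> predict(userRow, itemFactors[i])).reversed());
        return candidates;
    }

    public static void main(String[] args) {
        double[][] userFactors = {{1.5, 0.2}, {0.1, 1.8}};                          // made-up values
        double[][] itemFactors = {{2.0, 0.1}, {1.8, 0.5}, {0.3, 2.2}, {1.0, 1.0}};  // made-up values
        boolean[] ratedByUser0 = {true, false, false, false};
        for (int i : rankUnrated(userFactors[0], itemFactors, ratedByUser0))
            System.out.printf("item %d -> %.2f%n", i, predict(userFactors[0], itemFactors[i]));
    }
}
```

For user 0 this ranks item 1 (2.80), then item 3 (1.70), then item 2 (0.89).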
>
> 2. This model subtracts the customer average rating at the beginning
> and adds it back at the end. The SVDRecommender doesn't do that,
> because, quite crucially, it turns sparse data into dense data (all
> the zeroes become non-zero) and this crushes scalability.
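The sparsity point can be seen on a toy matrix. Assuming, as described above, that unrated cells are held as zeros in a dense matrix and each customer's average is subtracted from every cell, every former zero becomes minus that average. The class `CenteringDensity` and the ratings are illustrative only.

```java
/** Hypothetical demo: subtracting each user's mean from a dense matrix turns zeros into non-zeros. */
public class CenteringDensity {

    /** Average of the non-zero (rated) entries in one row. */
    static double userMean(double[] row) {
        double sum = 0;
        int n = 0;
        for (double r : row) if (r != 0) { sum += r; n++; }
        return n == 0 ? 0 : sum / n;
    }

    /** Subtracts each user's mean from every cell, rated or not, as a dense factorization input would. */
    static double[][] centerDense(double[][] ratings) {
        double[][] out = new double[ratings.length][];
        for (int u = 0; u < ratings.length; u++) {
            double mean = userMean(ratings[u]);
            out[u] = new double[ratings[u].length];
            for (int i = 0; i < ratings[u].length; i++) out[u][i] = ratings[u][i] - mean;
        }
        return out;
    }

    static int countNonZero(double[][] m) {
        int n = 0;
        for (double[] row : m) for (double x : row) if (x != 0) n++;
        return n;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 0, 0, 1},
            {0, 4, 2, 0},
            {3, 0, 0, 5}
        };
        System.out.println("non-zeros before: " + countNonZero(ratings));              // 6
        System.out.println("non-zeros after:  " + countNonZero(centerDense(ratings))); // 12
    }
}
```

On a real data set with roughly 1% of cells rated, the same step would multiply storage and work by roughly 100x.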
>
> The answer is "mostly the same thing", yes. In fact this is broadly how
> all matrix factorization approaches work.
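Side by side, the two prediction rules can be sketched as follows. The notation is an editorial assumption: $U_k$, $S_k$, $V_k$ are rank-$k$ SVD factors (in the paper's case, of the ratings matrix after customer averages are subtracted), $\bar r_i$ is customer $i$'s average rating, and $x_i$, $y_j$ are the user and item rows Mahout's factorizer produces, which only correspond to rows of $U_k\sqrt{S_k}$ and $V_k\sqrt{S_k}$ when the factorization really is an SVD.

```latex
\begin{aligned}
\text{paper:}\quad \hat r_{ij} &= \bar r_i + \big(U_k\sqrt{S_k}\big)_{i,:}\,\big(\sqrt{S_k}\,V_k^{\mathsf T}\big)_{:,j} \\
\text{SVDRecommender:}\quad \hat r_{ij} &= x_i^{\mathsf T}\, y_j
\end{aligned}
```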
>
> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach <[EMAIL PROTECTED]> wrote:
>> I am basing my knowledge off this paper: http://www.grouplens.org/papers/pdf/webKDD00.pdf
>>
>> Your book provided algorithms for the user-based, item-based, and slope one recommenders, but none for the SVDRecommender (I'm guessing because it was experimental).
>>
>> Does the SVDRecommender just compute the resultant matrices and follow a formula similar to the one at the top of page 5 in the linked paper? I think I understand the process of SVD, but I'm wondering exactly how it's applied to obtain recommendations in Mahout's case.
>>
>>
>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>>
>>> Yes you could call it a model-based approach. I suppose I was thinking
>>> more of Bayesian implementations when I wrote that sentence.
>>>
>>> SVD is the Singular Value Decomposition -- are you asking what the SVD
>>> is, or what matrix factorization is, or something about specific code
>>> here? You can look up the SVD online.
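For reference, a standard statement of the SVD (general linear algebra, nothing Mahout-specific): an $m \times n$ ratings matrix $R$ factors into orthonormal $U$, $V$ and a diagonal $S$ of non-negative singular values, and keeping only the $k$ largest singular values gives the best rank-$k$ approximation in the Frobenius norm (Eckart–Young). Since a ratings matrix has missing entries, a recommender must either fill them in before taking a true SVD or fit a factorization to the observed ratings only.

```latex
\begin{aligned}
R &= U S V^{\mathsf T}, \qquad U^{\mathsf T}U = I,\quad V^{\mathsf T}V = I,\quad S = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge 0) \\
R_k &= U_k S_k V_k^{\mathsf T} \quad\text{minimizes}\quad \lVert R - X \rVert_F \quad\text{over all } X \text{ with } \operatorname{rank}(X) \le k
\end{aligned}
```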
>>>
>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach <[EMAIL PROTECTED]> wrote:
>>>> I had originally thought the experimental SVDRecommender in Mahout was a model-based collaborative filtering technique. However, the book "Mahout in Action" mentions that model-based recommenders are a future goal for Mahout, which implies to me that the SVDRecommender is not considered model-based.
>>>>
>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to find any description of the algorithm underneath it.
>>