Daniel Quach
2012-04-18, 17:49
Sean Owen
2012-04-18, 19:13
Daniel Quach
2012-04-18, 19:49
Sean Owen
2012-04-18, 20:09
Manuel Blechschmidt
2012-04-18, 20:28
Daniel Quach
2012-04-25, 19:17
Sean Owen
2012-04-25, 19:25
Daniel Quach
2012-04-25, 20:28
Sean Owen
2012-04-25, 23:26
Daniel Quach
2012-04-29, 21:24
Sean Owen
2012-04-29, 21:38
Daniel Quach
2012-04-29, 21:45
Sean Owen
2012-04-29, 21:48
Sebastian Schelter
2012-04-30, 06:31
Daniel Quach
2012-05-02, 04:50
-
How does SVDRecommender work in mahout?
Daniel Quach 2012-04-18, 17:49
I had originally thought the experimental SVDRecommender in Mahout was a model-based collaborative filtering technique. The book "Mahout in Action" mentions that model-based recommenders are a future goal for Mahout, which implies to me that the SVDRecommender is not considered model-based.

How exactly does the SVDRecommender work in Mahout? I can't seem to find any description of the algorithm underneath it.
-
Re: How does SVDRecommender work in mahout?
Sean Owen 2012-04-18, 19:13
Yes, you could call it a model-based approach. I suppose I was thinking more of Bayesian implementations when I wrote that sentence.

SVD is the Singular Value Decomposition. Are you asking what the SVD is, or what matrix factorization is, or something about specific code here? You can look up the SVD online.
-
Re: How does SVDRecommender work in mahout?
Daniel Quach 2012-04-18, 19:49
I am basing my knowledge on this paper: http://www.grouplens.org/papers/pdf/webKDD00.pdf

Your book provided algorithms for the user-based, item-based, and slope-one recommenders, but none for the SVDRecommender (I'm guessing because it was experimental).

Does the SVDRecommender just compute the resultant matrices and follow a formula similar to the one at the top of page 5 in the linked paper? I think I understand the process of SVD, but I'm wondering exactly how it is applied to obtain recommendations in Mahout's case.
-
Re: How does SVDRecommender work in mahout?
Sean Owen 2012-04-18, 20:09
This paper doesn't address how to compute the SVD. There are two approaches implemented with SVDRecommender. One computes an SVD, one doesn't :) Really it ought to be called something like MatrixFactorizationRecommender. The SVD factorizer uses a fairly simple expectation-maximization approach. I don't know how well this scales. The other factorizer uses alternating least squares.

What you come out with are not 3 matrices, as from an SVD, but 2. The "S" matrix of singular values in the SVD is folded into the left and right singular vectors.

So to answer your question now, the prediction expression is essentially the same, with two caveats:

1. The paper shows it as the product of U, sqrt(S), sqrt(S), and V. What you get out of the factorizer are really more like the "U" and "V" with the two sqrt(S) bits already multiplied in. The product comes out the same; there is a conceptual difference, I suppose, but not a practical one. In both cases you're really just multiplying the matrix factors back together to make the predictions.

2. The paper's model subtracts the customer average rating in the beginning and adds it back at the end. The SVDRecommender doesn't do that because, quite crucially, subtracting the average turns sparse data into dense data (all the zeroes become non-zero) and this crushes scalability.

So the answer is "mostly the same thing", yes. In fact this is broadly how all matrix factorization approaches work.
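To make the first caveat concrete, here is a minimal sketch in plain Python (not Mahout code) showing that folding sqrt(S) into each side gives the same predictions as the three-matrix product. The factor values and the names `user_factors`, `item_factors` and `predict` are made up for illustration.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical rank-2 factors for 3 users and 2 items (illustrative values only).
U = [[0.8, 0.1], [0.6, -0.5], [0.1, 0.9]]   # one row per user
S = [4.0, 1.0]                              # singular values (diagonal of S)
V = [[0.9, 0.2], [0.4, -0.7]]               # one row per item

# Three-matrix form: U * diag(S) * V^T
diag_s = [[S[0], 0.0], [0.0, S[1]]]
three_factor = matmul(matmul(U, diag_s), transpose(V))

# Two-matrix form: multiply sqrt(S) into each side once, up front.
root = [math.sqrt(s) for s in S]
user_factors = [[u[f] * root[f] for f in range(len(S))] for u in U]
item_factors = [[v[f] * root[f] for f in range(len(S))] for v in V]
two_factor = matmul(user_factors, transpose(item_factors))

def predict(user, item):
    """Predicted preference = dot product of the two factor rows."""
    return sum(p * q for p, q in zip(user_factors[user], item_factors[item]))

print(three_factor[1][0], two_factor[1][0], predict(1, 0))
```

Once the two factor matrices exist, a single prediction is just one dot product of length equal to the number of features, which is why the folded form is the convenient one to store.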
-
Re: How does SVDRecommender work in mahout?
Manuel Blechschmidt 2012-04-18, 20:28
Hi Daniel,

so SVD is a model-based recommender. Here is the definition from my master's thesis:

...
Model-based approaches use statistical models from machine learning research to produce a model to find the underlying logic for preferences. They try to learn a model, for example, a Bayesian belief network. As an alternative, a dimensionality reduction technique like SVD [44] is used to extract the hidden factors.
...
2010 An architecture for evaluating recommender systems in real world scenarios - Manuel Blechschmidt

Mahout contains an SVDRecommender. The tricky part for SVD is the estimation of the 3 matrices, most of the time called U, S and V. Mahout contains two implementations for estimating them, both from Sebastian Schelter as far as I know.

The first one is the ALSWRFactorizer, based on the following paper:
Large-scale Collaborative Filtering for the Netflix Prize: Alternating-Least-Squares with Weighted-λ-Regularization
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

The second one is an Expectation Maximization SVD Factorizer.
http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm

If you want to understand SVD I would recommend the following R examples:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/BedConExamples.R

You can find a Mahout version using the ExpectationMaximationRecommender here:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/AnimalFoodRecommender.java

Both examples above are documented in the following presentation:
http://www.slideshare.net/ManuelB86/how-to-build-a-recommender-system-based-on-mahout-and-java-ee

Further, there is an SVD implementation from Sean Owen and his new company myrrix.com. It is also based on an alternating least squares algorithm for factorization:
http://myrrix.com/docs/serving/javadoc/net/myrrix/online/generation/AlternatingLeastSquares.html
"Collaborative Filtering for Implicit Feedback Datasets" by Yifan Hu, Yehuda Koren, and Chris Volinsky
http://myrrix.com/docs/serving/javadoc/net/myrrix/online/generation/www2.research.att.com/~yifanhu/PUB/cf.pdf

/Manuel

[44] Koren, Yehuda; Bell, Robert; Volinsky, Chris: Matrix Factorization Techniques for Recommender Systems. In: IEEE Computer 42 (2009), No. 8, pp. 30–37. DOI 10.1109/MC.2009.263, http://dx.doi.org/10.1109/MC.2009.263
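For readers who want to see the idea behind the alternating-least-squares paper mentioned in this message, here is a small self-contained Python sketch. It is not Mahout's ALSWRFactorizer. It assumes the per-row update from the ALS-WR paper, (M Mᵀ + λ·n·I) x = M r, where n is the number of ratings for that user or item, and all function names and the tiny rating set are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def update(rows, fixed, k, lam):
    """Re-solve every factor vector on one side while the other side stays fixed.
    rows maps an id to its observed (other_id, rating) pairs."""
    out = {}
    for rid, obs in rows.items():
        # Weighted-lambda regularization: lambda times the number of ratings.
        a = [[lam * len(obs) if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for oid, r in obs:
            v = fixed[oid]
            for i in range(k):
                b[i] += r * v[i]
                for j in range(k):
                    a[i][j] += v[i] * v[j]
        out[rid] = solve(a, b)
    return out

def objective(ratings, users, items, by_user, by_item, lam):
    """Squared error on observed ratings plus the weighted regularization term."""
    err = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    reg = sum(len(by_user[u]) * dot(v, v) for u, v in users.items())
    reg += sum(len(by_item[i]) * dot(v, v) for i, v in items.items())
    return err + lam * reg

def als(ratings, k=2, lam=0.1, iterations=5, seed=42):
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = update(by_user, items, k, lam)
    history = [objective(ratings, users, items, by_user, by_item, lam)]
    for _ in range(iterations):
        items = update(by_item, users, k, lam)   # fix users, solve items
        users = update(by_user, items, k, lam)   # fix items, solve users
        history.append(objective(ratings, users, items, by_user, by_item, lam))
    return users, items, history

# Made-up (user, item, rating) triples; only observed pairs are listed.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 1.0)]
users, items, history = als(RATINGS)
print(history)
```

Each half-step solves its least-squares problem exactly, so the regularized objective recorded in `history` should never increase from one iteration to the next. Note that only observed ratings enter the sums.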
-
Re: How does SVDRecommender work in mahout?
Daniel Quach 2012-04-25, 19:17
Regarding the factorization (I am using ALSWRFactorizer), is there a limit to how large a data set can be factorized?

I am trying to apply it to the 100K rating data set from GroupLens (approximately 1000 users by 1600 movies).

It's been running for at least 10 minutes now. I am getting the feeling it might not be wise to apply the factorizer to some of GroupLens's larger data sets...
-
Re: How does SVDRecommender work in mahout?
Sean Owen 2012-04-25, 19:25
There's not a hard limit; the hard limit you would run into is memory, if anything.

This sounds slow. It may be that this implementation could use some optimization somewhere. Are you running many iterations or using a large number of features?

I have a different ALS implementation that finishes this data set (3 iterations, 30 features; quick and dirty) in more like 20 seconds. Here's some info on a run on a much larger data set, using ALS, for comparison: http://myrrix.com/example-performance/
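As a rough back-of-the-envelope check on the memory point, the sketch below uses the approximate figures from this thread (1000 users, 1600 movies, 100K ratings, 30 features). It assumes 8-byte double values and ignores JVM object overhead, so the numbers are order-of-magnitude only. The function name `estimate` is hypothetical.

```python
def estimate(users, items, ratings, features, bytes_per_value=8):
    """Rough size estimates; assumes one 8-byte double per stored value."""
    cells = users * items
    return {
        # Fraction of user/item pairs that actually have a rating.
        "density": ratings / cells,
        # Two factor matrices: users x features and items x features.
        "factor_bytes": (users + items) * features * bytes_per_value,
        # What a fully dense users x items matrix would need.
        "dense_matrix_bytes": cells * bytes_per_value,
    }

est = estimate(users=1000, items=1600, ratings=100_000, features=30)
for name, value in est.items():
    print(name, value)
```

Under these assumptions the factor matrices come to well under a megabyte and even a fully dense matrix to roughly 13 MB, which suggests memory is unlikely to be the bottleneck at this scale.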
-
Re: How does SVDRecommender work in mahout?
Daniel Quach 2012-04-25, 20:28
I tried it again with 30 features and 3 iterations on the same data set. It has been running for 10+ minutes just to factorize for the SVDRecommender and has yet to complete. Perhaps it is my machine?

I am running on a MacBook Air with 4GB of RAM and an Intel i5 processor. I specified 2GB of memory for Java (-Xmx2048M).
-
Re: How does SVDRecommender work in mahout?
Sean Owen 2012-04-25, 23:26
I don't know what the particular issue is; I imagine there's something that needs some optimization in there.

If you're definitely interested in ALS and recommenders, I don't feel bad promoting our attempts to commercialize Mahout: Myrrix (http://myrrix.com) is exactly an ALS-based recommender, and I know it will crunch this data set into a model in 16 seconds on my laptop. This part of it is also free / open source.

Sean
-
Re: How does SVDRecommender work in mahout?
Daniel Quach 2012-04-29, 21:24
Just wondering, what does Mahout do for user/item pairs that do not have a rating? Does it fill them in with some average value? Fill them with zeros? Something else?
-
Re: How does SVDRecommender work in mahout? Sean Owen 2012-04-29, 21:38
It depends a bit on the algorithm. The matrix-based approaches implicitly treat a missing rating as 0. The similarity-based ones don't assume any value at all: missing data is ignored, not inferred. (You can make them infer values if you want, but that's not helpful in general.)

On Sun, Apr 29, 2012 at 10:24 PM, Daniel Quach <[EMAIL PROTECTED]> wrote:
> Just wondering, what does mahout do for user/item pairs that do not have a rating? Does it fill it in with some average value? fill with zeros? something else?
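The difference between ignoring missing ratings and treating them as 0 can be illustrated with a small Python sketch. This is not Mahout code: the function names and the toy ratings are invented for the example, and cosine similarity is used only because it is easy to follow.

```python
import math

def cosine_co_rated(a, b):
    """Cosine similarity over items both users rated; missing ratings are ignored."""
    shared = [item for item in a if item in b]
    if not shared:
        return None  # no overlap, so no similarity can be computed
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def cosine_zero_filled(a, b):
    """Cosine similarity when every missing rating is treated as 0."""
    items = set(a) | set(b)
    dot = sum(a.get(i, 0.0) * b.get(i, 0.0) for i in items)
    norm_a = math.sqrt(sum(v ** 2 for v in a.values()))
    norm_b = math.sqrt(sum(v ** 2 for v in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical users: they agree on everything they both rated,
# but bob has rated two movies that alice has not seen.
alice = {"m1": 5.0, "m2": 3.0}
bob = {"m1": 5.0, "m2": 3.0, "m3": 4.0, "m4": 4.0}

print(cosine_co_rated(alice, bob))
print(cosine_zero_filled(alice, bob))
```

With the zero-filled version, the two movies alice has not rated count as disagreement, so the similarity drops even though the users agree on every movie they both rated.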
-
Re: How does SVDRecommender work in mahout? Daniel Quach 2012-04-29, 21:45
Ah sorry, I meant in the context of the SVDRecommender.

Your earlier email mentioned that the DataModel does NOT do any subtraction, nor add back in the end, ensuring the matrix remains sparse. Does that mean it inserts zero values?

On Apr 29, 2012, at 2:24 PM, Daniel Quach wrote:
> Just wondering, what does mahout do for user/item pairs that do not have a rating? Does it fill it in with some average value? fill with zeros? something else?
-
Re: How does SVDRecommender work in mahout? Sean Owen 2012-04-29, 21:48
They're implicitly zero as far as the math goes, IIRC.

On Sun, Apr 29, 2012 at 10:45 PM, Daniel Quach <[EMAIL PROTECTED]> wrote:
> ah sorry, I meant in the context of the SVDRecommender.
>
> Your earlier email mentioned that the DataModel does NOT do any subtraction, nor add back in the end, ensuring the matrix remains sparse. Does that mean it inserts zero values?
-
Re: How does SVDRecommender work in mahout? Sebastian Schelter 2012-04-30, 06:31
Daniel,

You have to distinguish between explicit data (ratings from a predefined scale) and implicit data (counting how often you observed some behavior).

For explicit data, you can't interpret missing values as zeros, because you simply don't know what rating the user would give. In order to still use matrix factorization techniques, the decomposition has to be computed in a different way than with standard SVD approaches. The error function stays the same as with SVD (minimize the squared error between the original matrix and the product of the factor matrices), but the computation uses only the known entries. None of this is Mahout-specific; Mahout has implementations of the approaches described in http://sifter.org/~simon/journal/20061211.html and in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2797&rep=rep1&type=pdf

For implicit data, the situation is different: if you haven't observed a user conducting some behavior with an item, then your matrix should indeed have a 0 in that cell. The problem here is that the user might simply not have had the opportunity to interact with a lot of items, which means that you can't really 'trust' the zero entries as much as the other entries. There is a great paper that introduces a 'confidence' value for implicit data to solve this problem: www2.research.att.com/~yifanhu/PUB/cf.pdf

Generally speaking, with this technique, the factorization uses the whole matrix, but 'favors' non-zero entries.

--sebastian

2012/4/29 Sean Owen <[EMAIL PROTECTED]>:
> They're implicitly zero as far as the math goes IIRC
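For the explicit-data case, the idea of minimizing squared error over the known entries only can be sketched in plain Python as follows. This is a minimal illustration in the spirit of the stochastic gradient descent approach from the first link, not Mahout's implementation; the function names, the hyperparameters (learning rate, regularization, number of features), and the toy ratings are all assumptions made for the example.

```python
import math
import random

def predict(user_f, item_f, u, i):
    """Estimated rating: dot product of the user and item feature vectors."""
    return sum(p * q for p, q in zip(user_f[u], item_f[i]))

def factorize(ratings, num_users, num_items, num_features=2,
              iterations=500, learning_rate=0.02, regularization=0.02, seed=42):
    """Fit user and item factors by SGD over the known ratings only."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
              for _ in range(num_users)]
    item_f = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
              for _ in range(num_items)]
    for _ in range(iterations):
        for u, i, r in ratings:  # unknown cells never appear in this loop
            err = r - predict(user_f, item_f, u, i)
            for f in range(num_features):
                pu, qi = user_f[u][f], item_f[i][f]
                user_f[u][f] += learning_rate * (err * qi - regularization * pu)
                item_f[i][f] += learning_rate * (err * pu - regularization * qi)
    return user_f, item_f

def rmse(user_f, item_f, ratings):
    """Root mean squared error, measured on the given known ratings."""
    total = sum((r - predict(user_f, item_f, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

# Hypothetical (user, item, rating) triples: 10 known cells of a 4 x 4 matrix.
ratings = [
    (0, 0, 5.0), (0, 1, 5.0), (0, 2, 1.0),
    (1, 0, 5.0), (1, 3, 1.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 5.0),
    (3, 0, 1.0), (3, 3, 5.0),
]

untrained = factorize(ratings, 4, 4, iterations=0)
trained = factorize(ratings, 4, 4)
print(rmse(*untrained, ratings), rmse(*trained, ratings))
```

The six cells without a rating contribute nothing to the error, so the model is never pushed toward predicting 0 for them. The confidence-weighted technique for implicit data would instead loop over every cell and weight each squared error by a per-cell confidence.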
-
Re: How does SVDRecommender work in mahout? Daniel Quach 2012-05-02, 04:50
I ran the factorizer on GroupLens's 1 million rating movie data set, with 5 iterations and 10 features.

I then constructed an SVDRecommender with the factorization and generated preference estimates for every user/movie pair. For some reason, a good number of the users end up with predictions of "0.0" for every movie; it seems to happen for every user greater than 2700-ish. Is it perhaps a problem with the factorization? I will see if I can reproduce the output; this seems like a bug and not expected behavior.

On a related note, is there a way to compute the full factorization, save the output, then later retrieve some rank-K approximation? It takes hours to run the factorizer, and I feel it might be helpful to save factorizations for reuse.
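On the question of saving a factorization for reuse, one generic option is to write the two factor matrices to disk and reload them later. The sketch below does this in plain Python with JSON; it is not a Mahout API, and the helper names (`save_factors`, `load_factors`, `estimate`) and the tiny factor matrices are hypothetical.

```python
import json
import os
import tempfile

def save_factors(path, user_factors, item_factors):
    """Write both factor matrices (lists of feature vectors) to a JSON file."""
    with open(path, "w") as f:
        json.dump({"user_factors": user_factors, "item_factors": item_factors}, f)

def load_factors(path):
    """Read the factor matrices back from a JSON file."""
    with open(path) as f:
        data = json.load(f)
    return data["user_factors"], data["item_factors"]

def estimate(user_factors, item_factors, user, item, k=None):
    """Dot product over the first k features (all features if k is None)."""
    pu, qi = user_factors[user], item_factors[item]
    if k is None:
        k = len(pu)
    return sum(pu[f] * qi[f] for f in range(k))

# Hypothetical 2-feature factorization for 2 users and 2 items.
user_factors = [[1.0, 2.0], [0.5, -1.0]]
item_factors = [[2.0, 0.5], [1.0, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "factors.json")
    save_factors(path, user_factors, item_factors)
    loaded_users, loaded_items = load_factors(path)

print(estimate(loaded_users, loaded_items, 0, 0))
print(estimate(loaded_users, loaded_items, 0, 0, k=1))
```

One caveat on the rank-K part: keeping only the first K features gives the best rank-K approximation when the factors come from a true SVD with features ordered by singular value. Factors produced by ALS or similar methods are generally not ordered that way, so a smaller model would usually need to be refit with fewer features.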