|
|
-
Understanding the SVD recommender
Sean Owen 2010-06-03, 10:52
I'm finally looking to make my crude understanding of SVD-based recommenders more like "vague". I understand the SVD and the principle here but haven't implemented it.
I'm looking at what I believe is the original paper on it by Sarwar et al: www.grouplens.org/papers/pdf/sarwar_SVD.pdf
It's good stuff and I understood it once, but now that I dig into details since I want to implement, I can't figure out how to read the algorithm in a way that works out. I think there are some notation problems, mostly in the incremental stage.
Is anyone out there familiar enough with this to a) discuss this paper with me or b) point me to another writeup on the approach?
-
Re: Understanding the SVD recommender
Ted Dunning 2010-06-03, 15:15
Fire away.
On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
> Is anyone out there familiar enough with this to a) discuss this paper > with me or b) point me to another writeup on the approach? >
-
Re: Understanding the SVD recommender
Sean Owen 2010-06-03, 16:26
Section 3 is hard to understand.
- Ak and P are defined, but not used later - Definition of P has UTk x Nu as a computation. UTk is a k x m matrix, and Nu is "t" x 1. t is not defined. - This only makes sense if t = m. But m is the number of users, and Nu is a user vector, so should have a number of elements equal to n, the number of items - Sk * VTk is described as a k x "d" matrix but d is undefined - The diagram suggests that VTk are multiplied by all the Nu, which makes more sense -- but only if Nu are multiplied by VTk, not the other way. And the diagram depicts neither of those. - Conceptually I would understand Nu x VTk, but then P is defined by an additional product with Uk
In short... what? On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Fire away. > > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> Is anyone out there familiar enough with this to a) discuss this paper >> with me or b) point me to another writeup on the approach? >> >
-
Re: Understanding the SVD recommender
Ted Dunning 2010-06-04, 05:48
You are correct. The paper has an appalling treatment of the folding in approach.
In fact, the procedure is dead simple.
The basic idea is to leave the coordinate system derived in the original SVD intact and simply project the new users into that space.
The easiest way to see what is happening is to start again with the original rating matrix A as decomposed:
A = U S V'
where A is users x items. If we multiply on the right by V, we get
A V = U S V' V = U S
(because V' V = I, by definition). This result is (users x items) x (items x k) = users x k, that is, it gives a k dimensional vector for each user. Similarly, multiplication on the left by U' gives a k x items matrix which, when transposed gives a k dimensional vector for each item.
This implies that if we augment U with new user row vectors U_new, we should be able to simply compute new k-dimensional vectors for the new users and adjoin these new vectors to the previous vectors. Concisely put,
( A ) ( A V ) ( ) V = ( ) ( A_new ) ( A_new V )
This isn't really magical. It just says that we can compute new user vectors at any time by multiplying the new users' ratings by V.
The diagram in figure one is hideously confusing because it looks like a picture of some kind of multiplication whereas it is really depicting some odd kind of flow diagram.
Does this solve the problem?
On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
> Section 3 is hard to understand. > > - Ak and P are defined, but not used later > - Definition of P has UTk x Nu as a computation. UTk is a k x m > matrix, and Nu is "t" x 1. t is not defined. > - This only makes sense if t = m. But m is the number of users, and Nu > is a user vector, so should have a number of elements equal to n, the > number of items > - Sk * VTk is described as a k x "d" matrix but d is undefined > - The diagram suggests that VTk are multiplied by all the Nu, which > makes more sense -- but only if Nu are multiplied by VTk, not the > other way. And the diagram depicts neither of those. > - Conceptually I would understand Nu x VTk, but then P is defined by > an additional product with Uk > > In short... what? > > > On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > Fire away. > > > > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > >> Is anyone out there familiar enough with this to a) discuss this paper > >> with me or b) point me to another writeup on the approach? > >> > > >
-
Re: Understanding the SVD recommender
Jake Mannix 2010-06-04, 06:09
This reminds me: I never moved over the "folding-in" code from decomposer. Not that it's particularly complex, but it would probably be useful in "utils" at least.
-jake
On Thu, Jun 3, 2010 at 10:48 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> You are correct. The paper has an appalling treatment of the folding in > approach. > > In fact, the procedure is dead simple. > > The basic idea is to leave the coordinate system derived in the original > SVD > intact and simply project the new users into that space. > > The easiest way to see what is happening is to start again with the > original > rating matrix A as decomposed: > > A = U S V' > > where A is users x items. If we multiply on the right by V, we get > > A V = U S V' V = U S > > (because V' V = I, by definition). This result is (users x items) x (items > x k) = users x k, that is, it gives a k dimensional vector for each user. > Similarly, multiplication on the left by U' gives a k x items matrix > which, > when transposed gives a k dimensional vector for each item. > > This implies that if we augment U with new user row vectors U_new, we > should > be able to simply compute new k-dimensional vectors for the new users and > adjoin these new vectors to the previous vectors. Concisely put, > > ( A ) ( A V ) > ( ) V = ( ) > ( A_new ) ( A_new V ) > > This isn't really magical. It just says that we can compute new user > vectors at any time by multiplying the new users' ratings by V. > > The diagram in figure one is hideously confusing because it looks like a > picture of some kind of multiplication whereas it is really depicting some > odd kind of flow diagram. > > Does this solve the problem? > > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > Section 3 is hard to understand. > > > > - Ak and P are defined, but not used later > > - Definition of P has UTk x Nu as a computation. UTk is a k x m > > matrix, and Nu is "t" x 1. t is not defined. > > - This only makes sense if t = m. But m is the number of users, and Nu > > is a user vector, so should have a number of elements equal to n, the > > number of items > > - Sk * VTk is described as a k x "d" matrix but d is undefined > > - The diagram suggests that VTk are multiplied by all the Nu, which > > makes more sense -- but only if Nu are multiplied by VTk, not the > > other way. And the diagram depicts neither of those. > > - Conceptually I would understand Nu x VTk, but then P is defined by > > an additional product with Uk > > > > In short... what? > > > > > > On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > Fire away. > > > > > > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > >> Is anyone out there familiar enough with this to a) discuss this paper > > >> with me or b) point me to another writeup on the approach? > > >> > > > > > >
-
Re: Understanding the SVD recommender
Sean Owen 2010-06-04, 07:38
Yes thanks a lot. Makes sense to me: we're just changing basis and V is the change-of-basis transformation. Glad to see that is all there is to it; not sure what the rest is about. I had thought of U S as "user preferences for features" and V as "expression of features in items". The paper breaks S in half by taking the square root S = B* B and putting B* with U and B with V.
But am I right that both are equivalent? Because I'd rather think of maintaining and updating U S. Because conceptually S is just full of multipliers -- making users 3x more keen on feature 1 is the same as reducing the item express 3x less of feature 1. Certainly in the recommendation computation they show, which makes sense, it doesn't matter since the dot product is the same. They also add on the "row" average to make a prediction, which is the average rating by the user, I'm guessing -- "row" is a row of A? it's not specified. It doesn't affect the ordering of recommendations for a user. Just working backwards, I'd assume this is because the generated predictions are otherwise "centered" in the sense that 0 will be predicted for an item that the user might be neutral on. But I guess I hadn't seen the intuitive reason this is the result. Is there any easy way to see it? On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > You are correct. The paper has an appalling treatment of the folding in > approach. > > In fact, the procedure is dead simple. > > The basic idea is to leave the coordinate system derived in the original SVD > intact and simply project the new users into that space. > > The easiest way to see what is happening is to start again with the original > rating matrix A as decomposed: > > A = U S V' > > where A is users x items. If we multiply on the right by V, we get > > A V = U S V' V = U S > > (because V' V = I, by definition). This result is (users x items) x (items > x k) = users x k, that is, it gives a k dimensional vector for each user. > Similarly, multiplication on the left by U' gives a k x items matrix which, > when transposed gives a k dimensional vector for each item. > > This implies that if we augment U with new user row vectors U_new, we should > be able to simply compute new k-dimensional vectors for the new users and > adjoin these new vectors to the previous vectors. Concisely put, > > ( A ) ( A V ) > ( ) V = ( ) > ( A_new ) ( A_new V ) > > This isn't really magical. It just says that we can compute new user > vectors at any time by multiplying the new users' ratings by V. > > The diagram in figure one is hideously confusing because it looks like a > picture of some kind of multiplication whereas it is really depicting some > odd kind of flow diagram. > > Does this solve the problem? > > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
-
Re: Understanding the SVD recommender
Ted Dunning 2010-06-04, 07:53
On Fri, Jun 4, 2010 at 12:38 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > Yes thanks a lot. Makes sense to me: we're just changing basis and V > is the change-of-basis transformation. Glad to see that is all there > is to it; not sure what the rest is about. > Exactly. > I had thought of U S as "user preferences for features" and V as > "expression of features in items". The paper breaks S in half by > taking the square root S = B* B and putting B* with U and B with V. > This is pretty common usage actually. It allows some degree of normalization of the vectors, but isn't strictly necessary. Note that since this is an SVD, S is diagonal and all elements are real (and positive, actually). Thus B* = B. > > But am I right that both are equivalent? Ish. But not customary. The old LSI article makes a better case than I can off the cuff just before bed. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.7546&rep=rep1&type=pdf> Because I'd rather think of > maintaining and updating U S. Because conceptually S is just full of > multipliers -- making users 3x more keen on feature 1 is the same as > reducing the item express 3x less of feature 1. Certainly in the > recommendation computation they show, which makes sense, it doesn't > matter since the dot product is the same. > Actually the elements of S aren't item or user specific. Remember there are only k non-zeros there. They represent the strength of each of the singular vectors. A very useful way to look at it is to consider the SVD as a sum of rank-1 outer products u_i * v'_i. These rank-1 products are summed with weights s_i. This way of looking at matters makes a number of lemmas about SVD's pretty trivial. > They also add on the "row" average to make a prediction, which is the > average rating by the user, I'm guessing -- "row" is a row of A? I would guess so, but that would only make sense if they subtracted it ahead of time. In general, I don't see the point for that. I would rather cosine normalize each user row. > Just working backwards, I'd assume this is because the generated > predictions are otherwise "centered" in the sense that 0 will be > predicted for an item that the user might be neutral on. But I guess I > hadn't seen the intuitive reason this is the result. Is there any easy > way to see it? > For a new user with no history, h = 0 so the corresponding k-dimensional representation of this user will be s^-(1/2) h v = 0. The dot product with any item vector will be identically 0. I don't know that it would make any useful difference, but it would make me happier to reduce the ratings to binary, normalize rows and then decompose. In general, I think that the Belkor team had a much better approach with SVD++ and their time dynamics trick. That is much the same as mean removal. > > > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > You are correct. The paper has an appalling treatment of the folding in > > approach. > > > > In fact, the procedure is dead simple. > > > > The basic idea is to leave the coordinate system derived in the original > SVD > > intact and simply project the new users into that space. > > > > The easiest way to see what is happening is to start again with the > original > > rating matrix A as decomposed: > > > > A = U S V' > > > > where A is users x items. If we multiply on the right by V, we get > > > > A V = U S V' V = U S > > > > (because V' V = I, by definition). This result is (users x items) x > (items > > x k) = users x k, that is, it gives a k dimensional vector for each user. > > Similarly, multiplication on the left by U' gives a k x items matrix > which, > > when transposed gives a k dimensional vector for each item. > > > > This implies that if we augment U with new user row vectors U_new, we > should > > be able to simply compute new k-dimensional vectors for the new users and > > adjoin these new vectors to the previous vectors. Concisely put,
-
Re: Understanding the SVD recommender
Sean Owen 2010-06-04, 08:34
On Fri, Jun 4, 2010 at 8:53 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:on of the vectors, but isn't strictly necessary. > > Note that since this is an SVD, S is diagonal and all elements are real (and > positive, actually). Thus B* = B.
(Yeah either way it's just dotting stuff with that "diagonal vector", or really the square roots of its values.) > But not customary. The old LSI article makes a better case than I can off > the cuff just before bed.
OK sounds like it's best to maintain U sqrt(S) and sqrt(S) V if only for convention. > Actually the elements of S aren't item or user specific. Remember there are > only k non-zeros there.
Yeah I meant they're like multipliers for each "feature" (probably not the right word), establishing some relation between pseudo-users's preferences for features and pseudo item's expression of features.
> I would guess so, but that would only make sense if they subtracted it ahead > of time. In general, I don't see the point for that. I would rather cosine > normalize each user row.
Yeah sounds good. I wouldn't add this step to start.
-
Re: Understanding the SVD recommender
Ted Dunning 2010-06-04, 17:46
On Fri, Jun 4, 2010 at 1:34 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
> > I would guess so, but that would only make sense if they subtracted it > ahead > > of time. In general, I don't see the point for that. I would rather > cosine > > normalize each user row. > > Yeah sounds good. I wouldn't add this step to start My quick guess is that normalizing decreases the condition number of the matrix which makes the numerics more stable so you get a better estimate of the singular vectors that you really care about because they aren't shadowed so excessively by the ones associated with the largest singular vectors.
The condition number is, among other ways, defined by the ratio of the largest to smallest eigenvalues. Looking at the outer product form of SVD, you can easily see how if the first few singular values total dominate the others that finding the residue represented by the others would be difficult. IDF weighting should have similar effects.
-
Re: Understanding the SVD recommender
Richard Simon Just 2010-06-09, 18:19
On 04/06/10 08:53, Ted Dunning wrote: > On Fri, Jun 4, 2010 at 12:38 AM, Sean Owen<[EMAIL PROTECTED]> wrote: > > >> They also add on the "row" average to make a prediction, which is the >> average rating by the user, I'm guessing -- "row" is a row of A? >> > > I would guess so, but that would only make sense if they subtracted it ahead > of time. In general, I don't see the point for that. I would rather cosine > normalize each user row. > > > I don't know enough yet to comment on what works best, but I can give some evidence that they do subtract teh row average ahead of time. Sarwar's previous work, Application of Dimensionality Reduction piece ( http://www.grouplens.org/papers/pdf/webKDD00.pdf) uses the same prediction function. In section 4.3.1 Prediction Experiment they discuss the removal of the row average before the SVD computation and it's later addition for the prediction. I'd make the assumption that the incremental SVD paper builds on this. - Richard
-
Re: Understanding the SVD recommender
Jake Mannix 2010-06-09, 18:28
On Wed, Jun 9, 2010 at 11:19 AM, Richard Simon Just < [EMAIL PROTECTED]> wrote: > > I don't know enough yet to comment on what works best, but I can give some > evidence that they do subtract teh row average ahead of time. Sarwar's > previous work, Application of Dimensionality Reduction piece ( > http://www.grouplens.org/papers/pdf/webKDD00.pdf) uses the same prediction > function. In section 4.3.1 Prediction Experiment they discuss the removal of > the row average before the SVD computation and it's later addition for the > prediction. I'd make the assumption that the incremental SVD paper builds on > this. > > - Richard > I would be *very* careful on how you decompose a sparse matrix which you center: if you naively just subtract off the mean from all the entries in the vectors, an SVD which would have taken 6 hours to compute could suddenly take weeks, literally. But if you do the second-from-most-naive thing, and subtract the means from only the nonzero entries, then all can turn out for the best. This is just following Sean's typical advice of "don't treat unknown preferences as '0.0' ". -jake
-
Re: Understanding the SVD recommender
Richard Simon Just 2010-06-09, 19:08
On 09/06/10 19:28, Jake Mannix wrote: > On Wed, Jun 9, 2010 at 11:19 AM, Richard Simon Just< > [EMAIL PROTECTED]> wrote: > > >> I don't know enough yet to comment on what works best, but I can give some >> evidence that they do subtract teh row average ahead of time. Sarwar's >> previous work, Application of Dimensionality Reduction piece ( >> http://www.grouplens.org/papers/pdf/webKDD00.pdf) uses the same prediction >> function. In section 4.3.1 Prediction Experiment they discuss the removal of >> the row average before the SVD computation and it's later addition for the >> prediction. I'd make the assumption that the incremental SVD paper builds on >> this. >> >> - Richard >> >> > I would be *very* careful on how you decompose a sparse matrix which you > center: if you naively just subtract off the mean from all the entries in > the vectors, an SVD which would have taken 6 hours to compute could suddenly > take weeks, literally. But if you do the second-from-most-naive thing, and > subtract the means from only the nonzero entries, then all can turn out for > the best. This is just following Sean's typical advice of "don't treat > unknown preferences as '0.0' ". > > -jake > > Agreed. Just wanted to answer the question that had been left hanging for why Sarwar add the row average back. In fact to be complete, before the SVD they fill each null value with 'column average - row average'. But yeah, that would make for a much bigger computation. -Richard
-
Re: Understanding the SVD recommender
Jake Mannix 2010-06-10, 07:11
On Wed, Jun 9, 2010 at 12:08 PM, Richard Simon Just < [EMAIL PROTECTED]> wrote: > > Agreed. Just wanted to answer the question that had been left hanging for > why Sarwar add the row average back. In fact to be complete, before the SVD > they fill each null value with 'column average - row average'. But yeah, > that would make for a much bigger computation. >
I should note here, that if you really want to see the difference between the fully centered SVD and the one where you only shift the nonzero entries, the former is not *necessarily* a dense and therefore-out-of-reach computation:
The meat of the Lanczos algorithm is repeatedly computing A . v, where A is your big big matrix, and v is a (dense) vector which is getting closer and closer to the biggest singular. If you wanted to compute the singular vectors of (A - B), where B_ij = r_i - c_j = (sum_{k=1..N} A_ik) - (sum_{k=1..M} A_kj) is the matrix of row averages minus column averages, then you can just do it in a couple of (still sparse) steps:
[(A - B) . v]_i = [A . v]_i - [B . v]_i = [A . v]_i - (sum_{k=1..N,j=1..M}A_ij v_k) + (sum_{k=1...M,j=1...M}A_kj v_j) = [A . v]_i - r_i * (sum_{k} v_k) + (sum_{k} (A . v)_k)
The second term is just the row-average of row-i times v.zSum(), and the last term is a constant for all values in the output vector, and is just (A . v).zSum(). So no additional map-reduces are needed to incorporate this - it's just that the Lanczos solver will need to be told about the row and column averages of the input and optionally at each Lanczos iteration, modify the resultant vector as above.
See if it makes a difference!
-jake
-
Re: Understanding the SVD recommender
Jake Mannix 2010-06-10, 07:18
On Thu, Jun 10, 2010 at 12:11 AM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> > The second term is just the row-average of row-i times v.zSum(), and the > last term is a constant for all values in the output vector, and is just (A > . v).zSum(). > So no additional map-reduces are needed to incorporate this - it's just > that the Lanczos solver will need to be told about the row and column > averages of the > input and optionally at each Lanczos iteration, modify the resultant vector > as above. >
I should caution here: the simple form of my modification only would work in exactly that way if the input matrix is symmetric. When it's not symmetric, the inner loop of Lanczos isn't A . v, but A.timesSquared(v) = (A' . A).v, and so if you shift A, you will need to be slightly more clever about how you peel out the matrix of mean values. It can certainly be done sparsely still (at worst, compute (A - B).v, then (A - B)'.v !), but I'm not sure if you can do it in the k mapreduce steps that you can do it in the uncentered form (it might take the 2k that the previous parenthetical remark implies).
-jake
-
Re: Understanding the SVD recommender
Sean Owen 2011-11-06, 17:52
Following up on this very old thread.
I understood all this bit, thanks, that greatly clarified.
You multiply a new user vector by V to project it into the new "pseudo-item", reduced-dimension space. And to get back to real items, you multiply by V's inverse, which is its transpose. And so you are really multiplying the user vector by V VT, which is not a no-op, since those are truncated matrices and aren't actually exact inverses (?)
The original paper talks about cosine similarities between users or items in the reduced-dimension space, but, can anyone shine light on the point of that? From the paper also, it seems like they say the predictions are just computed as vector products as above. Finally, separately, I'm trying to understand the Lanczos method as part of computing an SVD. Lanczos operates on a real symmetric matrix right? And am I right that it comes into play when you are computing and SVD...
A = U * S * VT
... because U is actually the eigenvectors of (symmetric) A*AT and V is the eigenvectors of AT*A? And so Lanczos is used to answer those questions to complete the SVD? On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > You are correct. The paper has an appalling treatment of the folding in > approach. > > In fact, the procedure is dead simple. > > The basic idea is to leave the coordinate system derived in the original SVD > intact and simply project the new users into that space. > > The easiest way to see what is happening is to start again with the original > rating matrix A as decomposed: > > A = U S V' > > where A is users x items. If we multiply on the right by V, we get > > A V = U S V' V = U S > > (because V' V = I, by definition). This result is (users x items) x (items > x k) = users x k, that is, it gives a k dimensional vector for each user. > Similarly, multiplication on the left by U' gives a k x items matrix which, > when transposed gives a k dimensional vector for each item. > > This implies that if we augment U with new user row vectors U_new, we should > be able to simply compute new k-dimensional vectors for the new users and > adjoin these new vectors to the previous vectors. Concisely put, > > ( A ) ( A V ) > ( ) V = ( ) > ( A_new ) ( A_new V ) > > This isn't really magical. It just says that we can compute new user > vectors at any time by multiplying the new users' ratings by V. > > The diagram in figure one is hideously confusing because it looks like a > picture of some kind of multiplication whereas it is really depicting some > odd kind of flow diagram. > > Does this solve the problem? > > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> Section 3 is hard to understand. >> >> - Ak and P are defined, but not used later >> - Definition of P has UTk x Nu as a computation. UTk is a k x m >> matrix, and Nu is "t" x 1. t is not defined. >> - This only makes sense if t = m. But m is the number of users, and Nu >> is a user vector, so should have a number of elements equal to n, the >> number of items >> - Sk * VTk is described as a k x "d" matrix but d is undefined >> - The diagram suggests that VTk are multiplied by all the Nu, which >> makes more sense -- but only if Nu are multiplied by VTk, not the >> other way. And the diagram depicts neither of those. >> - Conceptually I would understand Nu x VTk, but then P is defined by >> an additional product with Uk >> >> In short... what? >> >> >> On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> > Fire away. >> > >> > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[EMAIL PROTECTED]> wrote: >> > >> >> Is anyone out there familiar enough with this to a) discuss this paper >> >> with me or b) point me to another writeup on the approach? >> >> >> > >> >
-
Re: Understanding the SVD recommender
Jake Mannix 2011-11-06, 18:08
Re: Lanczos, yes, it operates by finding V as you describe. The user is required to do more work to recapture U. Practical reason is that the assumption is numCols(A) = numFeatures which is much less than numRows(A) numTrainingSamples
On Nov 6, 2011 9:52 AM, "Sean Owen" <[EMAIL PROTECTED]> wrote:
Following up on this very old thread.
I understood all this bit, thanks, that greatly clarified.
You multiply a new user vector by V to project it into the new "pseudo-item", reduced-dimension space. And to get back to real items, you multiply by V's inverse, which is its transpose. And so you are really multiplying the user vector by V VT, which is not a no-op, since those are truncated matrices and aren't actually exact inverses (?)
The original paper talks about cosine similarities between users or items in the reduced-dimension space, but, can anyone shine light on the point of that? From the paper also, it seems like they say the predictions are just computed as vector products as above. Finally, separately, I'm trying to understand the Lanczos method as part of computing an SVD. Lanczos operates on a real symmetric matrix right? And am I right that it comes into play when you are computing and SVD...
A = U * S * VT
... because U is actually the eigenvectors of (symmetric) A*AT and V is the eigenvectors of AT*A? And so Lanczos is used to answer those questions to complete the SVD?
On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > You are correct. The...
|
|