|
Jake Mannix
2011-11-06, 18:08
Sebastian Schelter
2011-11-06, 19:25
Sean Owen
2011-11-06, 23:17
Ted Dunning
2011-11-06, 23:26
Sean Owen
2011-11-06, 23:30
Danny Bickson
2011-11-06, 23:35
Sean Owen
2011-11-06, 23:42
Danny Bickson
2011-11-06, 23:46
Ted Dunning
2011-11-06, 23:57
Lance Norskog
2011-11-06, 23:59
Ted Dunning
2011-11-07, 00:17
Jake Mannix
2011-11-07, 00:19
Ted Dunning
2011-11-07, 00:27
Danny Bickson
2011-11-07, 00:29
Ted Dunning
2011-11-07, 00:33
Lance Norskog
2011-11-07, 00:55
Sean Owen
2011-11-17, 13:26
Jake Mannix
2011-11-17, 15:21
Sean Owen
2011-11-17, 17:12
Ted Dunning
2011-11-17, 17:30
Ted Dunning
2011-11-17, 17:32
Dmitriy Lyubimov
2011-11-17, 19:30
Dmitriy Lyubimov
2011-11-17, 19:44
Ted Dunning
2011-11-17, 20:15
Dmitriy Lyubimov
2011-11-17, 20:33
Sebastian Schelter
2011-11-17, 20:38
Sebastian Schelter
2011-11-17, 20:51
Dmitriy Lyubimov
2011-11-17, 20:57
Ted Dunning
2011-11-17, 21:36
Dmitriy Lyubimov
2011-11-29, 01:04
Dmitriy Lyubimov
2011-11-29, 01:04
|
-
Re: Understanding the SVD recommenderJake Mannix 2011-11-06, 18:08
Re: Lanczos, yes, it operates by finding V as you describe. The user is
required to do more work to recapture U. Practical reason is that the assumption is numCols(A) = numFeatures which is much less than numRows(A) numTrainingSamples On Nov 6, 2011 9:52 AM, "Sean Owen" <[EMAIL PROTECTED]> wrote: Following up on this very old thread. I understood all this bit, thanks, that greatly clarified. You multiply a new user vector by V to project it into the new "pseudo-item", reduced-dimension space. And to get back to real items, you multiply by V's inverse, which is its transpose. And so you are really multiplying the user vector by V VT, which is not a no-op, since those are truncated matrices and aren't actually exact inverses (?) The original paper talks about cosine similarities between users or items in the reduced-dimension space, but, can anyone shine light on the point of that? From the paper also, it seems like they say the predictions are just computed as vector products as above. Finally, separately, I'm trying to understand the Lanczos method as part of computing an SVD. Lanczos operates on a real symmetric matrix right? And am I right that it comes into play when you are computing and SVD... A = U * S * VT ... because U is actually the eigenvectors of (symmetric) A*AT and V is the eigenvectors of AT*A? And so Lanczos is used to answer those questions to complete the SVD? On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > You are correct. The...
-
Re: Understanding the SVD recommenderSebastian Schelter 2011-11-06, 19:25
On 06.11.2011 18:52, Sean Owen wrote:
> Following up on this very old thread. > > I understood all this bit, thanks, that greatly clarified. > > You multiply a new user vector by V to project it into the new > "pseudo-item", reduced-dimension space. > And to get back to real items, you multiply by V's inverse, which is > its transpose. > And so you are really multiplying the user vector by V VT, which is > not a no-op, since those are truncated matrices and aren't actually > exact inverses (?) > > The original paper talks about cosine similarities between users or > items in the reduced-dimension space, but, can anyone shine light on > the point of that? I think this interpretation is coming from LSI where you project a query onto the document feature space and use the cosine to find the most similar documents (which can be done by simple matrix vector mulitplication as the singular vectors are orthonormal and computing dot products with the projected query is therefore equal to computing cosines). Something similar could be done to find most similar items or users, the bad thing is that AFAIK you have to look at every other user or item as U and V are dense. Maybe this paper can help to shine light on that: "Using Linear Algebra for Intelligent Information Retrieval" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf >From the paper also, it seems like they say the > predictions are just computed as vector products as above. > > > Finally, separately, I'm trying to understand the Lanczos method as > part of computing an SVD. Lanczos operates on a real symmetric matrix > right? And am I right that it comes into play when you are computing > and SVD... > > A = U * S * VT > > ... because U is actually the eigenvectors of (symmetric) A*AT and V > is the eigenvectors of AT*A? And so Lanczos is used to answer those > questions to complete the SVD > > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> You are correct. The paper has an appalling treatment of the folding in >> approach. >> >> In fact, the procedure is dead simple. >> >> The basic idea is to leave the coordinate system derived in the original SVD >> intact and simply project the new users into that space. >> >> The easiest way to see what is happening is to start again with the original >> rating matrix A as decomposed: >> >> A = U S V' >> >> where A is users x items. If we multiply on the right by V, we get >> >> A V = U S V' V = U S >> >> (because V' V = I, by definition). This result is (users x items) x (items >> x k) = users x k, that is, it gives a k dimensional vector for each user. >> Similarly, multiplication on the left by U' gives a k x items matrix which, >> when transposed gives a k dimensional vector for each item. >> >> This implies that if we augment U with new user row vectors U_new, we should >> be able to simply compute new k-dimensional vectors for the new users and >> adjoin these new vectors to the previous vectors. Concisely put, >> >> ( A ) ( A V ) >> ( ) V = ( ) >> ( A_new ) ( A_new V ) >> >> This isn't really magical. It just says that we can compute new user >> vectors at any time by multiplying the new users' ratings by V. >> >> The diagram in figure one is hideously confusing because it looks like a >> picture of some kind of multiplication whereas it is really depicting some >> odd kind of flow diagram. >> >> Does this solve the problem? >> >> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote: >> >>> Section 3 is hard to understand. >>> >>> - Ak and P are defined, but not used later >>> - Definition of P has UTk x Nu as a computation. UTk is a k x m >>> matrix, and Nu is "t" x 1. t is not defined. >>> - This only makes sense if t = m. But m is the number of users, and Nu >>> is a user vector, so should have a number of elements equal to n, the >>> number of items >>> - Sk * VTk is described as a k x "d" matrix but d is undefined
-
Re: Understanding the SVD recommenderSean Owen 2011-11-06, 23:17
Yeah OK, that makes sense. That's a pretty helpful paper.
I can get through the Lanczos algorithm without much trouble, I think. The piece I'm looking for pointers on at the moment is the eigen decomposition of the tridiagonal matrix. Jake am I right that this is done in memory right now? I am sure your previous comment actually answers this next question, but I haven't connected the dots yet. If we're getting eigenvectors of AT * A, and that matrix is huge, then isn't the tridiagonal matrix still huge -- albeit it has only 3m entries, not m^2. It seems like the code just calls to EigenDecomposer which treats it as a dense m x m matrix. (Is it worth me figuring out the QR decomposition enough to build a distributed version of it?) On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: > On 06.11.2011 18:52, Sean Owen wrote: >> Following up on this very old thread. >> >> I understood all this bit, thanks, that greatly clarified. >> >> You multiply a new user vector by V to project it into the new >> "pseudo-item", reduced-dimension space. >> And to get back to real items, you multiply by V's inverse, which is >> its transpose. >> And so you are really multiplying the user vector by V VT, which is >> not a no-op, since those are truncated matrices and aren't actually >> exact inverses (?) >> >> The original paper talks about cosine similarities between users or >> items in the reduced-dimension space, but, can anyone shine light on >> the point of that? > > I think this interpretation is coming from LSI where you project a query > onto the document feature space and use the cosine to find the most > similar documents (which can be done by simple matrix vector > mulitplication as the singular vectors are orthonormal and computing dot > products with the projected query is therefore equal to computing cosines). > > Something similar could be done to find most similar items or users, the > bad thing is that AFAIK you have to look at every other user or item as > U and V are dense. > > Maybe this paper can help to shine light on that: > > "Using Linear Algebra for Intelligent Information Retrieval" > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf > > From the paper also, it seems like they say the >> predictions are just computed as vector products as above. >> >> >> Finally, separately, I'm trying to understand the Lanczos method as >> part of computing an SVD. Lanczos operates on a real symmetric matrix >> right? And am I right that it comes into play when you are computing >> and SVD... >> >> A = U * S * VT >> >> ... because U is actually the eigenvectors of (symmetric) A*AT and V >> is the eigenvectors of AT*A? And so Lanczos is used to answer those >> questions to complete the SVD >> >> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: >>> You are correct. The paper has an appalling treatment of the folding in >>> approach. >>> >>> In fact, the procedure is dead simple. >>> >>> The basic idea is to leave the coordinate system derived in the original SVD >>> intact and simply project the new users into that space. >>> >>> The easiest way to see what is happening is to start again with the original >>> rating matrix A as decomposed: >>> >>> A = U S V' >>> >>> where A is users x items. If we multiply on the right by V, we get >>> >>> A V = U S V' V = U S >>> >>> (because V' V = I, by definition). This result is (users x items) x (items >>> x k) = users x k, that is, it gives a k dimensional vector for each user. >>> Similarly, multiplication on the left by U' gives a k x items matrix which, >>> when transposed gives a k dimensional vector for each item. >>> >>> This implies that if we augment U with new user row vectors U_new, we should >>> be able to simply compute new k-dimensional vectors for the new users and >>> adjoin these new vectors to the previous vectors. Concisely put, >>> >>> ( A ) ( A V ) >>> ( ) V = ( )
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-06, 23:26
The tridiagonal is much smaller than you would need if you wanted all the
eigenvalues. Since you only want a small number, you only have a tri-diagonal matrix that is some multiple of that size. In-memory decomposition makes total sense. Regarding QR decomposition, Dmitriy has already built a parallel version. A large QR is required at one point in the SSVD algorithm. I have shown a way to avoid this large QR using a much smaller Cholesky decomposition, but it isn't entirely clear that this is a net win. It is a huge win with the sequential version. On Sun, Nov 6, 2011 at 3:17 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > Yeah OK, that makes sense. That's a pretty helpful paper. > > I can get through the Lanczos algorithm without much trouble, I think. > The piece I'm looking for pointers on at the moment is the eigen > decomposition of the tridiagonal matrix. > > Jake am I right that this is done in memory right now? I am sure your > previous comment actually answers this next question, but I haven't > connected the dots yet. If we're getting eigenvectors of AT * A, and > that matrix is huge, then isn't the tridiagonal matrix still huge -- > albeit it has only 3m entries, not m^2. It seems like the code just > calls to EigenDecomposer which treats it as a dense m x m matrix. > > (Is it worth me figuring out the QR decomposition enough to build a > distributed version of it?) > > On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: > > On 06.11.2011 18:52, Sean Owen wrote: > >> Following up on this very old thread. > >> > >> I understood all this bit, thanks, that greatly clarified. > >> > >> You multiply a new user vector by V to project it into the new > >> "pseudo-item", reduced-dimension space. > >> And to get back to real items, you multiply by V's inverse, which is > >> its transpose. > >> And so you are really multiplying the user vector by V VT, which is > >> not a no-op, since those are truncated matrices and aren't actually > >> exact inverses (?) > >> > >> The original paper talks about cosine similarities between users or > >> items in the reduced-dimension space, but, can anyone shine light on > >> the point of that? > > > > I think this interpretation is coming from LSI where you project a query > > onto the document feature space and use the cosine to find the most > > similar documents (which can be done by simple matrix vector > > mulitplication as the singular vectors are orthonormal and computing dot > > products with the projected query is therefore equal to computing > cosines). > > > > Something similar could be done to find most similar items or users, the > > bad thing is that AFAIK you have to look at every other user or item as > > U and V are dense. > > > > Maybe this paper can help to shine light on that: > > > > "Using Linear Algebra for Intelligent Information Retrieval" > > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf > > > > From the paper also, it seems like they say the > >> predictions are just computed as vector products as above. > >> > >> > >> Finally, separately, I'm trying to understand the Lanczos method as > >> part of computing an SVD. Lanczos operates on a real symmetric matrix > >> right? And am I right that it comes into play when you are computing > >> and SVD... > >> > >> A = U * S * VT > >> > >> ... because U is actually the eigenvectors of (symmetric) A*AT and V > >> is the eigenvectors of AT*A? And so Lanczos is used to answer those > >> questions to complete the SVD > >> > >> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > >>> You are correct. The paper has an appalling treatment of the folding > in > >>> approach. > >>> > >>> In fact, the procedure is dead simple. > >>> > >>> The basic idea is to leave the coordinate system derived in the > original SVD > >>> intact and simply project the new users into that space. > >>> > >>> The easiest way to see what is happening is to start again with the
-
Re: Understanding the SVD recommenderSean Owen 2011-11-06, 23:30
Oh do you only need the top k x k bit of the tridiagonal to find the
top k eigenvalues? I really don't want to write a QR decomposer, phew. On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > The tridiagonal is much smaller than you would need if you wanted all the > eigenvalues. Since you only want a small number, you only have a > tri-diagonal matrix that is some multiple of that size. In-memory > decomposition makes total sense. >
-
Re: Understanding the SVD recommenderDanny Bickson 2011-11-06, 23:35
To be exact, the size of the tridiagonal matrix
is number of iterations + 1. See description of the matrix T_mm in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm Best, DB On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > Oh do you only need the top k x k bit of the tridiagonal to find the > top k eigenvalues? > > I really don't want to write a QR decomposer, phew. > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > The tridiagonal is much smaller than you would need if you wanted all the > > eigenvalues. Since you only want a small number, you only have a > > tri-diagonal matrix that is some multiple of that size. In-memory > > decomposition makes total sense. > > >
-
Re: Understanding the SVD recommenderSean Owen 2011-11-06, 23:42
True though I thought it was intended to run through all m steps - is it
simply that you stop after some number k and that still leads you to compute the k largest eigenvectors of the original? On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> wrote: > To be exact, the size of the tridiagonal matrix > is number of iterations + 1. See description of the matrix T_mm > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > Best, > > DB > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > Oh do you only need the top k x k bit of the tridiagonal to find the > > top k eigenvalues? > > > > I really don't want to write a QR decomposer, phew. > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > The tridiagonal is much smaller than you would need if you wanted all > the > > > eigenvalues. Since you only want a small number, you only have a > > > tri-diagonal matrix that is some multiple of that size. In-memory > > > decomposition makes total sense. > > > > > >
-
Re: Understanding the SVD recommenderDanny Bickson 2011-11-06, 23:46
Exactly, in each iteration you extract one more eigenvalue from largest to
smallest. If you run more iterations then the matrix rank then you get again all eigenvalues. In that case you may encounter each eigenvalue a few times. On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > True though I thought it was intended to run through all m steps - is it > simply that you stop after some number k and that still leads you to > compute the k largest eigenvectors of the original? > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> wrote: > > > To be exact, the size of the tridiagonal matrix > > is number of iterations + 1. See description of the matrix T_mm > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > Best, > > > > DB > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > Oh do you only need the top k x k bit of the tridiagonal to find the > > > top k eigenvalues? > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning <[EMAIL PROTECTED]> > > > wrote: > > > > The tridiagonal is much smaller than you would need if you wanted all > > the > > > > eigenvalues. Since you only want a small number, you only have a > > > > tri-diagonal matrix that is some multiple of that size. In-memory > > > > decomposition makes total sense. > > > > > > > > > >
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-06, 23:57
Almost exactly. If you stop at k iterations, you don't get the top k
eigenvalues precisely. You need some extra slop to make sure your top eigenvalues are accurate. On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson <[EMAIL PROTECTED]>wrote: > Exactly, in each iteration you extract one more eigenvalue from largest to > smallest. If you run more iterations then the matrix rank then you get > again all eigenvalues. In that case you may encounter each eigenvalue a few > times. > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > True though I thought it was intended to run through all m steps - is it > > simply that you stop after some number k and that still leads you to > > compute the k largest eigenvectors of the original? > > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> > wrote: > > > > > To be exact, the size of the tridiagonal matrix > > > is number of iterations + 1. See description of the matrix T_mm > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > Best, > > > > > > DB > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to find the > > > > top k eigenvalues? > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning <[EMAIL PROTECTED]> > > > > wrote: > > > > > The tridiagonal is much smaller than you would need if you wanted > all > > > the > > > > > eigenvalues. Since you only want a small number, you only have a > > > > > tri-diagonal matrix that is some multiple of that size. In-memory > > > > > decomposition makes total sense. > > > > > > > > > > > > > > >
-
Re: Understanding the SVD recommenderLance Norskog 2011-11-06, 23:59
Is there a formula (somewhere easy to find) for "slop" and "accurate"?
On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Almost exactly. If you stop at k iterations, you don't get the top k > eigenvalues precisely. You need some extra slop to make sure your top > eigenvalues are accurate. > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson <[EMAIL PROTECTED] > >wrote: > > > Exactly, in each iteration you extract one more eigenvalue from largest > to > > smallest. If you run more iterations then the matrix rank then you get > > again all eigenvalues. In that case you may encounter each eigenvalue a > few > > times. > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > True though I thought it was intended to run through all m steps - is > it > > > simply that you stop after some number k and that still leads you to > > > compute the k largest eigenvectors of the original? > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> > > wrote: > > > > > > > To be exact, the size of the tridiagonal matrix > > > > is number of iterations + 1. See description of the matrix T_mm > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > Best, > > > > > > > > DB > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to find > the > > > > > top k eigenvalues? > > > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning < > [EMAIL PROTECTED]> > > > > > wrote: > > > > > > The tridiagonal is much smaller than you would need if you wanted > > all > > > > the > > > > > > eigenvalues. Since you only want a small number, you only have a > > > > > > tri-diagonal matrix that is some multiple of that size. > In-memory > > > > > > decomposition makes total sense. > > > > > > > > > > > > > > > > > > > > > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-07, 00:17
Not particularly easy to understand. See the Lanczos section in Matrix
Computations by Golub and van Loan<http://www.amazon.com/Computations-Hopkins-Studies-Mathematical-Sciences/dp/0801854148> There is a similar slop required for the random projection stuff. You could make a hand-waving argument that the slop is similarly motivated in both cases, but the random projection probably needs more slop because it commits to a sub-space on the first step. On Sun, Nov 6, 2011 at 3:59 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Is there a formula (somewhere easy to find) for "slop" and "accurate"? > > On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > > Almost exactly. If you stop at k iterations, you don't get the top k > > eigenvalues precisely. You need some extra slop to make sure your top > > eigenvalues are accurate. > > > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson <[EMAIL PROTECTED] > > >wrote: > > > > > Exactly, in each iteration you extract one more eigenvalue from largest > > to > > > smallest. If you run more iterations then the matrix rank then you get > > > again all eigenvalues. In that case you may encounter each eigenvalue a > > few > > > times. > > > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > > > True though I thought it was intended to run through all m steps - is > > it > > > > simply that you stop after some number k and that still leads you to > > > > compute the k largest eigenvectors of the original? > > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > To be exact, the size of the tridiagonal matrix > > > > > is number of iterations + 1. See description of the matrix T_mm > > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > > > Best, > > > > > > > > > > DB > > > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> > wrote: > > > > > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to find > > the > > > > > > top k eigenvalues? > > > > > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning < > > [EMAIL PROTECTED]> > > > > > > wrote: > > > > > > > The tridiagonal is much smaller than you would need if you > wanted > > > all > > > > > the > > > > > > > eigenvalues. Since you only want a small number, you only > have a > > > > > > > tri-diagonal matrix that is some multiple of that size. > > In-memory > > > > > > > decomposition makes total sense. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Lance Norskog > [EMAIL PROTECTED] >
-
Re: Understanding the SVD recommenderJake Mannix 2011-11-07, 00:19
The idea is this: after k iterations, you have a k-by-k tri-diagonal
matrix, the eigenvalues of *this* are *an approximation* to the top k eigenvalues of A*A'. The largest of these k will have the least error, the next will have more, and so on down the line. I've typically assumed that the bottom 1/3rd or so of the k are not going to be very accurate, but it depends on the spectrum of eigenvalues of A*A', sometimes all but the last few are very accurately computed (but sometimes it's even worse: if the spectrum is very flat, the mixing between eigenvectors can be near complete) But you don't need a formula: look at EigenVerificationJob for the way to measure your accuracy, you can just run this on your data after you've extracted out your first k singular vector/value pairs, and see how far into this list you want to keep. The measurement I use is "how close to an eigenvector is this vector?". I.e. how big is 1 - (v/|v|) . ((AA'v)/|AA'v|) ? -jake On Sun, Nov 6, 2011 at 3:59 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Is there a formula (somewhere easy to find) for "slop" and "accurate"? > > On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > > Almost exactly. If you stop at k iterations, you don't get the top k > > eigenvalues precisely. You need some extra slop to make sure your top > > eigenvalues are accurate. > > > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson <[EMAIL PROTECTED] > > >wrote: > > > > > Exactly, in each iteration you extract one more eigenvalue from largest > > to > > > smallest. If you run more iterations then the matrix rank then you get > > > again all eigenvalues. In that case you may encounter each eigenvalue a > > few > > > times. > > > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > > > True though I thought it was intended to run through all m steps - is > > it > > > > simply that you stop after some number k and that still leads you to > > > > compute the k largest eigenvectors of the original? > > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > To be exact, the size of the tridiagonal matrix > > > > > is number of iterations + 1. See description of the matrix T_mm > > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > > > Best, > > > > > > > > > > DB > > > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> > wrote: > > > > > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to find > > the > > > > > > top k eigenvalues? > > > > > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning < > > [EMAIL PROTECTED]> > > > > > > wrote: > > > > > > > The tridiagonal is much smaller than you would need if you > wanted > > > all > > > > > the > > > > > > > eigenvalues. Since you only want a small number, you only > have a > > > > > > > tri-diagonal matrix that is some multiple of that size. > > In-memory > > > > > > > decomposition makes total sense. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Lance Norskog > [EMAIL PROTECTED] >
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-07, 00:27
it only matters for non-Mahout applications, but if you have repeated
eigenvalues, then any vector in the space spanned by the corresponding eigenvectors is also an eigenvector. To the extent that you might have nearly repeated eigenvalues, vectors in the space spanned by the corresponding eigenvectors will be nearly eigenvectors. That is what makes Jake's test work ... to misquote the over-quoted move, Eigen is as Eigen does. On Sun, Nov 6, 2011 at 4:19 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > But you don't need a formula: look at EigenVerificationJob for the way to > measure your accuracy, you can just run this on your data after you've > extracted out your first k singular vector/value pairs, and see how far > into > this list you want to keep. The measurement I use is "how close to an > eigenvector is this vector?". I.e. how big is 1 - (v/|v|) . > ((AA'v)/|AA'v|) ? >
-
Re: Understanding the SVD recommenderDanny Bickson 2011-11-07, 00:29
Regarding accuracy, you can also compute norm(T- V'*A*V)
where T is the tridiagonal matrix, V is a matrix with the intermediate solution vectors v_i , and A is the original matrix you want to approximate. (If A is not symmetric then replace it with A*A') On Sun, Nov 6, 2011 at 7:19 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > The idea is this: after k iterations, you have a k-by-k tri-diagonal > matrix, the > eigenvalues of *this* are *an approximation* to the top k eigenvalues of > A*A'. The largest of these k will have the least error, the next will have > more, > and so on down the line. I've typically assumed that the bottom 1/3rd or > so of the k are not going to be very accurate, but it depends on the > spectrum > of eigenvalues of A*A', sometimes all but the last few are very accurately > computed (but sometimes it's even worse: if the spectrum is very flat, the > mixing between eigenvectors can be near complete) > > But you don't need a formula: look at EigenVerificationJob for the way to > measure your accuracy, you can just run this on your data after you've > extracted out your first k singular vector/value pairs, and see how far > into > this list you want to keep. The measurement I use is "how close to an > eigenvector is this vector?". I.e. how big is 1 - (v/|v|) . > ((AA'v)/|AA'v|) ? > > -jake > > > On Sun, Nov 6, 2011 at 3:59 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > > > Is there a formula (somewhere easy to find) for "slop" and "accurate"? > > > > On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > > > Almost exactly. If you stop at k iterations, you don't get the top k > > > eigenvalues precisely. You need some extra slop to make sure your top > > > eigenvalues are accurate. > > > > > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson <[EMAIL PROTECTED] > > > >wrote: > > > > > > > Exactly, in each iteration you extract one more eigenvalue from > largest > > > to > > > > smallest. If you run more iterations then the matrix rank then you > get > > > > again all eigenvalues. In that case you may encounter each > eigenvalue a > > > few > > > > times. > > > > > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > > > > > > > True though I thought it was intended to run through all m steps - > is > > > it > > > > > simply that you stop after some number k and that still leads you > to > > > > > compute the k largest eigenvectors of the original? > > > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" <[EMAIL PROTECTED] > > > > > > wrote: > > > > > > > > > > > To be exact, the size of the tridiagonal matrix > > > > > > is number of iterations + 1. See description of the matrix T_mm > > > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > > > > > Best, > > > > > > > > > > > > DB > > > > > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> > > wrote: > > > > > > > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to > find > > > the > > > > > > > top k eigenvalues? > > > > > > > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning < > > > [EMAIL PROTECTED]> > > > > > > > wrote: > > > > > > > > The tridiagonal is much smaller than you would need if you > > wanted > > > > all > > > > > > the > > > > > > > > eigenvalues. Since you only want a small number, you only > > have a > > > > > > > > tri-diagonal matrix that is some multiple of that size. > > > In-memory > > > > > > > > decomposition makes total sense. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Lance Norskog > > [EMAIL PROTECTED] > > >
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-07, 00:33
I think that this gives you a measure of the error due to all of the
missing eigenvectors/values, not the accuracy of the ones that are present. Both questions are interesting ones and the answers are clearly somewhat related. On Sun, Nov 6, 2011 at 4:29 PM, Danny Bickson <[EMAIL PROTECTED]>wrote: > Regarding accuracy, you can also compute norm(T- V'*A*V) > where T is the tridiagonal matrix, V is a matrix with the intermediate > solution vectors v_i , and A is the original matrix you want to > approximate. (If A is not symmetric then replace it with A*A') > > On Sun, Nov 6, 2011 at 7:19 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > > > The idea is this: after k iterations, you have a k-by-k tri-diagonal > > matrix, the > > eigenvalues of *this* are *an approximation* to the top k eigenvalues of > > A*A'. The largest of these k will have the least error, the next will > have > > more, > > and so on down the line. I've typically assumed that the bottom 1/3rd or > > so of the k are not going to be very accurate, but it depends on the > > spectrum > > of eigenvalues of A*A', sometimes all but the last few are very > accurately > > computed (but sometimes it's even worse: if the spectrum is very flat, > the > > mixing between eigenvectors can be near complete) > > > > But you don't need a formula: look at EigenVerificationJob for the way to > > measure your accuracy, you can just run this on your data after you've > > extracted out your first k singular vector/value pairs, and see how far > > into > > this list you want to keep. The measurement I use is "how close to an > > eigenvector is this vector?". I.e. how big is 1 - (v/|v|) . > > ((AA'v)/|AA'v|) ? > > > > -jake > > > > > > On Sun, Nov 6, 2011 at 3:59 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > > > > > Is there a formula (somewhere easy to find) for "slop" and "accurate"? > > > > > > On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > > > > > Almost exactly. If you stop at k iterations, you don't get the top k > > > > eigenvalues precisely. You need some extra slop to make sure your > top > > > > eigenvalues are accurate. > > > > > > > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson < > [EMAIL PROTECTED] > > > > >wrote: > > > > > > > > > Exactly, in each iteration you extract one more eigenvalue from > > largest > > > > to > > > > > smallest. If you run more iterations then the matrix rank then you > > get > > > > > again all eigenvalues. In that case you may encounter each > > eigenvalue a > > > > few > > > > > times. > > > > > > > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> > wrote: > > > > > > > > > > > True though I thought it was intended to run through all m steps > - > > is > > > > it > > > > > > simply that you stop after some number k and that still leads you > > to > > > > > > compute the k largest eigenvectors of the original? > > > > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" < > [EMAIL PROTECTED] > > > > > > > > wrote: > > > > > > > > > > > > > To be exact, the size of the tridiagonal matrix > > > > > > > is number of iterations + 1. See description of the matrix T_mm > > > > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > DB > > > > > > > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > > > > > > > Oh do you only need the top k x k bit of the tridiagonal to > > find > > > > the > > > > > > > > top k eigenvalues? > > > > > > > > > > > > > > > > I really don't want to write a QR decomposer, phew. > > > > > > > > > > > > > > > > On Sun, Nov 6, 2011 at 11:26 PM, Ted Dunning < > > > > [EMAIL PROTECTED]> > > > > > > > > wrote: > > > > > > > > > The tridiagonal is much smaller than you would need if you > > > wanted > > > > > all > > > > > > > the > > > > > > > > > eigenvalues. Since you only want a small number, you only > > > have a > > > > > > > > > tri-diagonal matrix that is some multiple of that size.
-
Re: Understanding the SVD recommenderLance Norskog 2011-11-07, 00:55
There's a formula for random projection. Something along the lines of "if
your data has error e, a function on (new dimensions, old dimensions, e)". (Or maybe I found this in one of the +1/-1 random projection papers.) But anyway a function on the error of the original data seems more useful than an absolute error. On Sun, Nov 6, 2011 at 4:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > I think that this gives you a measure of the error due to all of the > missing eigenvectors/values, not the accuracy of the ones that are present. > > Both questions are interesting ones and the answers are clearly somewhat > related. > > On Sun, Nov 6, 2011 at 4:29 PM, Danny Bickson <[EMAIL PROTECTED] > >wrote: > > > Regarding accuracy, you can also compute norm(T- V'*A*V) > > where T is the tridiagonal matrix, V is a matrix with the intermediate > > solution vectors v_i , and A is the original matrix you want to > > approximate. (If A is not symmetric then replace it with A*A') > > > > On Sun, Nov 6, 2011 at 7:19 PM, Jake Mannix <[EMAIL PROTECTED]> > wrote: > > > > > The idea is this: after k iterations, you have a k-by-k tri-diagonal > > > matrix, the > > > eigenvalues of *this* are *an approximation* to the top k eigenvalues > of > > > A*A'. The largest of these k will have the least error, the next will > > have > > > more, > > > and so on down the line. I've typically assumed that the bottom 1/3rd > or > > > so of the k are not going to be very accurate, but it depends on the > > > spectrum > > > of eigenvalues of A*A', sometimes all but the last few are very > > accurately > > > computed (but sometimes it's even worse: if the spectrum is very flat, > > the > > > mixing between eigenvectors can be near complete) > > > > > > But you don't need a formula: look at EigenVerificationJob for the way > to > > > measure your accuracy, you can just run this on your data after you've > > > extracted out your first k singular vector/value pairs, and see how far > > > into > > > this list you want to keep. The measurement I use is "how close to an > > > eigenvector is this vector?". I.e. how big is 1 - (v/|v|) . > > > ((AA'v)/|AA'v|) ? > > > > > > -jake > > > > > > > > > On Sun, Nov 6, 2011 at 3:59 PM, Lance Norskog <[EMAIL PROTECTED]> > wrote: > > > > > > > Is there a formula (somewhere easy to find) for "slop" and > "accurate"? > > > > > > > > On Sun, Nov 6, 2011 at 3:57 PM, Ted Dunning <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > Almost exactly. If you stop at k iterations, you don't get the > top k > > > > > eigenvalues precisely. You need some extra slop to make sure your > > top > > > > > eigenvalues are accurate. > > > > > > > > > > On Sun, Nov 6, 2011 at 3:46 PM, Danny Bickson < > > [EMAIL PROTECTED] > > > > > >wrote: > > > > > > > > > > > Exactly, in each iteration you extract one more eigenvalue from > > > largest > > > > > to > > > > > > smallest. If you run more iterations then the matrix rank then > you > > > get > > > > > > again all eigenvalues. In that case you may encounter each > > > eigenvalue a > > > > > few > > > > > > times. > > > > > > > > > > > > On Sun, Nov 6, 2011 at 6:42 PM, Sean Owen <[EMAIL PROTECTED]> > > wrote: > > > > > > > > > > > > > True though I thought it was intended to run through all m > steps > > - > > > is > > > > > it > > > > > > > simply that you stop after some number k and that still leads > you > > > to > > > > > > > compute the k largest eigenvectors of the original? > > > > > > > On Nov 6, 2011 11:36 PM, "Danny Bickson" < > > [EMAIL PROTECTED] > > > > > > > > > > wrote: > > > > > > > > > > > > > > > To be exact, the size of the tridiagonal matrix > > > > > > > > is number of iterations + 1. See description of the matrix > T_mm > > > > > > > > in Wikipedia: http://en.wikipedia.org/wiki/Lanczos_algorithm > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > DB > > > > > > > > > > > > > > > > On Sun, Nov 6, 2011 at 6:30 PM, Sean Owen <[EMAIL PROTECTED]> Lance Norskog [EMAIL PROTECTED]
-
Re: Understanding the SVD recommenderSean Owen 2011-11-17, 13:26
One more question. OK, so I use Lanczos to find V_k by finding the top k
eigenvectors of AT * A. A is sparse. But isn't AT * A dense, then? Is that just how it is? This will be my last basic question for the week: I understand that A ~= U_k * S_k * V_kT. Let's call the product on the right A_k. A_k = A * V_k * V_kT right? And then A_k is just my complete prediction matrix right? It's dense so it's not formed all at once. But all I need are V_k and its transpose to do the work. I somehow thought it was more complicated than this -- having come back to this I keep wondering if I forget something here. On Sun, Nov 6, 2011 at 6:08 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > Re: Lanczos, yes, it operates by finding V as you describe. The user is > required to do more work to recapture U. Practical reason is that the > assumption is numCols(A) = numFeatures which is much less than numRows(A) > numTrainingSamples > > On Nov 6, 2011 9:52 AM, "Sean Owen" <[EMAIL PROTECTED]> wrote: > > Following up on this very old thread. > > I understood all this bit, thanks, that greatly clarified. > > You multiply a new user vector by V to project it into the new > "pseudo-item", reduced-dimension space. > And to get back to real items, you multiply by V's inverse, which is > its transpose. > And so you are really multiplying the user vector by V VT, which is > not a no-op, since those are truncated matrices and aren't actually > exact inverses (?) > > The original paper talks about cosine similarities between users or > items in the reduced-dimension space, but, can anyone shine light on > the point of that? From the paper also, it seems like they say the > predictions are just computed as vector products as above. > > > Finally, separately, I'm trying to understand the Lanczos method as > part of computing an SVD. Lanczos operates on a real symmetric matrix > right? And am I right that it comes into play when you are computing > and SVD... > > A = U * S * VT > > ... because U is actually the eigenvectors of (symmetric) A*AT and V > is the eigenvectors of AT*A? And so Lanczos is used to answer those > questions to complete the SVD? > > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > You are correct. The... >
-
Re: Understanding the SVD recommenderJake Mannix 2011-11-17, 15:21
On Thu, Nov 17, 2011 at 5:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
> One more question. OK, so I use Lanczos to find V_k by finding the top k > eigenvectors of AT * A. A is sparse. But isn't AT * A dense, then? Is that > just how it is? > A'A and AA' are both dense, yes, but you never compute them. There are several ways to avoid computing them: 1) you do two Matrix - Vector multiplications per Lanczos iteration: first take (sparse) A and multiply it by a vector v. Then take (also sparse) A' and multiply it by the now-computed u = Av. A'u = A'(A v) = (A'A)v 2) you rewrite the double summation involved in the definition of (A'A)v to notice that it can be done in one pass over A: (A'A)v = vectorSum_i(a_i' (a_i * v)) where a_i is the i'th row of A, and the Map operation is everything in the outer parenthesis: take vector v (which is living in memory) and dot it with a_i, then take emit a_i' scaled by this value. Reduce is just vector summation. We take approach 2), and is why we have the horrible method "timesSquared(Vector)" as a special optimization method in the VectorIterable interface. This trick is essentially how we compute sparse matrix multiplication as well. > This will be my last basic question for the week: > > I understand that A ~= U_k * S_k * V_kT. Let's call the product on the > right A_k. > > A_k = A * V_k * V_kT right? > yeah, I guess so. > And then A_k is just my complete prediction matrix right? It's dense so > it's not formed all at once. But all I need are V_k and its transpose to do > the work. > Correct. With V_k and V_k' in memory (if they're small enough), you can compute a row of A_k on the fly by doing two matrix-vector multiplications. > I somehow thought it was more complicated than this -- having come back to > this I keep wondering if I forget something here. > No, I think this is it. You might need S^-1 somewhere in there that I'm forgetting, but this looks right. -jake > > > On Sun, Nov 6, 2011 at 6:08 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > > > Re: Lanczos, yes, it operates by finding V as you describe. The user is > > required to do more work to recapture U. Practical reason is that the > > assumption is numCols(A) = numFeatures which is much less than > numRows(A) > > numTrainingSamples > > > > On Nov 6, 2011 9:52 AM, "Sean Owen" <[EMAIL PROTECTED]> wrote: > > > > Following up on this very old thread. > > > > I understood all this bit, thanks, that greatly clarified. > > > > You multiply a new user vector by V to project it into the new > > "pseudo-item", reduced-dimension space. > > And to get back to real items, you multiply by V's inverse, which is > > its transpose. > > And so you are really multiplying the user vector by V VT, which is > > not a no-op, since those are truncated matrices and aren't actually > > exact inverses (?) > > > > The original paper talks about cosine similarities between users or > > items in the reduced-dimension space, but, can anyone shine light on > > the point of that? From the paper also, it seems like they say the > > predictions are just computed as vector products as above. > > > > > > Finally, separately, I'm trying to understand the Lanczos method as > > part of computing an SVD. Lanczos operates on a real symmetric matrix > > right? And am I right that it comes into play when you are computing > > and SVD... > > > > A = U * S * VT > > > > ... because U is actually the eigenvectors of (symmetric) A*AT and V > > is the eigenvectors of AT*A? And so Lanczos is used to answer those > > questions to complete the SVD? > > > > On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > You are correct. The... > > >
-
Re: Understanding the SVD recommenderSean Owen 2011-11-17, 17:12
Ah-ha. That's clicked now. Especially as I read the comments and see it
already says exactly this. And I understand that you just compute extra eigenvectors then throw out near-duplicates, or those that are too un-eigenvector -- are there good pointers on the alternatives for that, or are alternatives impractical? I understand it's possible to re-figure the vectors it produces at each iteration to be orthogonal, but don't know whether that's considered unhelpful. On Thu, Nov 17, 2011 at 3:21 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > > 2) you rewrite the double summation involved in the definition of > (A'A)v to notice that it can be done in one pass over A: > > (A'A)v = vectorSum_i(a_i' (a_i * v)) > >
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-17, 17:30
On Thu, Nov 17, 2011 at 7:21 AM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 17, 2011 at 5:26 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > > > One more question. OK, so I use Lanczos to find V_k by finding the top k > > eigenvectors of AT * A. A is sparse. But isn't AT * A dense, then? Is > that > > just how it is? > > > > A'A and AA' are both dense, yes, but you never compute them. Actually, they aren't necessarily dense. Computing them explicitly can still be a pain in a****. > ... > 1) you do two Matrix - Vector multiplications per Lanczos iteration: > first > take (sparse) A and multiply it by a vector v. Then take (also sparse) A' > and multiply it by the now-computed u = Av. A'u = A'(A v) = (A'A)v > This is the traditional method used in most implementations of Lanczos. Jake's one-pass map-reduce is better for our needs.
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-17, 17:32
Yeah... a good alternative is to use the random projection stuff.
On Thu, Nov 17, 2011 at 9:12 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > Ah-ha. That's clicked now. Especially as I read the comments and see it > already says exactly this. > > And I understand that you just compute extra eigenvectors then throw out > near-duplicates, or those that are too un-eigenvector -- are there good > pointers on the alternatives for that, or are alternatives impractical? I > understand it's possible to re-figure the vectors it produces at each > iteration to be orthogonal, but don't know whether that's considered > unhelpful. >
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-17, 19:30
I will finish adding an option with Cholesky decomposition route to
SSVD some time early in Q1 2012. RE: distributed QR: right now this option is architecturally "melted" with SSVD because MR steps that do distributed QR doing extra job (i would have to use more MR jobs if i did not, to do the same). Although QR related math is modelled into nice and standalone set of "pipeline preprocessors" (in other word, MR collector decorators) which could be wired up in a standalone MR QR solver, such wiring at the moment does not exist (although it is not difficult to wire one up). However, it would seem to me that QR as a completely isolated job would have little value in machine learning applications. Thanks. -d On Sun, Nov 6, 2011 at 3:26 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > The tridiagonal is much smaller than you would need if you wanted all the > eigenvalues. Since you only want a small number, you only have a > tri-diagonal matrix that is some multiple of that size. In-memory > decomposition makes total sense. > > Regarding QR decomposition, Dmitriy has already built a parallel version. > A large QR is required at one point in the SSVD algorithm. I have shown a > way to avoid this large QR using a much smaller Cholesky decomposition, but > it isn't entirely clear that this is a net win. It is a huge win with the > sequential version. > > On Sun, Nov 6, 2011 at 3:17 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> Yeah OK, that makes sense. That's a pretty helpful paper. >> >> I can get through the Lanczos algorithm without much trouble, I think. >> The piece I'm looking for pointers on at the moment is the eigen >> decomposition of the tridiagonal matrix. >> >> Jake am I right that this is done in memory right now? I am sure your >> previous comment actually answers this next question, but I haven't >> connected the dots yet. If we're getting eigenvectors of AT * A, and >> that matrix is huge, then isn't the tridiagonal matrix still huge -- >> albeit it has only 3m entries, not m^2. It seems like the code just >> calls to EigenDecomposer which treats it as a dense m x m matrix. >> >> (Is it worth me figuring out the QR decomposition enough to build a >> distributed version of it?) >> >> On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: >> > On 06.11.2011 18:52, Sean Owen wrote: >> >> Following up on this very old thread. >> >> >> >> I understood all this bit, thanks, that greatly clarified. >> >> >> >> You multiply a new user vector by V to project it into the new >> >> "pseudo-item", reduced-dimension space. >> >> And to get back to real items, you multiply by V's inverse, which is >> >> its transpose. >> >> And so you are really multiplying the user vector by V VT, which is >> >> not a no-op, since those are truncated matrices and aren't actually >> >> exact inverses (?) >> >> >> >> The original paper talks about cosine similarities between users or >> >> items in the reduced-dimension space, but, can anyone shine light on >> >> the point of that? >> > >> > I think this interpretation is coming from LSI where you project a query >> > onto the document feature space and use the cosine to find the most >> > similar documents (which can be done by simple matrix vector >> > mulitplication as the singular vectors are orthonormal and computing dot >> > products with the projected query is therefore equal to computing >> cosines). >> > >> > Something similar could be done to find most similar items or users, the >> > bad thing is that AFAIK you have to look at every other user or item as >> > U and V are dense. >> > >> > Maybe this paper can help to shine light on that: >> > >> > "Using Linear Algebra for Intelligent Information Retrieval" >> > >> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf >> > >> > From the paper also, it seems like they say the >> >> predictions are just computed as vector products as above. >> >> >> >> >> >> Finally, separately, I'm trying to understand the Lanczos method as
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-17, 19:44
Also:
my personal understanding that use of straightforward SVD in recommenders produces oversmoothed (or overregularized) results. Intuition behind this is as follows: consider Netflix example (users x movies). if I am a user, i probbably rated 40 or so movies out of say 100,000 that Netflix has. The typical strategy to populate input rating matrix (in order to keep it sparse, among other things), is to compute and average of my rating and subtract it from all rated positions (leaving unrated positions at 0). Now, say for the sake of this discussion, my average rating is 3. But some how I am heavily leaning to chick flicks. So out of all 40, i saw about 10 that i rated with the highest rating. So any reasonable inference should figure that if i rated 10 out of 10 chick flicks at their highest rating, then i am probably into that stuff. But the computation wouldn't see it the same way. Netflix may be having 10,000 chick flicks movies, and for computation it basically would look like i rated 10 movies at 5 and the rest of 10,000 of them as 3 (my average). So, 10 '5's out of 10,000 '3' s would still rate my interest in chick flicks pretty low (nearly back to what happens to be my average). In other words, regularization of the training is waaay up. So oversmoothing over sparse data is a problem here. Other algorithms deal with that by either not taking unrated examples into account at all (SGD-based factorization methods) or providing some sort of weighting to the regularization amplification based on the number of rated samples (i think that's ALS WR's way of coping with this). Am i making any sense? or i am fundamentally wrong in my understanding of the flow of basic SVD recommendation?
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-17, 20:15
Agree.
On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]>wrote: > However, > it would seem to me that QR as a completely isolated job would have > little value in machine learning applications. >
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-17, 20:33
On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote:
> I will finish adding an option with Cholesky decomposition route to > SSVD some time early in Q1 2012. > PPS i already put some jobs in (they are in the trunk) for Cholesky route. I thought it would be an easy mod but then i saw that it would require a little bit more modifications to also support power iterations the same way they are supported today (and also i still kind of couldn't quite finish my thought process on what it would take to modify U-job to produce U without Q in his case, it seems this route will require a 100% special handling and i wouldn't be able to reuse any of current U job for this option. For these reasons, i decided to wait until i figure all of the remaining issues architecturally before i proceed. And that would better be a one longer chunk of time rather than several little chunks, which makes it dependent more on my schedule to figure where that chunk might be.
-
Re: Understanding the SVD recommenderSebastian Schelter 2011-11-17, 20:38
I think Dmitriys description of the SGD and ALS-WR approach hits the
nail on the head. However there is a third way to factorize the rating matrix which we haven't talked about yet. It's described in Yehuda Koren's "Collaborative Filtering for Implicit Feedback Datasets" http://research.yahoo.com/pub/2433 and I recently added it to ParallelALSFactorizationJob. This approach works on implicit feedback data (like the number of times a user watched a television series) and all unobserved interactions are by definition 0. Using a standard SVD would result in the problems Dmitriy described. But the paper introduces a very interesting approach: the user-item matrix holds 0s and 1s only (0 in a cell if there have been no interactions, 1 if there have been 1 or more interactions). This matrix is decomposed into two other matrices X and Y (user and item features) by minimizing the (regularized) squared error over all observations (which is the same as in ALS-WR). However the error is weighted by a confidence value that is very low if the user never interacted with the item (because he simply might not be aware that this item exists) and very high if the user interacted very often with the item (a good indication of preference). That should help to avoid the problems that Dmitriy described. --sebastian 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: > On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: >> I will finish adding an option with Cholesky decomposition route to >> SSVD some time early in Q1 2012. >> > > PPS i already put some jobs in (they are in the trunk) for Cholesky > route. I thought it would be an easy mod but then i saw that it would > require a little bit more modifications to also support power > iterations the same way they are supported today (and also i still > kind of couldn't quite finish my thought process on what it would take > to modify U-job to produce U without Q in his case, it seems this > route will require a 100% special handling and i wouldn't be able to > reuse any of current U job for this option. > > For these reasons, i decided to wait until i figure all of the > remaining issues architecturally before i proceed. And that would > better be a one longer chunk of time rather than several little > chunks, which makes it dependent more on my schedule to figure where > that chunk might be. >
-
Re: Understanding the SVD recommenderSebastian Schelter 2011-11-17, 20:51
One small correction: ALS-WR is also only looking at rated examples.
2011/11/17 Sebastian Schelter <[EMAIL PROTECTED]>: > I think Dmitriys description of the SGD and ALS-WR approach hits the > nail on the head. > > However there is a third way to factorize the rating matrix which we > haven't talked about yet. It's described in Yehuda Koren's > "Collaborative Filtering for Implicit Feedback Datasets" > http://research.yahoo.com/pub/2433 and I recently added it to > ParallelALSFactorizationJob. > > This approach works on implicit feedback data (like the number of > times a user watched a television series) and all unobserved > interactions are by definition 0. Using a standard SVD would result in > the problems Dmitriy described. > > But the paper introduces a very interesting approach: the user-item > matrix holds 0s and 1s only (0 in a cell if there have been no > interactions, 1 if there have been 1 or more interactions). This > matrix is decomposed into two other matrices X and Y (user and item > features) by minimizing the (regularized) squared error over all > observations (which is the same as in ALS-WR). However the error is > weighted by a confidence value that is very low if the user never > interacted with the item (because he simply might not be aware that > this item exists) and very high if the user interacted very often with > the item (a good indication of preference). That should help to avoid > the problems that Dmitriy described. > > --sebastian > > > 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: >> On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: >>> I will finish adding an option with Cholesky decomposition route to >>> SSVD some time early in Q1 2012. >>> >> >> PPS i already put some jobs in (they are in the trunk) for Cholesky >> route. I thought it would be an easy mod but then i saw that it would >> require a little bit more modifications to also support power >> iterations the same way they are supported today (and also i still >> kind of couldn't quite finish my thought process on what it would take >> to modify U-job to produce U without Q in his case, it seems this >> route will require a 100% special handling and i wouldn't be able to >> reuse any of current U job for this option. >> >> For these reasons, i decided to wait until i figure all of the >> remaining issues architecturally before i proceed. And that would >> better be a one longer chunk of time rather than several little >> chunks, which makes it dependent more on my schedule to figure where >> that chunk might be. >> >
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-17, 20:57
Yes. This is even one more step away from straightforward SVD, i.e.
explicitly analyizing implicit feedback (pun intended). On Thu, Nov 17, 2011 at 12:38 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: > I think Dmitriys description of the SGD and ALS-WR approach hits the > nail on the head. > > However there is a third way to factorize the rating matrix which we > haven't talked about yet. It's described in Yehuda Koren's > "Collaborative Filtering for Implicit Feedback Datasets" > http://research.yahoo.com/pub/2433 and I recently added it to > ParallelALSFactorizationJob. > > This approach works on implicit feedback data (like the number of > times a user watched a television series) and all unobserved > interactions are by definition 0. Using a standard SVD would result in > the problems Dmitriy described. > > But the paper introduces a very interesting approach: the user-item > matrix holds 0s and 1s only (0 in a cell if there have been no > interactions, 1 if there have been 1 or more interactions). This > matrix is decomposed into two other matrices X and Y (user and item > features) by minimizing the (regularized) squared error over all > observations (which is the same as in ALS-WR). However the error is > weighted by a confidence value that is very low if the user never > interacted with the item (because he simply might not be aware that > this item exists) and very high if the user interacted very often with > the item (a good indication of preference). That should help to avoid > the problems that Dmitriy described. > > --sebastian > > > 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: >> On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: >>> I will finish adding an option with Cholesky decomposition route to >>> SSVD some time early in Q1 2012. >>> >> >> PPS i already put some jobs in (they are in the trunk) for Cholesky >> route. I thought it would be an easy mod but then i saw that it would >> require a little bit more modifications to also support power >> iterations the same way they are supported today (and also i still >> kind of couldn't quite finish my thought process on what it would take >> to modify U-job to produce U without Q in his case, it seems this >> route will require a 100% special handling and i wouldn't be able to >> reuse any of current U job for this option. >> >> For these reasons, i decided to wait until i figure all of the >> remaining issues architecturally before i proceed. And that would >> better be a one longer chunk of time rather than several little >> chunks, which makes it dependent more on my schedule to figure where >> that chunk might be. >> >
-
Re: Understanding the SVD recommenderTed Dunning 2011-11-17, 21:36
Adding weights is actually not at all incompatible with the original idea of singular vale decomposition. It may make some algorithms trickier but weighting is a very reasonable companion to any approximation algorithm.
Sent from my iPhone On Nov 17, 2011, at 12:57, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > Yes. This is even one more step away from straightforward SVD, i.e. > explicitly analyizing implicit feedback (pun intended). > > On Thu, Nov 17, 2011 at 12:38 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: >> I think Dmitriys description of the SGD and ALS-WR approach hits the >> nail on the head. >> >> However there is a third way to factorize the rating matrix which we >> haven't talked about yet. It's described in Yehuda Koren's >> "Collaborative Filtering for Implicit Feedback Datasets" >> http://research.yahoo.com/pub/2433 and I recently added it to >> ParallelALSFactorizationJob. >> >> This approach works on implicit feedback data (like the number of >> times a user watched a television series) and all unobserved >> interactions are by definition 0. Using a standard SVD would result in >> the problems Dmitriy described. >> >> But the paper introduces a very interesting approach: the user-item >> matrix holds 0s and 1s only (0 in a cell if there have been no >> interactions, 1 if there have been 1 or more interactions). This >> matrix is decomposed into two other matrices X and Y (user and item >> features) by minimizing the (regularized) squared error over all >> observations (which is the same as in ALS-WR). However the error is >> weighted by a confidence value that is very low if the user never >> interacted with the item (because he simply might not be aware that >> this item exists) and very high if the user interacted very often with >> the item (a good indication of preference). That should help to avoid >> the problems that Dmitriy described. >> >> --sebastian >> >> >> 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: >>> On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: >>>> I will finish adding an option with Cholesky decomposition route to >>>> SSVD some time early in Q1 2012. >>>> >>> >>> PPS i already put some jobs in (they are in the trunk) for Cholesky >>> route. I thought it would be an easy mod but then i saw that it would >>> require a little bit more modifications to also support power >>> iterations the same way they are supported today (and also i still >>> kind of couldn't quite finish my thought process on what it would take >>> to modify U-job to produce U without Q in his case, it seems this >>> route will require a 100% special handling and i wouldn't be able to >>> reuse any of current U job for this option. >>> >>> For these reasons, i decided to wait until i figure all of the >>> remaining issues architecturally before i proceed. And that would >>> better be a one longer chunk of time rather than several little >>> chunks, which makes it dependent more on my schedule to figure where >>> that chunk might be. >>> >>
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-29, 01:04
Picking this old thread a little bit...
I was wondering: is there a way to reduce task of ALS with regularization (or even ALS-WR) to a mathematical definition of reduced rank SVD? If somehow i could work in a regularization under the SSVD hood, that would solve most of iterative ills i think... But since i haven't seen math around for it, i assume it is not very practical or even possible? Thanks. -d On Thu, Nov 17, 2011 at 12:38 PM, Sebastian Schelter <[EMAIL PROTECTED]> wrote: > I think Dmitriys description of the SGD and ALS-WR approach hits the > nail on the head. > > However there is a third way to factorize the rating matrix which we > haven't talked about yet. It's described in Yehuda Koren's > "Collaborative Filtering for Implicit Feedback Datasets" > http://research.yahoo.com/pub/2433 and I recently added it to > ParallelALSFactorizationJob. > > This approach works on implicit feedback data (like the number of > times a user watched a television series) and all unobserved > interactions are by definition 0. Using a standard SVD would result in > the problems Dmitriy described. > > But the paper introduces a very interesting approach: the user-item > matrix holds 0s and 1s only (0 in a cell if there have been no > interactions, 1 if there have been 1 or more interactions). This > matrix is decomposed into two other matrices X and Y (user and item > features) by minimizing the (regularized) squared error over all > observations (which is the same as in ALS-WR). However the error is > weighted by a confidence value that is very low if the user never > interacted with the item (because he simply might not be aware that > this item exists) and very high if the user interacted very often with > the item (a good indication of preference). That should help to avoid > the problems that Dmitriy described. > > --sebastian > > > 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: > > On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> > wrote: > >> I will finish adding an option with Cholesky decomposition route to > >> SSVD some time early in Q1 2012. > >> > > > > PPS i already put some jobs in (they are in the trunk) for Cholesky > > route. I thought it would be an easy mod but then i saw that it would > > require a little bit more modifications to also support power > > iterations the same way they are supported today (and also i still > > kind of couldn't quite finish my thought process on what it would take > > to modify U-job to produce U without Q in his case, it seems this > > route will require a 100% special handling and i wouldn't be able to > > reuse any of current U job for this option. > > > > For these reasons, i decided to wait until i figure all of the > > remaining issues architecturally before i proceed. And that would > > better be a one longer chunk of time rather than several little > > chunks, which makes it dependent more on my schedule to figure where > > that chunk might be. > > >
-
Re: Understanding the SVD recommenderDmitriy Lyubimov 2011-11-29, 01:04
i meant, reduce to a task that uses mathematical SVD
On Mon, Nov 28, 2011 at 5:04 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > Picking this old thread a little bit... > > I was wondering: is there a way to reduce task of ALS with regularization > (or even ALS-WR) to a mathematical definition of reduced rank SVD? > > If somehow i could work in a regularization under the SSVD hood, that > would solve most of iterative ills i think... But since i haven't seen math > around for it, i assume it is not very practical or even possible? > > Thanks. > -d > > > On Thu, Nov 17, 2011 at 12:38 PM, Sebastian Schelter <[EMAIL PROTECTED]>wrote: > >> I think Dmitriys description of the SGD and ALS-WR approach hits the >> nail on the head. >> >> However there is a third way to factorize the rating matrix which we >> haven't talked about yet. It's described in Yehuda Koren's >> "Collaborative Filtering for Implicit Feedback Datasets" >> http://research.yahoo.com/pub/2433 and I recently added it to >> ParallelALSFactorizationJob. >> >> This approach works on implicit feedback data (like the number of >> times a user watched a television series) and all unobserved >> interactions are by definition 0. Using a standard SVD would result in >> the problems Dmitriy described. >> >> But the paper introduces a very interesting approach: the user-item >> matrix holds 0s and 1s only (0 in a cell if there have been no >> interactions, 1 if there have been 1 or more interactions). This >> matrix is decomposed into two other matrices X and Y (user and item >> features) by minimizing the (regularized) squared error over all >> observations (which is the same as in ALS-WR). However the error is >> weighted by a confidence value that is very low if the user never >> interacted with the item (because he simply might not be aware that >> this item exists) and very high if the user interacted very often with >> the item (a good indication of preference). That should help to avoid >> the problems that Dmitriy described. >> >> --sebastian >> >> >> 2011/11/17 Dmitriy Lyubimov <[EMAIL PROTECTED]>: >> > On Thu, Nov 17, 2011 at 11:30 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> >> wrote: >> >> I will finish adding an option with Cholesky decomposition route to >> >> SSVD some time early in Q1 2012. >> >> >> > >> > PPS i already put some jobs in (they are in the trunk) for Cholesky >> > route. I thought it would be an easy mod but then i saw that it would >> > require a little bit more modifications to also support power >> > iterations the same way they are supported today (and also i still >> > kind of couldn't quite finish my thought process on what it would take >> > to modify U-job to produce U without Q in his case, it seems this >> > route will require a 100% special handling and i wouldn't be able to >> > reuse any of current U job for this option. >> > >> > For these reasons, i decided to wait until i figure all of the >> > remaining issues architecturally before i proceed. And that would >> > better be a one longer chunk of time rather than several little >> > chunks, which makes it dependent more on my schedule to figure where >> > that chunk might be. >> > >> > > |