|
Lance Norskog
2011-07-10, 02:07
Sean Owen
2011-07-10, 09:51
Ted Dunning
2011-07-10, 19:15
Lance Norskog
2011-07-12, 03:56
Lance Norskog
2011-08-10, 09:54
Lance Norskog
2011-08-11, 05:22
Sean Owen
2011-08-11, 06:53
Lance Norskog
2011-08-17, 08:52
Jeff Hansen
2011-08-25, 20:40
Jeff Hansen
2011-08-25, 20:53
Jake Mannix
2011-08-25, 20:57
Jeff Hansen
2011-08-25, 21:21
Sean Owen
2011-08-25, 21:50
Lance Norskog
2011-08-25, 22:07
Jeff Hansen
2011-08-25, 23:07
Ted Dunning
2011-08-26, 00:57
Lance Norskog
2011-08-26, 09:08
Sean Owen
2011-08-26, 09:21
Lance Norskog
2011-08-26, 09:33
Jeff Hansen
2011-08-26, 15:29
Ted Dunning
2011-08-26, 17:15
Lance Norskog
2011-08-27, 03:21
Jeff Hansen
2011-08-29, 20:34
Ted Dunning
2011-08-29, 22:28
Lance Norskog
2011-08-31, 08:24
Lance Norskog
2011-08-31, 08:31
Ted Dunning
2011-08-31, 15:04
Ted Dunning
2011-08-31, 15:10
Lance Norskog
2011-08-31, 22:57
Dmitriy Lyubimov
2011-08-31, 23:06
Dmitriy Lyubimov
2011-09-01, 00:24
Ted Dunning
2011-09-01, 03:28
Ted Dunning
2011-09-01, 03:32
Ken Krugler
2011-09-01, 03:50
Ted Dunning
2011-09-01, 03:57
Lance Norskog
2011-09-01, 05:55
|
-
Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-07-10, 02:07
I would like to find the singular vectors of an item-item data model.
That is, the largest singular vector has many items "close" to it, and items at the ends are polar opposites in popularity. For example, the Netflix dataset yielded chick flicks v.s. Star Trek movies and Harry Potter v.s. (Stanley Kubrick, maybe?). To do this requires a matrix that then supplies these singular vectors. Which recommendation data structures or operations, or combinations thereof, can create a such a matrix? (This is what MAHOUT-752 does.) -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceSean Owen 2011-07-10, 09:51
So it sounds like you want the SVD of the item-item similarity matrix? Sure,
you can use Mahout for that. If you are not in Hadoop land then look at SVDRecomnender to crib some related code. It is decomposing the user item matrix though. But for this special case of a symmetric matrix your singular vectors are the eigenvectors which you may find much easier to compute. I might restate the interpretation. The 'size' of these vectors is not what matters to your question. It is which elements (items) have the smallest vs largest values . On Jul 10, 2011 3:08 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-07-10, 19:15
Also, item-item similarity is often (nearly) the result of a matrix product.
If yours is, then you can decompose the user x item matrix and the desired eigenvalues are the singular values squared and the eigen vectors are the right singular vectors for the decomposition. On Sun, Jul 10, 2011 at 2:51 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > So it sounds like you want the SVD of the item-item similarity matrix? > Sure, > you can use Mahout for that. If you are not in Hadoop land then look at > SVDRecomnender to crib some related code. It is decomposing the user item > matrix though. > > But for this special case of a symmetric matrix your singular vectors are > the eigenvectors which you may find much easier to compute. > > I might restate the interpretation. > The 'size' of these vectors is not what matters to your question. It is > which elements (items) have the smallest vs largest values . > On Jul 10, 2011 3:08 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote: >
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-07-12, 03:56
SVDRecommender is intriguing, thanks for the pointer.
On Sun, Jul 10, 2011 at 12:15 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Also, item-item similarity is often (nearly) the result of a matrix product. > If yours is, then you can decompose the user x item matrix and the desired > eigenvalues are the singular values squared and the eigen vectors are the > right singular vectors for the decomposition. > > On Sun, Jul 10, 2011 at 2:51 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> So it sounds like you want the SVD of the item-item similarity matrix? >> Sure, >> you can use Mahout for that. If you are not in Hadoop land then look at >> SVDRecomnender to crib some related code. It is decomposing the user item >> matrix though. >> >> But for this special case of a symmetric matrix your singular vectors are >> the eigenvectors which you may find much easier to compute. >> >> I might restate the interpretation. >> The 'size' of these vectors is not what matters to your question. It is >> which elements (items) have the smallest vs largest values . >> On Jul 10, 2011 3:08 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote: >> > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-10, 09:54
Zeroing in on the topic:
I have: 1) a set of raw input vectors of a given length, one for each item. Each value in the vectors are geometric, not bag-of-words or other. The matrix is [# items , # dimensions]. 2) An SVD of same: left matrix of [ # items, #d features per item] * singular vector[# features] * right matrix of [#dimensions features per dimension, #dimensions]. 3) The first few columns of the left matrix are interesting singular eigenvectors. I would like to: 1) relate the singular vectors to the item vectors, such that they create points in the "hot spots" of the item vectors. 2) find the inverses: a singular vector has two endpoints, and both represent "hot spots" in the item space. Given the first 3 singular vectors, there are 6 "hot spots" in the item vectors, one for each end of the vector. What transforms are needed to get the item vectors and the singular vector endpoints in the same space? I'm not finding the exact sequence. A use case for this is a new user. It gives a quick assessment by asking where the user is on the few common axes of items: "Transformers 3: The Stupiding" v.s. "Crazy Bride Wedding Love Planner"? On Mon, Jul 11, 2011 at 8:56 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > SVDRecommender is intriguing, thanks for the pointer. > > On Sun, Jul 10, 2011 at 12:15 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> Also, item-item similarity is often (nearly) the result of a matrix product. >> If yours is, then you can decompose the user x item matrix and the desired >> eigenvalues are the singular values squared and the eigen vectors are the >> right singular vectors for the decomposition. >> >> On Sun, Jul 10, 2011 at 2:51 AM, Sean Owen <[EMAIL PROTECTED]> wrote: >> >>> So it sounds like you want the SVD of the item-item similarity matrix? >>> Sure, >>> you can use Mahout for that. If you are not in Hadoop land then look at >>> SVDRecomnender to crib some related code. It is decomposing the user item >>> matrix though. >>> >>> But for this special case of a symmetric matrix your singular vectors are >>> the eigenvectors which you may find much easier to compute. >>> >>> I might restate the interpretation. >>> The 'size' of these vectors is not what matters to your question. It is >>> which elements (items) have the smallest vs largest values . >>> On Jul 10, 2011 3:08 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote: >>> >> > > > > -- > Lance Norskog > [EMAIL PROTECTED] > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-11, 05:22
A picture that might help explain the problem:
http://www.flickr.com/photos/54866255@N00/6031564308/in/photostream On 8/10/11, Lance Norskog <[EMAIL PROTECTED]> wrote: > Zeroing in on the topic: > > I have: > 1) a set of raw input vectors of a given length, one for each item. > Each value in the vectors are geometric, not bag-of-words or other. > The matrix is [# items , # dimensions]. > 2) An SVD of same: > left matrix of [ # items, #d features per item] * singular > vector[# features] * right matrix of [#dimensions features per > dimension, #dimensions]. > 3) The first few columns of the left matrix are interesting singular > eigenvectors. > > I would like to: > 1) relate the singular vectors to the item vectors, such that they > create points in the "hot spots" of the item vectors. > 2) find the inverses: a singular vector has two endpoints, and both > represent "hot spots" in the item space. > > Given the first 3 singular vectors, there are 6 "hot spots" in the > item vectors, one for each end of the vector. What transforms are > needed to get the item vectors and the singular vector endpoints in > the same space? I'm not finding the exact sequence. > > A use case for this is a new user. It gives a quick assessment by > asking where the user is on the few common axes of items: > "Transformers 3: The Stupiding" v.s. "Crazy Bride Wedding Love > Planner"? > > On Mon, Jul 11, 2011 at 8:56 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: >> SVDRecommender is intriguing, thanks for the pointer. >> >> On Sun, Jul 10, 2011 at 12:15 PM, Ted Dunning <[EMAIL PROTECTED]> >> wrote: >>> Also, item-item similarity is often (nearly) the result of a matrix >>> product. >>> If yours is, then you can decompose the user x item matrix and the >>> desired >>> eigenvalues are the singular values squared and the eigen vectors are >>> the >>> right singular vectors for the decomposition. >>> >>> On Sun, Jul 10, 2011 at 2:51 AM, Sean Owen <[EMAIL PROTECTED]> wrote: >>> >>>> So it sounds like you want the SVD of the item-item similarity matrix? >>>> Sure, >>>> you can use Mahout for that. If you are not in Hadoop land then look at >>>> SVDRecomnender to crib some related code. It is decomposing the user >>>> item >>>> matrix though. >>>> >>>> But for this special case of a symmetric matrix your singular vectors >>>> are >>>> the eigenvectors which you may find much easier to compute. >>>> >>>> I might restate the interpretation. >>>> The 'size' of these vectors is not what matters to your question. It is >>>> which elements (items) have the smallest vs largest values . >>>> On Jul 10, 2011 3:08 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote: >>>> >>> >> >> >> >> -- >> Lance Norskog >> [EMAIL PROTECTED] >> > > > > -- > Lance Norskog > [EMAIL PROTECTED] > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceSean Owen 2011-08-11, 06:53
You may need to sharpen your terms / problem statement here :
What is a geometric value -- just mean a continuous real value? So these are item-feature vectors? The middle bit of the output of an SVD is not a singular vector -- it's a diagonal matrix containing singular values on the diagonal. The left matrix contains singular vectors, which are not eigenvectors except in very specific cases of the original matrix. Singular vectors are the columns of the left matrix, not rows, whereas items corresponds to its rows. What do you mean about relating them? What do you mean by the "hot spot" you are trying to find? A vector does not express two end-points, no. You could think of (X,Y) as corresponding to a point in 2-space, or could think of it as a ray from (0,0) to (X,Y), but you could think of it as (100,200) to (100+X,200+Y) just as well. There are not two point implied by anything here. How do you get points from the original item-feature space into the transformed, reduced space? While I think this is an imprecise answer: if A = U Sigma V^T then you can think of (Sigma V^T) as like the change-of-basis transformation that does this. On Wed, Aug 10, 2011 at 10:54 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Zeroing in on the topic: > > I have: > 1) a set of raw input vectors of a given length, one for each item. > Each value in the vectors are geometric, not bag-of-words or other. > The matrix is [# items , # dimensions]. > 2) An SVD of same: > left matrix of [ # items, #d features per item] * singular > vector[# features] * right matrix of [#dimensions features per > dimension, #dimensions]. > 3) The first few columns of the left matrix are interesting singular > eigenvectors. > > I would like to: > 1) relate the singular vectors to the item vectors, such that they > create points in the "hot spots" of the item vectors. > 2) find the inverses: a singular vector has two endpoints, and both > represent "hot spots" in the item space. > > Given the first 3 singular vectors, there are 6 "hot spots" in the > item vectors, one for each end of the vector. What transforms are > needed to get the item vectors and the singular vector endpoints in > the same space? I'm not finding the exact sequence. > > A use case for this is a new user. It gives a quick assessment by > asking where the user is on the few common axes of items: > "Transformers 3: The Stupiding" v.s. "Crazy Bride Wedding Love > Planner"? > > -- > Lance Norskog > [EMAIL PROTECTED] >
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-17, 08:52
Sharpened:
http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > You may need to sharpen your terms / problem statement here : > > What is a geometric value -- just mean a continuous real value? > So these are item-feature vectors? > > The middle bit of the output of an SVD is not a singular vector -- it's a > diagonal matrix containing singular values on the diagonal. > The left matrix contains singular vectors, which are not eigenvectors except > in very specific cases of the original matrix. > > Singular vectors are the columns of the left matrix, not rows, whereas items > corresponds to its rows. What do you mean about relating them? > What do you mean by the "hot spot" you are trying to find? > A vector does not express two end-points, no. You could think of (X,Y) as > corresponding to a point in 2-space, or could think of it as a ray from > (0,0) to (X,Y), but you could think of it as (100,200) to (100+X,200+Y) just > as well. There are not two point implied by anything here. > > > How do you get points from the original item-feature space into the > transformed, reduced space? While I think this is an imprecise answer: if A > = U Sigma V^T then you can think of (Sigma V^T) as like the change-of-basis > transformation that does this. > > > On Wed, Aug 10, 2011 at 10:54 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > >> Zeroing in on the topic: >> >> I have: >> 1) a set of raw input vectors of a given length, one for each item. >> Each value in the vectors are geometric, not bag-of-words or other. >> The matrix is [# items , # dimensions]. >> 2) An SVD of same: >> left matrix of [ # items, #d features per item] * singular >> vector[# features] * right matrix of [#dimensions features per >> dimension, #dimensions]. >> 3) The first few columns of the left matrix are interesting singular >> eigenvectors. >> >> I would like to: >> 1) relate the singular vectors to the item vectors, such that they >> create points in the "hot spots" of the item vectors. >> 2) find the inverses: a singular vector has two endpoints, and both >> represent "hot spots" in the item space. >> >> Given the first 3 singular vectors, there are 6 "hot spots" in the >> item vectors, one for each end of the vector. What transforms are >> needed to get the item vectors and the singular vector endpoints in >> the same space? I'm not finding the exact sequence. >> >> A use case for this is a new user. It gives a quick assessment by >> asking where the user is on the few common axes of items: >> "Transformers 3: The Stupiding" v.s. "Crazy Bride Wedding Love >> Planner"? >> >> -- >> Lance Norskog >> [EMAIL PROTECTED] >> > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-25, 20:40
I've been playing around with this problem for the last week or so (or at
least this problem as I understood it based on your initial commentary Lance) -- but purely in R using smaller data so I can 1. get my head wrapped around the problem, and 2. get more familiar with R. To make the problem a little more tenable I limited my sample to 200 movies and 10,000 users (taking the most rated movies from 2004 and 2005 based on NF's dataset -- I know, I should really switch back to the grouplens dataset...) I'm also only looking at binary data at the moment -- I treat any rating above 3 as a movie you liked and anything 3 or below as the same as not having rated the movie. So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD on it so that I can project it onto a 10 dimensional space. M<-initial data s_m<- svd(M,10,10) U<-s_m$u S<-diag(s_m$d[1:10]) V<-s_m$v So U is a 200 row by 10 column matrix -- each row represents the eigenvector of a given movie, and each column represents one Lance's so called axes of interest. So what I did next was spit out the top and bottom n movie titles for each of these 10 dimensions. I found it was important to show more than one movie title for each side of the dimensions, otherwise the results might be somewhat misleading. I then went through the 10 dimensions and qualitatively answered for myself whether I was strongly or weakly aligned in one direction, or not aligned in anyway on this dimension. Personally I usually found I only felt strongly aligned on 2 of the ten, and weakly aligned on another 2. I then normalized U across each of the ten dimensions and for each movie added up it's z score in that dimension by my alignment in that dimension. I then sorted the results and displayed the movie titles -- it was a pretty accurate ranking of movies as I like them. scaled <- apply(U,2,scale) me <- c(0,2,1,0,-1,1,0,0,0,0) dim(me) <- c(10,1) recommendations <- scaled %*% me I imagine few users would want to bother, but I can see where it would be a relatively quick way to train a recommender. Here's the problem though -- I can get it to work using the method I've described above, but I can't quite figure out how to use it to generate an eigenvector for the user. For existing users I can always generate predictions by matrix multiplying U %*% S %*% t(V)[,user] and then sorting by the results. It would be nice to use a consistent model. I can't quite see the math to generate an equivalent equation though. On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Sharpened: > > > http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html > > On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > > You may need to sharpen your terms / problem statement here : > > > > What is a geometric value -- just mean a continuous real value? > > So these are item-feature vectors? > > > > The middle bit of the output of an SVD is not a singular vector -- it's a > > diagonal matrix containing singular values on the diagonal. > > The left matrix contains singular vectors, which are not eigenvectors > except > > in very specific cases of the original matrix. > > > > Singular vectors are the columns of the left matrix, not rows, whereas > items > > corresponds to its rows. What do you mean about relating them? > > What do you mean by the "hot spot" you are trying to find? > > A vector does not express two end-points, no. You could think of (X,Y) as > > corresponding to a point in 2-space, or could think of it as a ray from > > (0,0) to (X,Y), but you could think of it as (100,200) to (100+X,200+Y) > just > > as well. There are not two point implied by anything here. > > > > > > How do you get points from the original item-feature space into the > > transformed, reduced space? While I think this is an imprecise answer: if > A > > = U Sigma V^T then you can think of (Sigma V^T) as like the > change-of-basis > > transformation that does this.
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-25, 20:53
By the way, please ignore my use of the term eigenvector -- I have a feeling
I completely misused it. I've never quite understood the concept, but to me that truncated 10 value long vector that corresponds to a movie seems to be "characteristic" of it (which is what the language eigen was always intended to convey. On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > I've been playing around with this problem for the last week or so (or at > least this problem as I understood it based on your initial commentary > Lance) -- but purely in R using smaller data so I can 1. get my head wrapped > around the problem, and 2. get more familiar with R. > > To make the problem a little more tenable I limited my sample to 200 movies > and 10,000 users (taking the most rated movies from 2004 and 2005 based on > NF's dataset -- I know, I should really switch back to the grouplens > dataset...) I'm also only looking at binary data at the moment -- I treat > any rating above 3 as a movie you liked and anything 3 or below as the same > as not having rated the movie. > > So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD > on it so that I can project it onto a 10 dimensional space. > > M<-initial data > s_m<- svd(M,10,10) > U<-s_m$u > S<-diag(s_m$d[1:10]) > V<-s_m$v > > So U is a 200 row by 10 column matrix -- each row represents the > eigenvector of a given movie, and each column represents one Lance's so > called axes of interest. So what I did next was spit out the top and bottom > n movie titles for each of these 10 dimensions. I found it was important to > show more than one movie title for each side of the dimensions, otherwise > the results might be somewhat misleading. > > I then went through the 10 dimensions and qualitatively answered for > myself whether I was strongly or weakly aligned in one direction, or not > aligned in anyway on this dimension. Personally I usually found I only felt > strongly aligned on 2 of the ten, and weakly aligned on another 2. > > I then normalized U across each of the ten dimensions and for each movie > added up it's z score in that dimension by my alignment in that dimension. > I then sorted the results and displayed the movie titles -- it was a pretty > accurate ranking of movies as I like them. > > scaled <- apply(U,2,scale) > me <- c(0,2,1,0,-1,1,0,0,0,0) > dim(me) <- c(10,1) > recommendations <- scaled %*% me > > I imagine few users would want to bother, but I can see where it would be a > relatively quick way to train a recommender. Here's the problem though -- I > can get it to work using the method I've described above, but I can't quite > figure out how to use it to generate an eigenvector for the user. For > existing users I can always generate predictions by matrix multiplying U %*% > S %*% t(V)[,user] and then sorting by the results. It would be nice to use > a consistent model. I can't quite see the math to generate an equivalent > equation though. > > On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > >> Sharpened: >> >> >> http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html >> >> On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <[EMAIL PROTECTED]> wrote: >> > You may need to sharpen your terms / problem statement here : >> > >> > What is a geometric value -- just mean a continuous real value? >> > So these are item-feature vectors? >> > >> > The middle bit of the output of an SVD is not a singular vector -- it's >> a >> > diagonal matrix containing singular values on the diagonal. >> > The left matrix contains singular vectors, which are not eigenvectors >> except >> > in very specific cases of the original matrix. >> > >> > Singular vectors are the columns of the left matrix, not rows, whereas >> items >> > corresponds to its rows. What do you mean about relating them? >> > What do you mean by the "hot spot" you are trying to find? >> > A vector does not express two end-points, no. You could think of (X,Y)
-
Re: Singular vectors of a recommendation Item-Item spaceJake Mannix 2011-08-25, 20:57
On Thu, Aug 25, 2011 at 1:53 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote:
> By the way, please ignore my use of the term eigenvector -- I have a > feeling > I completely misused it. I've never quite understood the concept, but to > me > that truncated 10 value long vector that corresponds to a movie seems to be > "characteristic" of it (which is what the language eigen was always > intended > to convey. > It actually *is* an eigenvector, you're not wrong. In fact, singular vectors *are* eigenvectors, in general. If you're a singular vector of matrix A, then you're the eigenvector of either A'A, or AA' (depending on whether you're a left or right eigenvector). -jake > > On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > I've been playing around with this problem for the last week or so (or at > > least this problem as I understood it based on your initial commentary > > Lance) -- but purely in R using smaller data so I can 1. get my head > wrapped > > around the problem, and 2. get more familiar with R. > > > > To make the problem a little more tenable I limited my sample to 200 > movies > > and 10,000 users (taking the most rated movies from 2004 and 2005 based > on > > NF's dataset -- I know, I should really switch back to the grouplens > > dataset...) I'm also only looking at binary data at the moment -- I > treat > > any rating above 3 as a movie you liked and anything 3 or below as the > same > > as not having rated the movie. > > > > So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD > > on it so that I can project it onto a 10 dimensional space. > > > > M<-initial data > > s_m<- svd(M,10,10) > > U<-s_m$u > > S<-diag(s_m$d[1:10]) > > V<-s_m$v > > > > So U is a 200 row by 10 column matrix -- each row represents the > > eigenvector of a given movie, and each column represents one Lance's so > > called axes of interest. So what I did next was spit out the top and > bottom > > n movie titles for each of these 10 dimensions. I found it was important > to > > show more than one movie title for each side of the dimensions, otherwise > > the results might be somewhat misleading. > > > > I then went through the 10 dimensions and qualitatively answered for > > myself whether I was strongly or weakly aligned in one direction, or not > > aligned in anyway on this dimension. Personally I usually found I only > felt > > strongly aligned on 2 of the ten, and weakly aligned on another 2. > > > > I then normalized U across each of the ten dimensions and for each movie > > added up it's z score in that dimension by my alignment in that > dimension. > > I then sorted the results and displayed the movie titles -- it was a > pretty > > accurate ranking of movies as I like them. > > > > scaled <- apply(U,2,scale) > > me <- c(0,2,1,0,-1,1,0,0,0,0) > > dim(me) <- c(10,1) > > recommendations <- scaled %*% me > > > > I imagine few users would want to bother, but I can see where it would be > a > > relatively quick way to train a recommender. Here's the problem though > -- I > > can get it to work using the method I've described above, but I can't > quite > > figure out how to use it to generate an eigenvector for the user. For > > existing users I can always generate predictions by matrix multiplying U > %*% > > S %*% t(V)[,user] and then sorting by the results. It would be nice to > use > > a consistent model. I can't quite see the math to generate an equivalent > > equation though. > > > > On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog <[EMAIL PROTECTED]> > wrote: > > > >> Sharpened: > >> > >> > >> > http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html > >> > >> On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> > You may need to sharpen your terms / problem statement here : > >> > > >> > What is a geometric value -- just mean a continuous real value? > >> > So these are item-feature vectors? > >> > > >> > The middle bit of the output of an SVD is not a singular vector --
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-25, 21:21
Well, I think my problem may have had more to do with what I was calling the
eigenvector... I was referring to the rows rather than the columns of U and V. While the columns may be characteristic of the overall matrix, the rows are characteristic of the user or item (in that they are a rank reduced representation of that person or thing). I guess you could say I just had to tilt my head to the side and change my perspective 90 degrees =) On Thu, Aug 25, 2011 at 3:57 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > On Thu, Aug 25, 2011 at 1:53 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > By the way, please ignore my use of the term eigenvector -- I have a > > feeling > > I completely misused it. I've never quite understood the concept, but to > > me > > that truncated 10 value long vector that corresponds to a movie seems to > be > > "characteristic" of it (which is what the language eigen was always > > intended > > to convey. > > > > It actually *is* an eigenvector, you're not wrong. > > In fact, singular vectors *are* eigenvectors, in general. If you're a > singular vector > of matrix A, then you're the eigenvector of either A'A, or AA' (depending > on > whether you're a left or right eigenvector). > > -jake > > > > > > On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > > > I've been playing around with this problem for the last week or so (or > at > > > least this problem as I understood it based on your initial commentary > > > Lance) -- but purely in R using smaller data so I can 1. get my head > > wrapped > > > around the problem, and 2. get more familiar with R. > > > > > > To make the problem a little more tenable I limited my sample to 200 > > movies > > > and 10,000 users (taking the most rated movies from 2004 and 2005 based > > on > > > NF's dataset -- I know, I should really switch back to the grouplens > > > dataset...) I'm also only looking at binary data at the moment -- I > > treat > > > any rating above 3 as a movie you liked and anything 3 or below as the > > same > > > as not having rated the movie. > > > > > > So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated > SVD > > > on it so that I can project it onto a 10 dimensional space. > > > > > > M<-initial data > > > s_m<- svd(M,10,10) > > > U<-s_m$u > > > S<-diag(s_m$d[1:10]) > > > V<-s_m$v > > > > > > So U is a 200 row by 10 column matrix -- each row represents the > > > eigenvector of a given movie, and each column represents one Lance's so > > > called axes of interest. So what I did next was spit out the top and > > bottom > > > n movie titles for each of these 10 dimensions. I found it was > important > > to > > > show more than one movie title for each side of the dimensions, > otherwise > > > the results might be somewhat misleading. > > > > > > I then went through the 10 dimensions and qualitatively answered for > > > myself whether I was strongly or weakly aligned in one direction, or > not > > > aligned in anyway on this dimension. Personally I usually found I only > > felt > > > strongly aligned on 2 of the ten, and weakly aligned on another 2. > > > > > > I then normalized U across each of the ten dimensions and for each > movie > > > added up it's z score in that dimension by my alignment in that > > dimension. > > > I then sorted the results and displayed the movie titles -- it was a > > pretty > > > accurate ranking of movies as I like them. > > > > > > scaled <- apply(U,2,scale) > > > me <- c(0,2,1,0,-1,1,0,0,0,0) > > > dim(me) <- c(10,1) > > > recommendations <- scaled %*% me > > > > > > I imagine few users would want to bother, but I can see where it would > be > > a > > > relatively quick way to train a recommender. Here's the problem though > > -- I > > > can get it to work using the method I've described above, but I can't > > quite > > > figure out how to use it to generate an eigenvector for the user. For > > > existing users I can always generate predictions by matrix multiplying
-
Re: Singular vectors of a recommendation Item-Item spaceSean Owen 2011-08-25, 21:50
The 200x10 matrix is indeed a matrix of 10 singular vectors, which are
eigenvectors of AA'. It's the columns, not rows, that are eigenvectors. The rows do mean something. I think it's fair to interpret the 10 singular values / vectors as corresponding to some underlying features of tastes. The rows say how much each user expresses those 10 tastes. The matrix of right singular vectors on the other side tells you the same thing about items. The diagonal matrix of singular values in the middle also comes into play -- it's like a set of multipliers that say how important each feature is. (This is why we cut out the singular vectors / values that have the smallest singular values; it's like removing the least-important features.) So really you'd have to stick those values somewhere; Ted says it's conventional to put "half" of each (their square roots) with each side if anything. I don't have as good a grasp on an intuition for the columns as eigenvectors. They're also a set of basis vectors, and I had understood them as like the "real" bases of the reduced feature space expressed in user-item space. But I'd have to go back and think that intuition through again since I can't really justify it from scratch again in my head just now. On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > Well, I think my problem may have had more to do with what I was calling the > eigenvector... I was referring to the rows rather than the columns of U and > V. While the columns may be characteristic of the overall matrix, the rows > are characteristic of the user or item (in that they are a rank reduced > representation of that person or thing). I guess you could say I just had to > tilt my head to the side and change my perspective 90 degrees =) >
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-25, 22:07
If you take the item vector an existing user, multiply that by the
left-hand SVD matrix, and multthe resulting vector[i] * 1/singularvalues[i], you should get the item's row in the left-hand column. So, the left-hand column times 1/singular values gives you the projection for a new user's item vector >into this space of users<. Which gives you a user-user recommender for new users. On Thu, Aug 25, 2011 at 2:50 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are > eigenvectors of AA'. It's the columns, not rows, that are > eigenvectors. > > The rows do mean something. I think it's fair to interpret the 10 > singular values / vectors as corresponding to some underlying features > of tastes. The rows say how much each user expresses those 10 tastes. > The matrix of right singular vectors on the other side tells you the > same thing about items. The diagonal matrix of singular values in the > middle also comes into play -- it's like a set of multipliers that say > how important each feature is. (This is why we cut out the singular > vectors / values that have the smallest singular values; it's like > removing the least-important features.) So really you'd have to stick > those values somewhere; Ted says it's conventional to put "half" of > each (their square roots) with each side if anything. > > I don't have as good a grasp on an intuition for the columns as > eigenvectors. They're also a set of basis vectors, and I had > understood them as like the "real" bases of the reduced feature space > expressed in user-item space. But I'd have to go back and think that > intuition through again since I can't really justify it from scratch > again in my head just now. > > > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: >> Well, I think my problem may have had more to do with what I was calling the >> eigenvector... I was referring to the rows rather than the columns of U and >> V. While the columns may be characteristic of the overall matrix, the rows >> are characteristic of the user or item (in that they are a rank reduced >> representation of that person or thing). I guess you could say I just had to >> tilt my head to the side and change my perspective 90 degrees =) >> > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-25, 23:07
One thing I found interesting (but not particularly surprising) is that the
biggest singular value/vector was pretty much tied directly to volume. That makes sense because the best predictor of whether a given fields value was 1 was whether it belonged to a row with lots of 1s or a column with lots of 1s (I haven't quite figured out the best method to normalize the values with yet). When I plot the values of the largest singular vectors against the sum of values in the corresponding row or column, the correlation is very linear. That's not the case for any of the other singular vectors (which actually makes me wonder if just removing that first singular vector from the prediction might not be one of the best ways to normalize the data). I understand what you're saying about each singular vector corresponding to a feature though. Each left singular vector represents some abstract aspect of a movie and each right singular vector represents users leanings or inclinations with regards to that aspect of the movie. The singular value itself just seems to indicate how good a predictor the combination of one users inclination toward that aspect of a movie is for coming up with the actual value. The issue I mentioned above is that popularity of a movie as well as how often a user watches movies tend to be the best predictors of whether a user has seen or will see a movie. I had been picturing this with the idea of one k dimensional space -- one where a users location corresponds to their ideal prototypical movies location. Not that there would be a movie right there, but there would be ones nearby, and the nearer they were, the more enjoyable they would be. That's a naive model, but that doesn't mean it wouldn't work well enough. My problem is I don't quite get how to map the user space over to the item space. I think that may be what Lance is trying to describe in his last response, but I'm falling short on reconstructing the math from his description. I get the following U S V* = A U S = A V U = A V 1/S I think that last line is what Lance was describing. Of course my problem was bootstrapping in a user for whom I don't know A or V. I also think I may have missed a big step of the puzzle. For some reason I thought that just by loosening the rank, you could recompose the Matrix A from the truncated SVD values/vectors and use the recomposed values themselves as the recommendation. I thought one of the ideas was that the recomposed matrix had less "noise" and could be a better representation of the underlying nature of the matrix than the original matrix itself. But that may have just been wishful thinking... On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <[EMAIL PROTECTED]> wrote: > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are > eigenvectors of AA'. It's the columns, not rows, that are > eigenvectors. > > The rows do mean something. I think it's fair to interpret the 10 > singular values / vectors as corresponding to some underlying features > of tastes. The rows say how much each user expresses those 10 tastes. > The matrix of right singular vectors on the other side tells you the > same thing about items. The diagonal matrix of singular values in the > middle also comes into play -- it's like a set of multipliers that say > how important each feature is. (This is why we cut out the singular > vectors / values that have the smallest singular values; it's like > removing the least-important features.) So really you'd have to stick > those values somewhere; Ted says it's conventional to put "half" of > each (their square roots) with each side if anything. > > I don't have as good a grasp on an intuition for the columns as > eigenvectors. They're also a set of basis vectors, and I had > understood them as like the "real" bases of the reduced feature space > expressed in user-item space. But I'd have to go back and think that > intuition through again since I can't really justify it from scratch
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-08-26, 00:57
In matrix terms the binary user x item matrix maps a set of items to users
(A h = users who interacted with items in h). Similarly A' maps users to items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" sort of operation. This can be rearranged to be (A'A) h. This is where the singular values come in. If A \approx U S V' and we know that U'U = V'V = I from the definition of the SVD. This means that A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V' But V' transforms an item vector into the reduced dimensional space so we can view this as transforming the item vector into the reduced dimensional space, scaling element-wise using S^2 and then transforming back to item space. As you noted, the first right singular vector corresponds to popularity. It is often not appropriate to just recommend popular things (since users have other means of discovering them) so you might want to drop that dimension from the analysis. This can be done in the SVD results or it can be done using something like a log-likelihood ratio test so that we only model anomalously large values in A'A. Btw, if you continue to use R for experimentation, you should be able to dramatically increase the size of the analysis you do by using sparse matrices and a random projection algorithm. I was able to compute SVD's of matrices with millions of rows and columns and a few million non-zero elements in a few seconds. Holler if you would like details about how to do this. On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > One thing I found interesting (but not particularly surprising) is that the > biggest singular value/vector was pretty much tied directly to volume. > That > makes sense because the best predictor of whether a given fields value was > 1 > was whether it belonged to a row with lots of 1s or a column with lots of > 1s > (I haven't quite figured out the best method to normalize the values with > yet). When I plot the values of the largest singular vectors against the > sum of values in the corresponding row or column, the correlation is very > linear. That's not the case for any of the other singular vectors (which > actually makes me wonder if just removing that first singular vector from > the prediction might not be one of the best ways to normalize the data). > > I understand what you're saying about each singular vector corresponding to > a feature though. Each left singular vector represents some abstract > aspect > of a movie and each right singular vector represents users leanings or > inclinations with regards to that aspect of the movie. The singular value > itself just seems to indicate how good a predictor the combination of one > users inclination toward that aspect of a movie is for coming up with the > actual value. The issue I mentioned above is that popularity of a movie as > well as how often a user watches movies tend to be the best predictors of > whether a user has seen or will see a movie. > > I had been picturing this with the idea of one k dimensional space -- one > where a users location corresponds to their ideal prototypical movies > location. Not that there would be a movie right there, but there would be > ones nearby, and the nearer they were, the more enjoyable they would be. > That's a naive model, but that doesn't mean it wouldn't work well enough. > My problem is I don't quite get how to map the user space over to the item > space. > > I think that may be what Lance is trying to describe in his last response, > but I'm falling short on reconstructing the math from his description. > > I get the following > > U S V* = A > U S = A V > U = A V 1/S > > I think that last line is what Lance was describing. Of course my problem > was bootstrapping in a user for whom I don't know A or V. > > I also think I may have missed a big step of the puzzle. For some reason I > thought that just by loosening the rank, you could recompose the Matrix A
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-26, 09:08
This is a little meditation on user v.s. item matrix density. The
heavy users and heavy items can be subsampled, once they are identified. Hadoop's built-in sort does give a very simple "map-increase" way to do this sort. http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > In matrix terms the binary user x item matrix maps a set of items to users > (A h = users who interacted with items in h). Similarly A' maps users to > items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" > sort of operation. This can be rearranged to be (A'A) h. > > This is where the singular values come in. If > > A \approx U S V' > > and we know that U'U = V'V = I from the definition of the SVD. This means > that > > A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V' > > But V' transforms an item vector into the reduced dimensional space so we > can view this as transforming the item vector into the reduced dimensional > space, scaling element-wise using S^2 and then transforming back to item > space. > > As you noted, the first right singular vector corresponds to popularity. It > is often not appropriate to just recommend popular things (since users have > other means of discovering them) so you might want to drop that dimension > from the analysis. This can be done in the SVD results or it can be done > using something like a log-likelihood ratio test so that we only model > anomalously large values in A'A. > > Btw, if you continue to use R for experimentation, you should be able to > dramatically increase the size of the analysis you do by using sparse > matrices and a random projection algorithm. I was able to compute SVD's of > matrices with millions of rows and columns and a few million non-zero > elements in a few seconds. Holler if you would like details about how to do > this. > > On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > >> One thing I found interesting (but not particularly surprising) is that the >> biggest singular value/vector was pretty much tied directly to volume. >> That >> makes sense because the best predictor of whether a given fields value was >> 1 >> was whether it belonged to a row with lots of 1s or a column with lots of >> 1s >> (I haven't quite figured out the best method to normalize the values with >> yet). When I plot the values of the largest singular vectors against the >> sum of values in the corresponding row or column, the correlation is very >> linear. That's not the case for any of the other singular vectors (which >> actually makes me wonder if just removing that first singular vector from >> the prediction might not be one of the best ways to normalize the data). >> >> I understand what you're saying about each singular vector corresponding to >> a feature though. Each left singular vector represents some abstract >> aspect >> of a movie and each right singular vector represents users leanings or >> inclinations with regards to that aspect of the movie. The singular value >> itself just seems to indicate how good a predictor the combination of one >> users inclination toward that aspect of a movie is for coming up with the >> actual value. The issue I mentioned above is that popularity of a movie as >> well as how often a user watches movies tend to be the best predictors of >> whether a user has seen or will see a movie. >> >> I had been picturing this with the idea of one k dimensional space -- one >> where a users location corresponds to their ideal prototypical movies >> location. Not that there would be a movie right there, but there would be >> ones nearby, and the nearer they were, the more enjoyable they would be. >> That's a naive model, but that doesn't mean it wouldn't work well enough. >> My problem is I don't quite get how to map the user space over to the item >> space. >> >> I think that may be what Lance is trying to describe in his last response, Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceSean Owen 2011-08-26, 09:21
That's correct. Well you just have to recompose the user row you are
interested in. It will no longer be sparse, at all. Those new values are your estimated ratings. On Fri, Aug 26, 2011 at 12:07 AM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > I also think I may have missed a big step of the puzzle. For some reason I > thought that just by loosening the rank, you could recompose the Matrix A > from the truncated SVD values/vectors and use the recomposed values > themselves as the recommendation. I thought one of the ideas was that the > recomposed matrix had less "noise" and could be a better representation of > the underlying nature of the matrix than the original matrix itself. But > that may have just been wishful thinking... > >
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-26, 09:33
I got this "axis of interest" concept from a presentation by one of the
Netflix team runner-ups, I don't know which one. He did not give a name for it. Is there a standard term? I hate just making up new words. Also, there are clusters of items at both ends, but there are also items along the axis which form a spectrum from one end to the other. So you get a line of items with blobs at the ends. On Fri, Aug 26, 2011 at 2:21 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > That's correct. Well you just have to recompose the user row you are > interested in. It will no longer be sparse, at all. Those new values are > your estimated ratings. > > On Fri, Aug 26, 2011 at 12:07 AM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > > I also think I may have missed a big step of the puzzle. For some reason > I > > thought that just by loosening the rank, you could recompose the Matrix A > > from the truncated SVD values/vectors and use the recomposed values > > themselves as the recommendation. I thought one of the ideas was that > the > > recomposed matrix had less "noise" and could be a better representation > of > > the underlying nature of the matrix than the original matrix itself. But > > that may have just been wishful thinking... > > > > > -- Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-26, 15:29
Thanks for the math Ted -- that was very helpful.
I've been using sparseMatrix() from libray(Matrix) -- largely based on your response to somebody elses email. I've been playing with smaller matrices mainly for my own learning purposes -- it's much easier to read through 200 movies (most of which I've heard of) and get a gut feel, than 10,000 movies. It also means I don't have to shut down my session quite as often (I've been using rstudio) when I run something over the wrong dimension(column vs row). I was running into some limitations as well, but I think some of those may have had more to do with my own typos and misunderstanding of that language. There's a second reason I've been avoiding the tail -- while working on this as a possible project to propose, I had a bit of a "duh" moment. Sometimes it's important to realize the real world constraints. Picture a company with very physical locations and very limited shelf space (say a convenience store or even a vending machine...) Netflix and Amazon can make a lot of money in the long tail because they have centralized and or digital inventories that service a large customer base. Imagine a situation where you have a small physical distributed inventory with an equally distributed customer base. In that situation it doesn't pay to chase after the tail -- you just need to reduce your costs and target the head where you get more volume with less items. So you really end up with a three tier market segmentation -- one strategy works best for the head, another for the body, and a third for the tail. As far as clusters go -- I really wasn't finding any clusters at the edges of the data, but that could have more to do with not including the tail (and not normalizing appropriately for popularity). Incidentally, there was one amusingly cautionary anecdote I'd share -- I don't remember the exact spread, but at one point I had two extremes that included something like "the johnson family vacation", "my baby's daddy", and "barbershop 2" on the one side, and then titles like "mean girls", "50 first dates" and "along came polly" on the other side. You may have to look up those titles to see what I'm talking about, but I'd say the distinction will be pretty clear when you do -- and if you were simply automating all of this and not reviewing it with common sense, you could end up offending some of your users... All I'm saying is there may be trends in the real world that some people aren't comfortable having pointed out. On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > This is a little meditation on user v.s. item matrix density. The > heavy users and heavy items can be subsampled, once they are > identified. Hadoop's built-in sort does give a very simple > "map-increase" way to do this sort. > > http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html > > On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > In matrix terms the binary user x item matrix maps a set of items to > users > > (A h = users who interacted with items in h). Similarly A' maps users to > > items. Thus A' (A h) is the classic "users who x'ed this also x'ed that" > > sort of operation. This can be rearranged to be (A'A) h. > > > > This is where the singular values come in. If > > > > A \approx U S V' > > > > and we know that U'U = V'V = I from the definition of the SVD. This > means > > that > > > > A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V' > > > > But V' transforms an item vector into the reduced dimensional space so we > > can view this as transforming the item vector into the reduced > dimensional > > space, scaling element-wise using S^2 and then transforming back to item > > space. > > > > As you noted, the first right singular vector corresponds to popularity. > It > > is often not appropriate to just recommend popular things (since users > have > > other means of discovering them) so you might want to drop that dimension
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-08-26, 17:15
On Fri, Aug 26, 2011 at 8:29 AM, Jeff Hansen <[EMAIL PROTECTED]> wrote:
> Thanks for the math Ted -- that was very helpful. > NP. > ... I've been playing with smaller matrices > mainly for my own learning purposes -- it's much easier to read through 200 > movies (most of which I've heard of) and get a gut feel, than 10,000 > movies. > I think it would still be a good idea to analyze on a larger data set even if you only analyze a few movies relative to your own ground truth. Let me know if you need the in-core stochastic projection. ... Sometimes > it's important to realize the real world constraints. Picture a company > with very physical locations ... So you really end up with a three tier > market > segmentation -- one strategy works best for the head, another for the body, > and a third for the tail. > This is definitely true. > As far as clusters go -- I really wasn't finding any clusters at the edges > of the data, but that could have more to do with not including the tail > (and > not normalizing appropriately for popularity). > Indeed. ... -- and if you were simply automating all of > this and not reviewing it with common sense, you could end up offending > some > of your users... All I'm saying is there may be trends in the real world > that some people aren't comfortable having pointed out. > This is definitely something we saw at Veoh. Sometimes the right thing to do is have a list of exceptions.
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-27, 03:21
http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
Do not fear demographics. Yes, some people rent movies with all-black casts, and other people rent movies with all-white casts. And the Walmarts in the SF East Bay have palettes full of Tyler Perry videos, while most of the SF Peninsula don't know his name. And sometimes a bunch of Army Rangers have a great time watching "Confessions of a Shopaholic" in a tent, 120o farenheit, in Qatar. On Fri, Aug 26, 2011 at 8:29 AM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > Thanks for the math Ted -- that was very helpful. > > I've been using sparseMatrix() from libray(Matrix) -- largely based on your > response to somebody elses email. I've been playing with smaller matrices > mainly for my own learning purposes -- it's much easier to read through 200 > movies (most of which I've heard of) and get a gut feel, than 10,000 > movies. > It also means I don't have to shut down my session quite as often (I've > been using rstudio) when I run something over the wrong dimension(column vs > row). I was running into some limitations as well, but I think some of > those may have had more to do with my own typos and misunderstanding of > that > language. > > There's a second reason I've been avoiding the tail -- while working on > this > as a possible project to propose, I had a bit of a "duh" moment. Sometimes > it's important to realize the real world constraints. Picture a company > with very physical locations and very limited shelf space (say a > convenience > store or even a vending machine...) Netflix and Amazon can make a lot of > money in the long tail because they have centralized and or digital > inventories that service a large customer base. Imagine a situation where > you have a small physical distributed inventory with an equally distributed > customer base. In that situation it doesn't pay to chase after the tail -- > you just need to reduce your costs and target the head where you get more > volume with less items. So you really end up with a three tier market > segmentation -- one strategy works best for the head, another for the body, > and a third for the tail. > > As far as clusters go -- I really wasn't finding any clusters at the edges > of the data, but that could have more to do with not including the tail > (and > not normalizing appropriately for popularity). > > Incidentally, there was one amusingly cautionary anecdote I'd share -- I > don't remember the exact spread, but at one point I had two extremes that > included something like "the johnson family vacation", "my baby's daddy", > and "barbershop 2" on the one side, and then titles like "mean girls", "50 > first dates" and "along came polly" on the other side. You may have to > look > up those titles to see what I'm talking about, but I'd say the distinction > will be pretty clear when you do -- and if you were simply automating all > of > this and not reviewing it with common sense, you could end up offending > some > of your users... All I'm saying is there may be trends in the real world > that some people aren't comfortable having pointed out. > > On Fri, Aug 26, 2011 at 4:08 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > > > This is a little meditation on user v.s. item matrix density. The > > heavy users and heavy items can be subsampled, once they are > > identified. Hadoop's built-in sort does give a very simple > > "map-increase" way to do this sort. > > > > http://ultrawhizbang.blogspot.com/2011/08/sorted-recommender-data.html > > > > On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > In matrix terms the binary user x item matrix maps a set of items to > > users > > > (A h = users who interacted with items in h). Similarly A' maps users > to > > > items. Thus A' (A h) is the classic "users who x'ed this also x'ed > that" > > > sort of operation. This can be rearranged to be (A'A) h. > > > > > > This is where the singular values come in. If Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceJeff Hansen 2011-08-29, 20:34
Friday I finally got around to reading Ted's paper "accurate methods for
statistics of surprise and coincidence" for a better understanding of how to apply log likelihood. Can somebody validate if I'm understanding/applying the idea correctly in this case? If we have a item/feature matrix (document/words, user/movies) then we can consider each row or column to be a sample from larger reference population of the full matrix. In that case the log likelihood significance for any given observation within the sample will be based on the observation itself (1), the count of observations for the column (documents with this word/users that watched this movie), the count of observations for the row (words in this document, movies this user has watched) and the total number of any observation within the reference population. Given those numbers, G2 or log likelihood is just a calculation of a, b, c and d (values described above respectively) Given the log likelihood for each individual observation, is the best way to apply it simply to remove observations that don't have a high enough level of significance? Or is there a better way to normalize the values rather than removing less significant ones. Also, in this case I feel like it makes more sense to treat Users like Documents and movies like observations of words within a document, so compare each User and their observations to the reference population rather than each Movie and the users that reviewed it. It doesn't seem to make a very big difference, but I was wondering if there's a reason that explains when and why one should be used over the other. I realize this is a Mahout user list, so I feel somewhat guilty constantly pasting in R code, but here's how I ended up implementing it. I'm sure there's a better way to prune the matrix, but this was the best I could come up with. #rough equation that calculates log likelihood given a contingency table of counts llr <- function(a,b,c,d){ r=(a+b)/(c+d) e1=c*r e2=d*r g2=2*(a*log(a/e1)+b*log(b/e2)) return(g2) } #get counts to calculate log likelihood for function above a <- 1 b <- apply(A,2,sum) c <- apply(A,1,sum) d <- sum(A) #export sparse matrix A to a dataframe for calculating llr triples <- summary(A) #create b and c value vectors that sync up with the dataframe columns bx <- b[triples$j] cx <- c[triples$i] #caculate the log likelihood for each entry in the dataframe ll <- llr(a,bx,cx,d) #create a logical operator vector that indicates whether each entry in dataframe is significant significant <- ll>3 #build a new sparse vector that only entries with significant enough log likelihood B <- sparseMatrix(i=triples$i[significant],j=triples$j[significant],x=1) By the way, I always used to struggle with matrix multiplication (I could never quite remember which side was rows and which side was columns with out little mnemonics, but then again I've always struggled with remembering my left from my right). I recently realized it makes much more sense if you picture it as a cube in 3space -- if you match up the lower left hand corner of the right matrix with the upper right hand corner of the left matrix and put the product in between them so you get a backwards capital L shape, then fold back the right and left matrices you get three faces of a cube (or cubic rectangle). Then you just project inwards from the original matrices and multiply where ever the values cross and sum them up those products as you project them toward the face which is the resulting matrix. Once you visualize it this way it makes slicing and dicing the operation into blockwise formulas trivial and obvious. If only somebody had explained it this way I would have found matrices much less intimidating -- has anybody ever seen it explained this way? On Fri, Aug 26, 2011 at 10:21 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > > http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html > > Do not fear demographics. Yes, some people rent movies with all-black
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-08-29, 22:28
Jeff,
I think that this is a much simpler exposition: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html It makes the connection with entropy clear and allows a very simple implementation for more than 2x2 situations. More comments in-line: On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > ... > If we have a item/feature matrix (document/words, user/movies) then we can > consider each row or column to be a sample from larger reference population > of the full matrix. Well, sort of. If you have a document (user) and a word (item), then you have a joint probability that any given interaction will be between this document and word. We pretend in this case that each interaction is independent of every other which is patently not true, but very helpful. This joint probability can be approximated more or less accurately as the product of document (user) and word (item) popularities. In matrix terms, this is the same as approximating the joint probability as a rank-1 matrix that is the outer produced of user and item probability distributions, each of which is a vector. The point of what we are trying to do is to find a good and economical approximation of the joint probability that captures important deviations from this rank-1 approximation. There are two problems that come up in trying to find this approximation. The first is that we see counts rather than probabilities and the second is that the matrix is big. Sometimes, it is very big. > In that case the log likelihood significance for any > given observation within the sample will be based on the observation itself > (1), the count of observations for the column (documents with this > word/users that watched this movie), the count of observations for the row > (words in this document, movies this user has watched) and the total number > of any observation within the reference population. Given those numbers, > G2 > or log likelihood is just a calculation of a, b, c and d (values described > above respectively) > As the blog I mentioned above makes more clear than original paper, you need to reduce the counts before taking them as entries in the contingency table. If you are looking at the typical problem of vocabulary in documents, then the discrepancy won't be huge, but if you have any very popular items, this could be a problem. Given the log likelihood for each individual observation, is the best way to > apply it simply to remove observations that don't have a high enough level > of significance? Roughly. It is a bit misleading to talk about "significance" here since we aren't trying to do a classic frequentist hypothesis test. Instead, what we are trying to do is prioritize which joint terms we need to use to get a usefully better approximation of the underlying joint distribution. > Or is there a better way to normalize the values rather > than removing less significant ones. There are a lot of ways of approaching this. Most don't do any better than the simple expedient of just keeping the highest scoring k words for each document (or items for each user). Having k be the same for all users is not a bad thing at all from a number of angles. I often use k = 50 to 300. Also, in this case I feel like it makes more sense to treat Users like > Documents and movies like observations of words within a document, so > compare each User and their observations to > the reference population rather than each Movie and the users that reviewed > it. I think that you are on the right track here, but your second sentence made a distinction I didn't understand. User is to item as Document is to Word is a fine metaphor for recording your observations. I strongly prefer binary observations, so I usually either count all ratings as 1 or all ratings above a threshold as 1 with everything else being a zero. Whether you subsequently try to find users similar to each other or items similar to each other doesn't much matter. All are reasonable things to try. Whether they are useful is up to you and your customer to decide. Also, if your original matrix is A, then it is usually a shorter path to results to analyze the word (item) cooccurrence matrix A'A. The methods below work either way. I do it. Use the right tools for the right task. R is great for prototyping. So this code implies that you are arranging the elements of the contingency table this way: a b | a+b c d | c+d a+c b+d | a+b+c+d But your later code you pass in data that uses this convention a b-a | b c d-b-c | d-b a+c d-a-c | d I don't like this form because it makes code more complex overall and breaks important symmetries. #get counts to calculate log likelihood for function above c <- apply(A,1,sum) you can also use rowSums and columnSums d <- sum(A) This probably needs to be something more like llr(a, bx-a, d-cx, d-bx-cx+a) (I think) Also, summary produces different values for different kinds of matrices. As you point out, it is very handy for sparse matrices, but it is nice to have code that works on sparse or dense matrices. I think that a much higher cutoff is better. Commonly a cutoff of 10-30 is useful. Also, you may be better off if you can take the k-highest.
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-31, 08:24
"Also, if your original matrix is A, then it is usually a shorter path to
results to analyze the word (item) cooccurrence matrix A'A. The methods below work either way." The cooccurrence definitions I'm finding only use the summation-based one in wikipedia. Are there any involving inverting the matrix instead? Or, as I suspect, are the two exactly the same but I don't know enough linear algebra? On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Jeff, > > I think that this is a much simpler exposition: > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html > > It makes the connection with entropy clear and allows a very simple > implementation for more than 2x2 situations. > > More comments in-line: > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > ... > > If we have a item/feature matrix (document/words, user/movies) then we > can > > consider each row or column to be a sample from larger reference > population > > of the full matrix. > > > Well, sort of. If you have a document (user) and a word (item), then you > have a joint probability that any given interaction will be between this > document and word. We pretend in this case that each interaction is > independent of every other which is patently not true, but very helpful. > > This joint probability can be approximated more or less accurately as the > product of document (user) and word (item) popularities. In matrix terms, > this is the same as approximating the joint probability as a rank-1 matrix > that is the outer produced of user and item probability distributions, each > of which is a vector. The point of what we are trying to do is to find a > good and economical approximation of the joint probability that captures > important deviations from this rank-1 approximation. There are two > problems > that come up in trying to find this approximation. The first is that we > see > counts rather than probabilities and the second is that the matrix is big. > Sometimes, it is very big. > > > > In that case the log likelihood significance for any > > given observation within the sample will be based on the observation > itself > > (1), the count of observations for the column (documents with this > > word/users that watched this movie), the count of observations for the > row > > (words in this document, movies this user has watched) and the total > number > > of any observation within the reference population. Given those numbers, > > G2 > > or log likelihood is just a calculation of a, b, c and d (values > described > > above respectively) > > > > As the blog I mentioned above makes more clear than original paper, you > need > to reduce the counts before taking them as entries in the contingency > table. > If you are looking at the typical problem of vocabulary in documents, then > the discrepancy won't be huge, but if you have any very popular items, this > could be a problem. > > Given the log likelihood for each individual observation, is the best way > to > > apply it simply to remove observations that don't have a high enough > level > > of significance? > > > Roughly. It is a bit misleading to talk about "significance" here since we > aren't trying to do a classic frequentist hypothesis test. Instead, what > we > are trying to do is prioritize which joint terms we need to use to get a > usefully better approximation of the underlying joint distribution. > > > > Or is there a better way to normalize the values rather > > than removing less significant ones. > > > There are a lot of ways of approaching this. Most don't do any better than > the simple expedient of just keeping the highest scoring k words for each > document (or items for each user). Having k be the same for all users is > not a bad thing at all from a number of angles. I often use k = 50 to 300. > > Also, in this case I feel like it makes more sense to treat Users like > > Documents and movies like observations of words within a document, so Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-31, 08:31
"If you have a document (user) and a word (item), then you
have a joint probability that any given interaction will be between this document and word. We pretend in this case that each interaction is independent of every other which is patently not true, but very helpful." So if you subsample randomly to trim the data sizes, is it worth deliberately breaking correlations? Lets say that in a user/item dataset, almost all of the user/rating rows are in clusters of two or three according to the timestamp. Instead of just a random subsample, is it worth removing 1 rating from each cluster? This would strengthen the Bayesian assumption. On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Jeff, > > I think that this is a much simpler exposition: > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html > > It makes the connection with entropy clear and allows a very simple > implementation for more than 2x2 situations. > > More comments in-line: > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > ... > > If we have a item/feature matrix (document/words, user/movies) then we > can > > consider each row or column to be a sample from larger reference > population > > of the full matrix. > > > Well, sort of. If you have a document (user) and a word (item), then you > have a joint probability that any given interaction will be between this > document and word. We pretend in this case that each interaction is > independent of every other which is patently not true, but very helpful. > > This joint probability can be approximated more or less accurately as the > product of document (user) and word (item) popularities. In matrix terms, > this is the same as approximating the joint probability as a rank-1 matrix > that is the outer produced of user and item probability distributions, each > of which is a vector. The point of what we are trying to do is to find a > good and economical approximation of the joint probability that captures > important deviations from this rank-1 approximation. There are two > problems > that come up in trying to find this approximation. The first is that we > see > counts rather than probabilities and the second is that the matrix is big. > Sometimes, it is very big. > > > > In that case the log likelihood significance for any > > given observation within the sample will be based on the observation > itself > > (1), the count of observations for the column (documents with this > > word/users that watched this movie), the count of observations for the > row > > (words in this document, movies this user has watched) and the total > number > > of any observation within the reference population. Given those numbers, > > G2 > > or log likelihood is just a calculation of a, b, c and d (values > described > > above respectively) > > > > As the blog I mentioned above makes more clear than original paper, you > need > to reduce the counts before taking them as entries in the contingency > table. > If you are looking at the typical problem of vocabulary in documents, then > the discrepancy won't be huge, but if you have any very popular items, this > could be a problem. > > Given the log likelihood for each individual observation, is the best way > to > > apply it simply to remove observations that don't have a high enough > level > > of significance? > > > Roughly. It is a bit misleading to talk about "significance" here since we > aren't trying to do a classic frequentist hypothesis test. Instead, what > we > are trying to do is prioritize which joint terms we need to use to get a > usefully better approximation of the underlying joint distribution. > > > > Or is there a better way to normalize the values rather > > than removing less significant ones. > > > There are a lot of ways of approaching this. Most don't do any better than > the simple expedient of just keeping the highest scoring k words for each > document (or items for each user). Having k be the same for all users is Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-08-31, 15:04
Uhh...
A' is the transpose of A. Not the inverse. A' A *is* the summation version. On Wed, Aug 31, 2011 at 1:24 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > "Also, if your original matrix is A, then it is usually a shorter path to > results to analyze the word (item) cooccurrence matrix A'A. The methods > below work either way." > > The cooccurrence definitions I'm finding only use the summation-based one > in > wikipedia. Are there any involving inverting the matrix instead? Or, as I > suspect, are the two exactly the same but I don't know enough linear > algebra? > > On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > Jeff, > > > > I think that this is a much simpler exposition: > > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html > > > > It makes the connection with entropy clear and allows a very simple > > implementation for more than 2x2 situations. > > > > More comments in-line: > > > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> wrote: > > > > > ... > > > If we have a item/feature matrix (document/words, user/movies) then we > > can > > > consider each row or column to be a sample from larger reference > > population > > > of the full matrix. > > > > > > Well, sort of. If you have a document (user) and a word (item), then you > > have a joint probability that any given interaction will be between this > > document and word. We pretend in this case that each interaction is > > independent of every other which is patently not true, but very helpful. > > > > This joint probability can be approximated more or less accurately as the > > product of document (user) and word (item) popularities. In matrix > terms, > > this is the same as approximating the joint probability as a rank-1 > matrix > > that is the outer produced of user and item probability distributions, > each > > of which is a vector. The point of what we are trying to do is to find a > > good and economical approximation of the joint probability that captures > > important deviations from this rank-1 approximation. There are two > > problems > > that come up in trying to find this approximation. The first is that we > > see > > counts rather than probabilities and the second is that the matrix is > big. > > Sometimes, it is very big. > > > > > > > In that case the log likelihood significance for any > > > given observation within the sample will be based on the observation > > itself > > > (1), the count of observations for the column (documents with this > > > word/users that watched this movie), the count of observations for the > > row > > > (words in this document, movies this user has watched) and the total > > number > > > of any observation within the reference population. Given those > numbers, > > > G2 > > > or log likelihood is just a calculation of a, b, c and d (values > > described > > > above respectively) > > > > > > > As the blog I mentioned above makes more clear than original paper, you > > need > > to reduce the counts before taking them as entries in the contingency > > table. > > If you are looking at the typical problem of vocabulary in documents, > then > > the discrepancy won't be huge, but if you have any very popular items, > this > > could be a problem. > > > > Given the log likelihood for each individual observation, is the best way > > to > > > apply it simply to remove observations that don't have a high enough > > level > > > of significance? > > > > > > Roughly. It is a bit misleading to talk about "significance" here since > we > > aren't trying to do a classic frequentist hypothesis test. Instead, what > > we > > are trying to do is prioritize which joint terms we need to use to get a > > usefully better approximation of the underlying joint distribution. > > > > > > > Or is there a better way to normalize the values rather > > > than removing less significant ones. > > > > > > There are a lot of ways of approaching this. Most don't do any better
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-08-31, 15:10
Mathematically speaking, random sampling is just fine. Stratifying based on
various criteria can help avoid loss of accuracy so if you had several clusters then down sampling heavily represented clusters might work, but the accurate definition of clusters is harder than the cooccurrence analysis that you are trying to facilitate. Separately down sampling each user is actually a form of this stratification strategy. The difference is that we know which users have a large number of data points so that we can do the stratification accurately and very cheaply. On Wed, Aug 31, 2011 at 1:31 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > "If you have a document (user) and a word (item), then you > have a joint probability that any given interaction will be between this > document and word. We pretend in this case that each interaction is > independent of every other which is patently not true, but very helpful." > > So if you subsample randomly to trim the data sizes, is it worth > deliberately breaking correlations? Lets say that in a user/item dataset, > almost all of the user/rating rows are in clusters of two or three > according > to the timestamp. Instead of just a random subsample, is it worth removing > 1 > rating from each cluster? This would strengthen the Bayesian assumption. > >
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-08-31, 22:57
In this text-only notation, I though apostrophe meant inverse. What then is
matrix inversion? I see a fair amount of stuff here in what I think is MathML, but is displays raw in gmail. On Wed, Aug 31, 2011 at 8:04 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Uhh... > > A' is the transpose of A. Not the inverse. > > A' A *is* the summation version. > > On Wed, Aug 31, 2011 at 1:24 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > > > "Also, if your original matrix is A, then it is usually a shorter path to > > results to analyze the word (item) cooccurrence matrix A'A. The methods > > below work either way." > > > > The cooccurrence definitions I'm finding only use the summation-based one > > in > > wikipedia. Are there any involving inverting the matrix instead? Or, as I > > suspect, are the two exactly the same but I don't know enough linear > > algebra? > > > > On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > > > Jeff, > > > > > > I think that this is a much simpler exposition: > > > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html > > > > > > It makes the connection with entropy clear and allows a very simple > > > implementation for more than 2x2 situations. > > > > > > More comments in-line: > > > > > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> > wrote: > > > > > > > ... > > > > If we have a item/feature matrix (document/words, user/movies) then > we > > > can > > > > consider each row or column to be a sample from larger reference > > > population > > > > of the full matrix. > > > > > > > > > Well, sort of. If you have a document (user) and a word (item), then > you > > > have a joint probability that any given interaction will be between > this > > > document and word. We pretend in this case that each interaction is > > > independent of every other which is patently not true, but very > helpful. > > > > > > This joint probability can be approximated more or less accurately as > the > > > product of document (user) and word (item) popularities. In matrix > > terms, > > > this is the same as approximating the joint probability as a rank-1 > > matrix > > > that is the outer produced of user and item probability distributions, > > each > > > of which is a vector. The point of what we are trying to do is to find > a > > > good and economical approximation of the joint probability that > captures > > > important deviations from this rank-1 approximation. There are two > > > problems > > > that come up in trying to find this approximation. The first is that > we > > > see > > > counts rather than probabilities and the second is that the matrix is > > big. > > > Sometimes, it is very big. > > > > > > > > > > In that case the log likelihood significance for any > > > > given observation within the sample will be based on the observation > > > itself > > > > (1), the count of observations for the column (documents with this > > > > word/users that watched this movie), the count of observations for > the > > > row > > > > (words in this document, movies this user has watched) and the total > > > number > > > > of any observation within the reference population. Given those > > numbers, > > > > G2 > > > > or log likelihood is just a calculation of a, b, c and d (values > > > described > > > > above respectively) > > > > > > > > > > As the blog I mentioned above makes more clear than original paper, you > > > need > > > to reduce the counts before taking them as entries in the contingency > > > table. > > > If you are looking at the typical problem of vocabulary in documents, > > then > > > the discrepancy won't be huge, but if you have any very popular items, > > this > > > could be a problem. > > > > > > Given the log likelihood for each individual observation, is the best > way > > > to > > > > apply it simply to remove observations that don't have a high enough > > > level > > > > of significance? > > > > > > > > > Roughly. It is a bit misleading to talk about "significance" here Lance Norskog [EMAIL PROTECTED]
-
Re: Singular vectors of a recommendation Item-Item spaceDmitriy Lyubimov 2011-08-31, 23:06
we usually denote inverse , A^{-1} or just A^-1
Apostrophe, superscript star or {top} always mean transpose. I never saw apostrophe to be used for inverses. Perhaps confusion may be stemming from the fact that inverse equal transpose if matrices are orthogonal (or even orthonormal), so sometimes Q^{-1}\equiv{Q}' On Wed, Aug 31, 2011 at 3:57 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > In this text-only notation, I though apostrophe meant inverse. What then is > matrix inversion? > > I see a fair amount of stuff here in what I think is MathML, but is displays > raw in gmail. > > On Wed, Aug 31, 2011 at 8:04 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > >> Uhh... >> >> A' is the transpose of A. Not the inverse. >> >> A' A *is* the summation version. >> >> On Wed, Aug 31, 2011 at 1:24 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: >> >> > "Also, if your original matrix is A, then it is usually a shorter path to >> > results to analyze the word (item) cooccurrence matrix A'A. The methods >> > below work either way." >> > >> > The cooccurrence definitions I'm finding only use the summation-based one >> > in >> > wikipedia. Are there any involving inverting the matrix instead? Or, as I >> > suspect, are the two exactly the same but I don't know enough linear >> > algebra? >> > >> > On Mon, Aug 29, 2011 at 3:28 PM, Ted Dunning <[EMAIL PROTECTED]> >> > wrote: >> > >> > > Jeff, >> > > >> > > I think that this is a much simpler exposition: >> > > http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html >> > > >> > > It makes the connection with entropy clear and allows a very simple >> > > implementation for more than 2x2 situations. >> > > >> > > More comments in-line: >> > > >> > > On Mon, Aug 29, 2011 at 1:34 PM, Jeff Hansen <[EMAIL PROTECTED]> >> wrote: >> > > >> > > > ... >> > > > If we have a item/feature matrix (document/words, user/movies) then >> we >> > > can >> > > > consider each row or column to be a sample from larger reference >> > > population >> > > > of the full matrix. >> > > >> > > >> > > Well, sort of. If you have a document (user) and a word (item), then >> you >> > > have a joint probability that any given interaction will be between >> this >> > > document and word. We pretend in this case that each interaction is >> > > independent of every other which is patently not true, but very >> helpful. >> > > >> > > This joint probability can be approximated more or less accurately as >> the >> > > product of document (user) and word (item) popularities. In matrix >> > terms, >> > > this is the same as approximating the joint probability as a rank-1 >> > matrix >> > > that is the outer produced of user and item probability distributions, >> > each >> > > of which is a vector. The point of what we are trying to do is to find >> a >> > > good and economical approximation of the joint probability that >> captures >> > > important deviations from this rank-1 approximation. There are two >> > > problems >> > > that come up in trying to find this approximation. The first is that >> we >> > > see >> > > counts rather than probabilities and the second is that the matrix is >> > big. >> > > Sometimes, it is very big. >> > > >> > > >> > > > In that case the log likelihood significance for any >> > > > given observation within the sample will be based on the observation >> > > itself >> > > > (1), the count of observations for the column (documents with this >> > > > word/users that watched this movie), the count of observations for >> the >> > > row >> > > > (words in this document, movies this user has watched) and the total >> > > number >> > > > of any observation within the reference population. Given those >> > numbers, >> > > > G2 >> > > > or log likelihood is just a calculation of a, b, c and d (values >> > > described >> > > > above respectively) >> > > > >> > > >> > > As the blog I mentioned above makes more clear than original paper, you >> > > need >> > > to reduce the counts before taking them as entries in the contingency
-
Re: Singular vectors of a recommendation Item-Item spaceDmitriy Lyubimov 2011-09-01, 00:24
PS
> On Wed, Aug 31, 2011 at 3:57 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: >> >> I see a fair amount of stuff here in what I think is MathML, but is displays >> raw in gmail. >> this is usually tex. I am not familiar with mathml that close but i think it is fundamentally different
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-09-01, 03:28
This is a very good point that seems very likely to be the source of the
confusion. On Wed, Aug 31, 2011 at 6:06 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > Perhaps confusion may be stemming from the fact that inverse equal > transpose if matrices are orthogonal (or even orthonormal), so > sometimes Q^{-1}\equiv{Q}' >
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-09-01, 03:32
The basics of latex notation are
^ for superscript _ for subscript {} for grouping \sum for summation \log for logs \Omega for upper case greek letter omega \alpha for lower case greek letter beta \int for integral. See http://www.codecogs.com/latex/eqneditor.php for a playground where you can test things. Try this, for instance, F(x,y)=0 ~~\mbox{and}~~ \left| \begin{array}{ccc} F''_{xx} & F''_{xy} & F'_x \\ F''_{yx} & F''_{yy} & F'_y \\ F'_x & F'_y & 0 \end{array}\ right| = 0 On Wed, Aug 31, 2011 at 7:24 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > PS > > > On Wed, Aug 31, 2011 at 3:57 PM, Lance Norskog <[EMAIL PROTECTED]> > wrote: > >> > >> I see a fair amount of stuff here in what I think is MathML, but is > displays > >> raw in gmail. > >> > > this is usually tex. I am not familiar with mathml that close but i > think it is fundamentally different >
-
Re: Singular vectors of a recommendation Item-Item spaceKen Krugler 2011-09-01, 03:50
On Aug 31, 2011, at 8:32pm, Ted Dunning wrote: > The basics of latex notation are > > ^ for superscript > _ for subscript > {} for grouping > \sum for summation > \log for logs > \Omega for upper case greek letter omega > \alpha for lower case greek letter beta > \int for integral. > > See http://www.codecogs.com/latex/eqneditor.php for a playground where you > can test things. > > Try this, for instance, > > F(x,y)=0 ~~\mbox{and}~~ > \left| \begin{array}{ccc} > F''_{xx} & F''_{xy} & F'_x \\ > F''_{yx} & F''_{yy} & F'_y \\ > F'_x & F'_y & 0 > \end{array}\ > right| = 0 The codecogs.com site says that it's an invalid equation. -- Ken > On Wed, Aug 31, 2011 at 7:24 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > >> PS >> >>> On Wed, Aug 31, 2011 at 3:57 PM, Lance Norskog <[EMAIL PROTECTED]> >> wrote: >>>> >>>> I see a fair amount of stuff here in what I think is MathML, but is >> displays >>>> raw in gmail. >>>> >> >> this is usually tex. I am not familiar with mathml that close but i >> think it is fundamentally different >> -------------------------- Ken Krugler +1 530-210-6378 http://bixolabs.com custom big data solutions & training Hadoop, Cascading, Mahout & Solr
-
Re: Singular vectors of a recommendation Item-Item spaceTed Dunning 2011-09-01, 03:57
Hmm... I see this:
http://latex.codecogs.com/gif.latex?F(x,y)=0%20~~mbox{and}~~%20left|%20begin{array}{ccc}%20F''_{xx}%20&%20F''_{xy}%20&%20F'_x%20\%20F''_{yx}%20&%20F''_{yy}%20&%20F'_y%20\%20F'_x%20&%20F'_y%20&%200%20end{array}right|%20=%200 Must be a cut and paste kind of thing. On Wed, Aug 31, 2011 at 10:50 PM, Ken Krugler <[EMAIL PROTECTED]>wrote: > > F(x,y)=0 ~~\mbox{and}~~ > > \left| \begin{array}{ccc} > > F''_{xx} & F''_{xy} & F'_x \\ > > F''_{yx} & F''_{yy} & F'_y \\ > > F'_x & F'_y & 0 > > \end{array}\ > > right| = 0 > > The codecogs.com site says that it's an invalid equation.
-
Re: Singular vectors of a recommendation Item-Item spaceLance Norskog 2011-09-01, 05:55
Yup! The last two lines break the backlash-right clause.
Cool tool! Thanks. On Wed, Aug 31, 2011 at 8:57 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Hmm... I see this: > > > http://latex.codecogs.com/gif.latex?F(x,y)=0%20~~mbox{and}~~%20left|%20begin{array}{ccc}%20F''_{xx}%20&%20F''_{xy}%20&%20F'_x%20\%20F''_{yx}%20&%20F''_{yy}%20&%20F'_y%20\%20F'_x%20&%20F'_y%20&%200%20end{array}right|%20=%200<http://latex.codecogs.com/gif.latex?F%28x,y%29=0%20%7E%7E%5Cmbox%7Band%7D%7E%7E%20%5Cleft%7C%20%5Cbegin%7Barray%7D%7Bccc%7D%20F%27%27_%7Bxx%7D%20&%20F%27%27_%7Bxy%7D%20&%20F%27_x%20%5C%5C%20F%27%27_%7Byx%7D%20&%20F%27%27_%7Byy%7D%20&%20F%27_y%20%5C%5C%20F%27_x%20&%20F%27_y%20&%200%20%5Cend%7Barray%7D%5Cright%7C%20=%200> > > Must be a cut and paste kind of thing. > > On Wed, Aug 31, 2011 at 10:50 PM, Ken Krugler > <[EMAIL PROTECTED]>wrote: > > > > F(x,y)=0 ~~\mbox{and}~~ > > > \left| \begin{array}{ccc} > > > F''_{xx} & F''_{xy} & F'_x \\ > > > F''_{yx} & F''_{yy} & F'_y \\ > > > F'_x & F'_y & 0 > > > \end{array}\ > > > right| = 0 > > > > The codecogs.com site says that it's an invalid equation. > -- Lance Norskog [EMAIL PROTECTED] |