|
Tamas Jambor
2010-02-22, 12:54
Sean Owen
2010-02-22, 13:03
Tamas Jambor
2010-02-22, 13:14
Sean Owen
2010-02-22, 13:49
Tamas Jambor
2010-02-22, 14:16
Sean Owen
2010-02-22, 14:48
Tamas Jambor
2010-02-22, 16:18
Sean Owen
2010-02-22, 17:05
Tamas Jambor
2010-02-22, 18:41
Ted Dunning
2010-02-22, 18:47
Sean Owen
2010-02-23, 11:49
Sean Owen
2010-02-23, 23:25
Ted Dunning
2010-02-23, 23:32
Tamas Jambor
2010-02-24, 00:16
Ted Dunning
2010-02-24, 00:28
Tamas Jambor
2010-02-24, 02:39
Ted Dunning
2010-02-24, 05:55
|
-
weighted scoreTamas Jambor 2010-02-22, 12:54
hi,
Just wondering how you justify that you add +1 to the correlation, when you calculate the score for the recommendation. so that items which are not correlated constitute to the score. I think this biases the recommender towards the mean of the ratings of the target users (for item based), Tamas
-
Re: weighted scoreSean Owen 2010-02-22, 13:03
It's a good question. The bigger question here is, how do you create a
weighted average when weights can be negative? That leads to wacky results like predicting ratings of -5 when ratings range from 1 to 5. My fix was to make all weights nonnegative in this way. If you ignore items with similarity 0, what would you do with items with negative similarity? You could ignore them I suppose; it loses some key information, but might be OK. It also presupposes that similarity 0 means no resemblance at all; that's not necessarily what 0 means for similarity -- at least in the context of this framework. While it means no resemblance in the case of similarities built on things like the Pearson correlation, it doesn't for other metrics. Sean On Mon, Feb 22, 2010 at 12:54 PM, Tamas Jambor <[EMAIL PROTECTED]> wrote: > hi, > > Just wondering how you justify that you add +1 to the correlation, when you > calculate the score for the recommendation. > so that items which are not correlated constitute to the score. I think this > biases the recommender towards the mean of the ratings of the target users > (for item based), > > Tamas > > >
-
Re: weighted scoreTamas Jambor 2010-02-22, 13:14
I'm doing some studies on the bias of recommender system, and using this
approach combined with person correlation gives some very weird results. For example if I take items that have a mean of less than 2.5, it is more likely that those items are ranked higher than items which have a very high mean (ie higher than 3.5). it took me a while to figure out why, and the reason is the approach you take to calculate prediction always biases the score towards the mean. So that I end up with a very low variance for the predicted items compared to for example SVD. Tamas On 22/02/2010 13:03, Sean Owen wrote: > It's a good question. The bigger question here is, how do you create a > weighted average when weights can be negative? That leads to wacky > results like predicting ratings of -5 when ratings range from 1 to 5. > > My fix was to make all weights nonnegative in this way. If you ignore > items with similarity 0, what would you do with items with negative > similarity? > > You could ignore them I suppose; it loses some key information, but > might be OK. It also presupposes that similarity 0 means no > resemblance at all; that's not necessarily what 0 means for similarity > -- at least in the context of this framework. While it means no > resemblance in the case of similarities built on things like the > Pearson correlation, it doesn't for other metrics. > > Sean > > > On Mon, Feb 22, 2010 at 12:54 PM, Tamas Jambor<[EMAIL PROTECTED]> wrote: > >> hi, >> >> Just wondering how you justify that you add +1 to the correlation, when you >> calculate the score for the recommendation. >> so that items which are not correlated constitute to the score. I think this >> biases the recommender towards the mean of the ratings of the target users >> (for item based), >> >> Tamas >> >> >> >>
-
Re: weighted scoreSean Owen 2010-02-22, 13:49
Could you elaborate on your example? It's quite possible for an item
with an average rating of 2.5 to be a better recommendation for a user than one whose average is 3.5 -- depends on the user of course. Try the evaluation framework in org.apache.mahout.cf.taste.impl.eval in order to figure out whether one or the other implementation is more accurate, in the sense of correctly predicting ratings. It could be that the results "look" wrong but In what sense does the weighted average bias to the mean -- how does the choice of weights have this effect? You can't use similarity scores as weights directly since they're possibly negative, so something must be done, and I don't know of a standard answer to this. I'm open to a different way of patching this problem, but first want to understand where the bias is. For any given input, some algorithms will work well and some won't. SVD is a great approach, and wouldn't be surprised if it's working better for whatever input you have. On Mon, Feb 22, 2010 at 1:14 PM, Tamas Jambor <[EMAIL PROTECTED]> wrote: > I'm doing some studies on the bias of recommender system, and using this > approach combined with > person correlation gives some very weird results. For example if I take > items that have a mean of less than 2.5, it > is more likely that those items are ranked higher than items which have a > very high mean (ie higher than 3.5). it took me a while to > figure out why, and the reason is the approach you take to calculate > prediction always biases the score towards the mean. So that I end > up with a very low variance for the predicted items compared to for example > SVD. > > Tamas > > On 22/02/2010 13:03, Sean Owen wrote: >> >> It's a good question. The bigger question here is, how do you create a >> weighted average when weights can be negative? That leads to wacky >> results like predicting ratings of -5 when ratings range from 1 to 5. >> >> My fix was to make all weights nonnegative in this way. If you ignore >> items with similarity 0, what would you do with items with negative >> similarity? >> >> You could ignore them I suppose; it loses some key information, but >> might be OK. It also presupposes that similarity 0 means no >> resemblance at all; that's not necessarily what 0 means for similarity >> -- at least in the context of this framework. While it means no >> resemblance in the case of similarities built on things like the >> Pearson correlation, it doesn't for other metrics. >> >> Sean >> >> >> On Mon, Feb 22, 2010 at 12:54 PM, Tamas Jambor<[EMAIL PROTECTED]> >> wrote: >> >>> >>> hi, >>> >>> Just wondering how you justify that you add +1 to the correlation, when >>> you >>> calculate the score for the recommendation. >>> so that items which are not correlated constitute to the score. I think >>> this >>> biases the recommender towards the mean of the ratings of the target >>> users >>> (for item based), >>> >>> Tamas >>> >>> >>> >>> >
-
Re: weighted scoreTamas Jambor 2010-02-22, 14:16
On 22/02/2010 13:49, Sean Owen wrote: > Could you elaborate on your example? It's quite possible for an item > with an average rating of 2.5 to be a better recommendation for a user > than one whose average is 3.5 -- depends on the user of course. > > If I take all the items with an average rating of less than 2.5, and calculate the probability that they will be get the highest score for each user (ie ranked first), I get higher probability than for items that have an average rating of 3.5. > In what sense does the weighted average bias to the mean -- how does > the choice of weights have this effect? You can't use similarity > scores as weights directly since they're possibly negative, so > something must be done, and I don't know of a standard answer to this. > I'm open to a different way of patching this problem, but first want > to understand where the bias is. > > I think the reason why it is biased is that in item-based recommendation most of the time you can find some kind of correlation between any given items. and even it is negatively correlated you take it into account towards the score. For example if I take 4 items rated 1,1,5,5 by the user and the correlation between the target item is 1,1,0,0 respectively, I get 2 using your calculation and 1 using the standard one as follows: preference += theSimilarity * prefs.getValue(i); totalSimilarity += Math.abs(theSimilarity); score = preference / totalSimilarity; Tamas
-
Re: weighted scoreSean Owen 2010-02-22, 14:48
Yes this is very interesting --
On Mon, Feb 22, 2010 at 2:16 PM, Tamas Jambor <[EMAIL PROTECTED]> wrote: > If I take all the items with an average rating of less than 2.5, and > calculate the probability that > they will be get the highest score for each user (ie ranked first), I get > higher probability > than for items that have an average rating of 3.5. Yes that doesn't seem likely. There could be some bias in that you can't recommend items you already have rated, which would tend to be higher-ranked items, but I doubt that's the issue. Try another similarity metric? You don't have to use Pearson, see below. > I think the reason why it is biased is that in item-based recommendation > most of the time you can find > some kind of correlation between any given items. and even it is negatively > correlated you take it into account towards the score. > For example if I take 4 items rated 1,1,5,5 by the user and the correlation > between the target item is 1,1,0,0 respectively, I get 2 using > your calculation and 1 using the standard one as follows: > > preference += theSimilarity * prefs.getValue(i); > totalSimilarity += Math.abs(theSimilarity); > score = preference / totalSimilarity; What if the weights are 1,1,-1,-1? The estimate is -2 then. This is why I say this won't work. While in general I could ask why 2 is necessarily the "wrong" answer and 1 is "right" -- in the case Pearson I agree that 1 is the right answer. This isn't necessarily true for other similarity measures, where 0 doesn't have to mean "no mutual information". But perhaps I have overlooked another way to 'fix' the negative weight issue that is also compatible with Pearson's characteristics? In the world of users, I would argue that a similarity of 0, even when it is a 0 from a Pearson correlation, means there is *some* relationship between the two users -- they overlap in some items out the very many out there, which is a positive association. So, factoring in uncorrelated users is, I would say, more valid than ignoring them. That's one reason I actually like the effect of the "+1" over "+0". I think this is less true for items, as you say, since in many cases (like yours I think) there are more users than items. It is more likely to be able to compute some similarity between items; the existence of any similarity at all means less. The "+1" could distort more than "+0" -- but again I am not sure what else to do as "+0" leads to ill-defined results. But for your purposes you can easily adjust the implementation if you like. You could drop all non-positive similarities from consideration for example. You could just use a different implementation if it works better.
-
Re: weighted scoreTamas Jambor 2010-02-22, 16:18
> What if the weights are 1,1,-1,-1? The estimate is -2 then. This is > why I say this won't work. > you could trim the result so you would get 1, which is the same as what you get with your approach. > While in general I could ask why 2 is necessarily the "wrong" answer > and 1 is "right" -- in the case Pearson I agree that 1 is the right > answer. This isn't necessarily true for other similarity measures, > where 0 doesn't have to mean "no mutual information". > if you take cosine similarity, 0 mean that vectors are independent, that is also an implication that there is no mutual information. although cosine cannot get negative in this case. > In the world of users, I would argue that a similarity of 0, even when > it is a 0 from a Pearson correlation, means there is *some* > relationship between the two users -- they overlap in some items out > the very many out there, which is a positive association. So, > factoring in uncorrelated users is, I would say, more valid than > ignoring them. That's one reason I actually like the effect of the > "+1" over "+0". > > it is true that in my case this modification doesn't have that distinctive effect on the probability values using user-based recommender. > I think this is less true for items, as you say, since in many cases > (like yours I think) there are more users than items. It is more > likely to be able to compute some similarity between items; the > existence of any similarity at all means less. The "+1" could distort > more than "+0" -- but again I am not sure what else to do as "+0" > leads to ill-defined results. > > I agree that in this case you take into account information that is not relevant, and that distorts the result. actually I just ran a quick evaluation comparing the two approaches. I used some IR measures to evaluate. The first one is your approach, and the second one is the one I mentioned in the previous email. (movielens 1m dataset, item-based, pearson) NDCG@10 - 0.7285387375322516, 0.7634082972451305 NDCG@5 - 0.6967904224170633, 0.7549089634943423 PRECISION@10 - 0.6995672012037591, 0.7168691567887784 PRECISION@5 - 0.6973511214230481, 0.7418986852281506 MRR - 0.7998089019219268, 0.8685472365056628 (movielens 1m dataset, user-based (neighbourhood size - 200), pearson) NDCG@10 - 0.7646000398212323, 0.7644630753602327 NDCG@5 - 0.7423746185882404, 0.7420668850370802 PRECISION@10 - 0.7225191341799975, 0.7224362566729521 PRECISION@5 - 0.7364440024310716, 0.7360793414000781 MRR - 0.8422149426811263, 0.8421165522048173
-
Re: weighted scoreSean Owen 2010-02-22, 17:05
On Mon, Feb 22, 2010 at 4:18 PM, Tamas Jambor <[EMAIL PROTECTED]> wrote:
> >> What if the weights are 1,1,-1,-1? The estimate is -2 then. This is >> why I say this won't work. >> > > you could trim the result so you would get 1, which is the same as what you > get with your approach. What about the case where you have just 1-2 similar items, all similarity 0. The result is undefined. That could be patched too, but it's why it just feels like defining it this way is problematic. You can't meaningfully take a weighted average with negative weights. > if you take cosine similarity, 0 mean that vectors are independent, that is > also an implication that > there is no mutual information. although cosine cannot get negative in this > case. The cosine similarity can be negative, but yes 0 here also means no relation. This isn't true of, say, a Euclidean distance-based measure. > evaluate. The first one is your approach, and the second one is the one I > mentioned in the previous email. How are you dealing with negative/undefined predictions though? I am also not sure what defining it this does to the accuracy of estimated preference values, which is what the evaluator would test. This tends to push estimates towards negative values. I could believe it works a little better for your data set -- so you should use your variation, especially if you only care about precision and recall. I don't know that this is going to be better in general -- honestly don't know, I haven't studied this. Failing that, I am just having a hard time implementing this as a ill-defined weighted average, no matter what it seems to do to one data set. is there no third way? maybe someone can think of a standard solution to this.
-
Re: weighted scoreTamas Jambor 2010-02-22, 18:41
> The cosine similarity can be negative, but yes 0 here also means no > relation. This isn't true of, say, a Euclidean distance-based measure. > > in this case we only have positive ratings, so all the vectors stay in that space, that means it can't be negative. >> evaluate. The first one is your approach, and the second one is the one I >> mentioned in the previous email. >> > How are you dealing with negative/undefined predictions though? I am > also not sure what defining it this does to the accuracy of estimated > preference values, which is what the evaluator would test. This tends > to push estimates towards negative values. > > I could believe it works a little better for your data set -- so you > should use your variation, especially if you only care about precision > and recall. I don't know that this is going to be better in general -- > honestly don't know, I haven't studied this. Failing that, I am just > having a hard time implementing this as a ill-defined weighted > average, no matter what it seems to do to one data set. > it is also arguable if it is better to interpret negative correlation towards to score, especially with Pearson. maybe the best solution would be to dump all the negative values as not meaningful. but I personally think that leaving zero correlation as zero is quite important. Tamas
-
Re: weighted scoreTed Dunning 2010-02-22, 18:47
This all smells a lot like the problems that crop up in training
classifiers. Lots of systems have trouble with asymmetric goals (like 1 to 5) and are massively helped by offsetting by the mean (changing rating scale to -2 to 2 might help, or actually subtracting the observed mean). This comes up all the time in neural net training algorithms. Only in properly done regression systems like logistic regression where you fully actually take into account a loss function does this not bite you. Even so, the form of the loss function may be much simpler with one formulation or the other and interpretation of weights can also be simpler. I am not familiar with the details under discussion here, but just looking at the words being used makes it sound like pretty much the same problem. On Mon, Feb 22, 2010 at 9:05 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> What if the weights are 1,1,-1,-1? The estimate is -2 then. This is > >> why I say this won't work > -- Ted Dunning, CTO DeepDyve
-
Re: weighted scoreSean Owen 2010-02-23, 11:49
Yes I want to keep thinking about this. I'm not satisfied that the
right answer is clear. Ted do you have any standard advice about how people do weighted averages when weights are negative? Capping the estimated preference could be reasonable in practice. It feels funny, but, it's also rare that the weighted average comes out negative. And, it merely affects estimates on items that are not going to be recommended. I'd have to add to recommenders an ability to specify the minimum and maximum possible preference. Not hard. Any thoughts on this? On Mon, Feb 22, 2010 at 6:47 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > This all smells a lot like the problems that crop up in training > classifiers. > > Lots of systems have trouble with asymmetric goals (like 1 to 5) and are > massively helped by offsetting by the mean (changing rating scale to -2 to 2 > might help, or actually subtracting the observed mean). > > This comes up all the time in neural net training algorithms. > > Only in properly done regression systems like logistic regression where you > fully actually take into account a loss function does this not bite you. > Even so, the form of the loss function may be much simpler with one > formulation or the other and interpretation of weights can also be simpler. > > I am not familiar with the details under discussion here, but just looking > at the words being used makes it sound like pretty much the same problem. > > On Mon, Feb 22, 2010 at 9:05 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > >> >> What if the weights are 1,1,-1,-1? The estimate is -2 then. This is >> >> why I say this won't work >> > > > > -- > Ted Dunning, CTO > DeepDyve >
-
Re: weighted scoreSean Owen 2010-02-23, 23:25
Yes I think I understand what you're getting at and the examples. Loss
function here is just the 'penalty' for predicting a rating near to those of dissimilar users and far from those of similar users? If I read correctly, you think that a 'weighted average' (with negative weights in numerator and denominator -- yeah I don't like using the absolute value in the denominator, conceptually) plus capping is an intellectually sound way of handling this situation. And I think I am convinced by this way of rationalizing it, and so am no longer scared of negative weights. Let me then create a patch for this. On Tue, Feb 23, 2010 at 7:21 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Weights can't be negative and still be weights. You can have large > (positive) weights on negative training examples (aka "not like this"), but > you can't really have a negative weight. >
-
Re: weighted scoreTed Dunning 2010-02-23, 23:32
On Tue, Feb 23, 2010 at 3:25 PM, Sean Owen <[EMAIL PROTECTED]> wrote:
> Yes I think I understand what you're getting at and the examples. Loss > function here is just the 'penalty' for predicting a rating near to > those of dissimilar users and far from those of similar users? > Yes. Exactly. > If I read correctly, you think that a 'weighted average' (with > negative weights in numerator and denominator...) plus > capping is an intellectually sound way of handling this situation. Exactly. And I think that the examples demonstrate reasonable behavior in a variety of regimes.
-
Re: weighted scoreTamas Jambor 2010-02-24, 00:16
ok. I understand now. but you would you express this loss function
mathematically? also there is one example when it wouldn't work: -1 similarity to one user with a single 1 rating and +1 similarity to another user with a 5 rating. In this case, the weighted average is undefined. but in practice this would be an easy 3. Tamas On 23/02/2010 23:32, Ted Dunning wrote: > On Tue, Feb 23, 2010 at 3:25 PM, Sean Owen<[EMAIL PROTECTED]> wrote: > > >> Yes I think I understand what you're getting at and the examples. Loss >> function here is just the 'penalty' for predicting a rating near to >> those of dissimilar users and far from those of similar users? >> >> > Yes. Exactly. > > > >> If I read correctly, you think that a 'weighted average' (with >> negative weights in numerator and denominator...) plus >> capping is an intellectually sound way of handling this situation. >> > > Exactly. And I think that the examples demonstrate reasonable behavior in a > variety of regimes. > >
-
Re: weighted scoreTed Dunning 2010-02-24, 00:28
On Tue, Feb 23, 2010 at 4:16 PM, Tamas Jambor <[EMAIL PROTECTED]>wrote:
> ok. I understand now. but you would you express this loss function > mathematically? > Did you mean to ask "HOW would I express this"? For many applications where averages seem usable with positive weights, I would use squared distance from positive examples and negative squared distance from negative examples. > > also there is one example when it wouldn't work: > > -1 similarity to one user with a single 1 rating and +1 similarity to > another user with a 5 rating. In this case, the > weighted average is undefined. but in practice this would be an easy 3. > 3 is incorrect, actually. The fact that the sum of the weights does, indeed, indicate that there is no single optimum. Since the numerator in the weighted average is non-zero, we know that the slope of the loss function is constant and negative. This means that the best choice within our constraints is to pick 5. This is better than 3 because it is both farther from the negatively weighted 1 rating and closer to positively rated 5 rating. Because of the peculiarity of squared error, 6 would be even better because moving a step further from the 1 outweighs the movement away from the 5. If we used mean absolute error, we would lose the applicability of weighted averages, but sometimes get more sensible answers because we wouldn't over-weight the outliers. This is known as "using an L_1 loss function" (as opposed to an L_2 loss). The median is an example of a statistic motivated by the L_1 loss. For L_1 loss in your example, any result less than 1 has the same loss, loss decreases rapidly between 1 and 5 and then is constant to the right of 5. Within our constraints, the optimum is 5.
-
Re: weighted scoreTamas Jambor 2010-02-24, 02:39
On 24/02/2010 00:28, Ted Dunning wrote:
> > For many applications where averages seem usable with positive weights, I > would use squared distance from positive examples and negative squared > distance from negative examples. > > > so it is going to maximize the distance between the prediction and dissimilar users, and minimize otherwise. is it something like this? argmin sum(c*(p-r)^2) where c is the correlation, p is the prediction and r is the true rating. T
-
Re: weighted scoreTed Dunning 2010-02-24, 05:55
I think so. To be more specific:
argmin_p sum_i (c_i * (p - r_i)^2 ) And I would call the different r_i "observed" rather than "true" ratings and note that they are plural (English is ambiguous there). On Tue, Feb 23, 2010 at 6:39 PM, Tamas Jambor <[EMAIL PROTECTED]>wrote: > argmin sum(c*(p-r)^2) > > where c is the correlation, p is the prediction and r is the true rating. > -- Ted Dunning, CTO DeepDyve |