Ted Dunning 2010-02-23, 19:21
Weights can't be negative and still be weights. You can have large (positive) weights on negative training examples (aka "not like this"), but you can't really have a negative weight.
How "not like this" is encoded depends a lot on the algorithm you are using. In a (roughly) least-squares world, such as the one used by correlational recommendation systems, you could invert the loss function so that you aim for maximum squared error on the negative examples. You will likely need a looser loss function on the negative examples so that they don't chase the solution around more effectively than the positive examples attract it. This would also take into account the fact that strong negative ratings are actually more like the right answer than the average case at large.
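A minimal sketch of the "inverted but looser" idea, in Python. The function names are hypothetical, and the choice of a linear penalty -w * |x - r| for negative examples is an assumption made purely for illustration; the message only says the negative loss should be looser than the squared error used for positives. The best rating is found by brute-force grid search over [1..5].

```python
def mixed_loss(x, positives, negatives):
    """Loss of a candidate rating x.

    positives / negatives are lists of (rating, weight) with weight > 0.
    Positives use squared error, pulling x toward them. Negatives are
    inverted (they reward distance), but, as an illustrative assumption,
    with a looser linear penalty so they cannot outrun the quadratic pull.
    """
    pull = sum(w * (x - r) ** 2 for r, w in positives)
    push = sum(w * abs(x - r) for r, w in negatives)
    return pull - push


def best_rating(positives, negatives, lo=1.0, hi=5.0, steps=401):
    """Brute-force grid search for the lowest-loss rating in [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda x: mixed_loss(x, positives, negatives))


print(best_rating([(4.0, 1.0)], [(2.0, 1.0)]))  # 4.5
```

With a full squared penalty on the negative example instead, the loss would be (x - 4)^2 - (x - 2)^2 = 12 - 4x, which is linear and simply runs to the boundary at 5; the looser linear penalty keeps the answer inside the range.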
Averaging is just such a case, since the mean is just the least-squares solution when all weights are positive. When you have negative weights that represent examples you want to avoid, you can simply include those weights in the weighted average and get the "correct" solution. But because the loss is then unbounded below, you get nonsensical solutions such as the ones in your examples. For instance, if you have a single negatively weighted example, the loss keeps decreasing as you move toward infinity, because the further you are from that example, the better. In your recommendation case, this should be handled by putting a constraint on the results (i.e. bounding them to [1..5]). You also have to check your result to determine whether it is the maximum loss or the minimum loss.
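The recipe above (compute the weighted average, bound it to [1..5], then check whether you landed on a maximum or a minimum) can be sketched in Python as follows. The names `weighted_loss` and `constrained_estimate` and the tolerance `eps` are hypothetical; the sketch assumes the loss is the plain weighted squared error sum_i w_i (x - r_i)^2 with (rating, similarity) pairs.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[float, float]  # (rating, similarity weight); the weight may be negative


def weighted_loss(x: float, pairs: Sequence[Pair]) -> float:
    """Quadratic loss sum_i w_i * (x - r_i)^2 behind the weighted average."""
    return sum(w * (x - r) ** 2 for r, w in pairs)


def constrained_estimate(pairs: Sequence[Pair], lo: float = 1.0, hi: float = 5.0,
                         eps: float = 1e-12) -> Optional[float]:
    """Lowest-loss rating in [lo, hi], or None if the loss is flat."""
    total_w = sum(w for _, w in pairs)
    total_wr = sum(w * r for r, w in pairs)
    if abs(total_w) < eps and abs(total_wr) < eps:
        return None  # the loss is constant, so there is no optimum to pick
    candidates = [lo, hi]  # the end-points are always candidates
    if abs(total_w) >= eps:
        x_star = total_wr / total_w  # the plain weighted average
        if lo <= x_star <= hi:
            # A minimum if total_w > 0, a maximum if total_w < 0.
            candidates.append(x_star)
    # Comparing losses decides whether x_star is a max or a min.
    return min(candidates, key=lambda x: weighted_loss(x, pairs))


print(constrained_estimate([(5.0, -1.0)]))              # 1.0
print(constrained_estimate([(3.0, 1.0), (5.0, 1.0)]))  # 4.0
```

The weighted average is kept as a candidate even when the total weight is negative; the final loss comparison then discards it because it is the worst point, which is the "maximum or minimum" check done mechanically.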
Example 1:
-1 similarity to a user with a single 5 rating on a movie, and no similarity to any other user. The weighted average rating for this movie is (-1 * 5) / (-1) = 5. But this is the loss MAXimum ... the worst possible answer. To get the right answer, we have to check both endpoints. Within the range [1..5], the loss is lowest at 1.
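A small Python check of Example 1, assuming the loss is the weighted squared error sum_i w_i (x - r_i)^2 (here just -(x - 5)^2). The second derivative of that loss is 2 * sum(weights), so a negative total weight means the weighted average sits at a maximum.

```python
ratings = [5.0]
weights = [-1.0]  # similarity to the one neighbour who rated the movie


def loss(x):
    """Weighted squared error sum_i w_i * (x - r_i)^2."""
    return sum(w * (x - r) ** 2 for r, w in zip(ratings, weights))


avg = sum(w * r for r, w in zip(ratings, weights)) / sum(weights)
curvature = 2 * sum(weights)  # second derivative of the loss

print(avg, curvature)        # 5.0 -2.0
print(loss(1.0), loss(avg))  # -16.0 0.0
```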
Example 2:
-1 similarity to a user with a single 4 rating on a movie and nothing else. The weighted average is again the loss maximum and occurs at 4. Both endpoints are better than this, but 1 is further from the negative example and is thus the best answer.
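Scanning a coarse grid of ratings makes Example 2 visible: the loss peaks at 4 and is lowest at the far endpoint 1. This is a sketch assuming the same weighted squared-error loss; the grid spacing of 0.5 is an arbitrary choice.

```python
ratings = [4.0]
weights = [-1.0]


def loss(x):
    """Weighted squared error sum_i w_i * (x - r_i)^2, here -(x - 4)^2."""
    return sum(w * (x - r) ** 2 for r, w in zip(ratings, weights))


grid = [1.0 + 0.5 * i for i in range(9)]  # 1.0, 1.5, ..., 5.0
worst = max(grid, key=loss)  # where the weighted average lands
best = min(grid, key=loss)

print(worst, best)                        # 4.0 1.0
print(loss(1.0), loss(5.0), loss(4.0))    # -9.0 -1.0 0.0
```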
Example 3:
-1 similarity to one user with a single 4 rating and +1 similarity to another user with a 4 rating, both on the same movie. In this case, the weighted average is undefined (0/0). This occurs because the loss function is totally flat and has no optimum.
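In Example 3 the two weighted terms cancel exactly. A quick Python check, under the same weighted squared-error assumption, shows both the 0/0 average and a loss that is the same everywhere.

```python
ratings = [4.0, 4.0]
weights = [-1.0, 1.0]

total_w = sum(weights)                                 # 0.0
total_wr = sum(w * r for r, w in zip(ratings, weights))  # 0.0


def loss(x):
    """Weighted squared error; the two terms cancel for every x."""
    return sum(w * (x - r) ** 2 for r, w in zip(ratings, weights))


try:
    avg = total_wr / total_w
except ZeroDivisionError:
    avg = None  # 0/0: the weighted average is undefined

losses = {loss(x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)}
print(avg, losses)  # None {0.0}
```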
Example 4:
-1 similarity to a rating of 2, +1 similarity to a rating of 4 and +1 similarity to a rating of 5. The weighted average is (-2 + 4 + 5) / (-1 + 1 + 1) = 7 / 1 = 7. The loss function is minimized at 7, but that is outside our constraints. Of the two endpoints, 5 is the better answer.
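Expanding the loss of Example 4 into a single quadratic shows why 7 is a minimum and why 5 wins inside [1..5]. The sketch assumes the weighted squared-error loss, which expands to W*x^2 - 2*S*x + C with W = sum(w), S = sum(w*r), C = sum(w*r^2); here that is x^2 - 14x + 37.

```python
ratings = [2.0, 4.0, 5.0]
weights = [-1.0, 1.0, 1.0]

# Expand sum_i w_i * (x - r_i)^2 into W*x^2 - 2*S*x + C.
W = sum(weights)                                        # 1.0
S = sum(w * r for r, w in zip(ratings, weights))        # 7.0
C = sum(w * r * r for r, w in zip(ratings, weights))    # 37.0


def loss(x):
    return W * x * x - 2 * S * x + C


x_star = S / W  # 7.0, a minimum because W > 0
lo, hi = 1.0, 5.0
# For a convex quadratic, the bounded minimum is the clamped stationary point.
x_best = min(max(x_star, lo), hi)

print(x_star, x_best)          # 7.0 5.0
print(loss(1.0), loss(5.0))    # 24.0 -8.0
```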
On Tue, Feb 23, 2010 at 3:49 AM, Sean Owen <[EMAIL PROTECTED]> wrote:
> Ted, do you have any standard advice about how people do weighted
> averages when weights are negative?

--
Ted Dunning, CTO
DeepDyve
-
Re: Fwd: weighted score
Tamas Jambor 2010-02-23, 20:54
Not sure I understand your examples. I thought this is not really a 'loss function' problem, since these are memory-based approaches, so there is no training in the classical machine learning sense.
Tamas
-
Re: Fwd: weighted score
Ted Dunning 2010-02-23, 23:07
Any time you are making an estimate, you have a loss function that expresses how much you like or dislike different estimates. In this memory-based approach, you have several ratings that you would somehow like to combine to get an estimate of the best predicted rating.
It is common to combine these ratings using a weighted average. Some approaches, however, come up with negative weights. To understand how to deal with negatively weighted examples, you have to go back to the mathematics beneath the weighted average. That mathematics is expressed in terms of a quadratic loss function, which gives you three possibilities: one where positive weights dominate, one where negative weights dominate, and a third where they balance out.
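The three possibilities follow from writing the quadratic loss out explicitly. This short derivation assumes the loss is the weighted squared error over neighbour ratings r_i with similarity weights w_i; the symbols W, S and C are introduced here for convenience.

```latex
L(x) = \sum_i w_i (x - r_i)^2 = W x^2 - 2 S x + C,
\qquad W = \sum_i w_i,\quad S = \sum_i w_i r_i,\quad C = \sum_i w_i r_i^2

L'(x) = 2(Wx - S) = 0 \;\Longrightarrow\; x^* = \frac{S}{W} \text{ (the weighted average)},
\qquad L''(x) = 2W

\begin{cases}
W > 0: & \text{convex; } x^* \text{ is the minimum (positive weights dominate)} \\
W < 0: & \text{concave; } x^* \text{ is the maximum, so the best bounded answer is an endpoint} \\
W = 0: & L(x) = C - 2Sx \text{ is linear, or completely flat when } S = 0 \text{ (weights balance out)}
\end{cases}
```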
The examples I gave were framed in terms of the results of querying the similar users for their ratings of a movie.