-
Recommendation scores from LogLikelihood Similarity recommender
Will C 2012-04-15, 22:47
I have a boolean input dataset with user, item, and preference. Each preference is 1.0 if it exists. Based on this dataset I used a Tanimoto similarity and tried both the boolean-pref user and item recommenders. After reading Mahout in Action and several threads on Stack Overflow, I saw that the LogLikelihood similarity was recommended for recommenders built on boolean datasets.
However, the scores I get for the recommended items using the LogLikelihood similarity are sometimes much greater than 1.0, even though none of the input scores are higher than that. I saw scores of 11.0 being returned for some users' recommendations.
This is making it very hard for me to use the scoring and estimation functions. I have switched back to Tanimoto for now, but am I doing something wrong, or am I incorrect in expecting the recommended scores and estimated preferences to be in the 0-1.0 range for this dataset?
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Sean Owen 2012-04-16, 18:02
In the case of no ratings, the value you observe is *not* a predicted rating. After all, they are all 1.0 and so can't be used for ranking. The result is actually a sum of similarities, which is why it can be arbitrarily large. It is not supposed to be in [0,1] or anything like that.
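To illustrate the point above, here is a minimal Python sketch of a score built by summing similarities. It is not Mahout's actual code: the function name and the similarity values are made up for illustration, and the only assumption is that each pairwise similarity lies in [0, 1].

```python
import math

def summed_similarity_score(similarities):
    """Sum the similarities between one candidate item and each item the
    user already has, skipping undefined (NaN) values."""
    return sum(s for s in similarities if not math.isnan(s))

# Hypothetical similarities between one candidate item and the 12 items
# a user already has a boolean preference for.
sims = [0.92] * 12
score = summed_similarity_score(sims)

# Each term is at most 1.0, but the sum grows with the number of terms.
print(score > 1.0)  # True
```

With a dozen fairly similar items the sum lands around 11, so such a value is useful for ranking candidates against each other but should not be read as an estimated preference.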
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Will C 2012-04-16, 18:35
Thanks for clearing that up.
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Will C 2012-05-06, 17:48
So I've taken another try at using recommendation values. However, rather than something a user explicitly rates on a scale of 0-5, I am using a user's activity. Certain activities of a user toward an item are negative, and certain are positive.
If I have users 1 and 2 and 3, and product X, and their preferences are as follows:
1, X, -1
2, X, 1
3, X, 10
Clearly 2 and 3 are closer than 2 and 1, because they both like product X, just to varying degrees. However, most distance algorithms I've tried are incorrectly showing 1 and 2 closer because their difference is less.
Am I approaching this wrong? Other than switching to boolean preferences, is there a better way to approach this?
-Will
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Sean Owen 2012-05-06, 18:24
That sounds a lot like something that the cosine similarity would pick up on for sure.
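As a rough illustration using the three preferences from the question, the Python sketch below compares plain Euclidean distance with plain cosine similarity over the single shared item X. The helper functions are written from the textbook definitions and are assumptions for this example, not Mahout's implementation.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two preference vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Preferences for product X only, taken from the example in the question.
user1, user2, user3 = [-1.0], [1.0], [10.0]

# Distance puts users 1 and 2 closer together (2.0 versus 9.0) ...
d12 = euclidean_distance(user1, user2)
d23 = euclidean_distance(user2, user3)

# ... while cosine looks only at direction, so users 2 and 3 agree (1.0)
# and users 1 and 2 are opposed (-1.0).
c12 = cosine_similarity(user1, user2)
c23 = cosine_similarity(user2, user3)
```

Note that with a single co-rated item the cosine can only be +1 or -1, which is one face of the small-counts problem: the magnitude of the preference is ignored entirely.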
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Ted Dunning 2012-05-06, 19:53
As Sean points out, cosine should pick up on this. You will have the usual problems with small counts that any rating based system has.
And in spite of your last comment, I would strongly recommend that you test a boolean approach in which *any* action is considered positive, and another in which you consider only your positive actions and ignore your negative actions. If necessary, consider the negative actions at the presentation tier.
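The two boolean variants suggested above could be derived from an activity log roughly as follows. This is a sketch only: the event tuples, the `boolean_prefs` function, and its `positive_only` flag are hypothetical names invented for this example.

```python
# Hypothetical activity log: (user, item, signed action weight).
events = [
    (1, "X", -1.0),
    (2, "X", 1.0),
    (3, "X", 10.0),
    (1, "Y", 2.0),
]

def boolean_prefs(events, positive_only):
    """Return the set of (user, item) pairs that count as a preference.

    The weight is discarded; it only decides whether the pair is kept
    when positive_only is True.
    """
    return {
        (user, item)
        for user, item, weight in events
        if not positive_only or weight > 0
    }

# Variant 1: any action at all counts as a preference.
any_action = boolean_prefs(events, positive_only=False)

# Variant 2: only positive actions count; negative ones are ignored.
positive_actions = boolean_prefs(events, positive_only=True)
```

Each set can then be fed to a boolean recommender and the two evaluated side by side.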
-
Re: Recommendation scores from LogLikelihood Similarity recommender
Will C 2012-05-06, 20:47
Heh, you're reading my mind.
I tried the cosine similarity and had exactly the problem with sparse rating recommendations that you mentioned. I'm switching back to the boolean data set and just having a minimum action threshold to cross, and I was just in the process of moving my logic around to handle negative actions as a filter.
Thanks for the quick responses!
-Will
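A minimal sketch of the approach described in this message, under two assumptions that the thread does not spell out: the "minimum action threshold" is read here as a minimum count of positive actions per user and item, and the negative-action filter is applied to the recommender's output. All function names and data are hypothetical.

```python
from collections import Counter

def build_boolean_prefs(events, min_actions):
    """Keep (user, item) pairs with at least min_actions positive actions."""
    counts = Counter((u, i) for u, i, weight in events if weight > 0)
    return {pair for pair, n in counts.items() if n >= min_actions}

def negative_pairs(events):
    """Collect (user, item) pairs with at least one negative action."""
    return {(u, i) for u, i, weight in events if weight < 0}

def filter_recommendations(user, recommended_items, negatives):
    """Drop items the user has acted on negatively, preserving order."""
    return [i for i in recommended_items if (user, i) not in negatives]

# Hypothetical activity log: (user, item, signed action weight).
events = [(1, "X", 1.0), (1, "X", 1.0), (1, "Y", 1.0), (1, "Z", -1.0)]

prefs = build_boolean_prefs(events, min_actions=2)   # {(1, "X")}
negatives = negative_pairs(events)                   # {(1, "Z")}

# Pretend the recommender returned Z and W for user 1.
shown = filter_recommendations(1, ["Z", "W"], negatives)  # ["W"]
```

Keeping the negative actions out of the model and applying them only as a filter means the recommender itself only ever sees boolean data.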