-
Item Recommendations - Time based
Mridul Kapoor 2012-03-12, 13:28
I have been ramping up on Mahout recently and found the book Mahout in Action very helpful in this regard.
I have been planning to write a custom recommender (item-based) and would really appreciate help with it. What I actually have is something like this:
In the system, there is no inherent explicit rating or preference system: either the user has consumed a piece of content, or has not. But I specifically want to consider two items most similar when they are consumed/viewed within one hour of each other the greatest number of times.
Say, for Item X:
User U1 consumes (views) it at time T1, and within T1 ± 1 hour U1 also consumes Items a, b, c and d.
User U2 consumes (views) it at time T2, and within T2 ± 1 hour U2 also consumes Items a, c and e.
User U3 consumes (views) it at time T3, and within T3 ± 1 hour U3 also consumes Items a and b.
So we would say that Item X co-occurs most with Item a (3 instances), followed by Items b and c (2 instances each), and so on.
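A minimal sketch of the counting described above, assuming the views are available in memory as (user, item, timestamp) records. The View class and the minute offsets in exampleViews() are hypothetical, chosen only to reproduce the U1/U2/U3 example; this is a naive quadratic scan meant to pin down the definition, not a Mahout component.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WindowedCooccurrence {

    /** One consumption event (hypothetical input shape): user viewed item at timeMillis. */
    static final class View {
        final String user;
        final String item;
        final long timeMillis;

        View(String user, String item, long timeMillis) {
            this.user = user;
            this.item = item;
            this.timeMillis = timeMillis;
        }
    }

    static final long ONE_HOUR_MILLIS = 60L * 60L * 1000L;

    /**
     * For every view of targetItem, counts each other item the same user viewed within
     * +/- windowMillis. An item is counted at most once per target view.
     * Naive O(n^2) scan over all views; for illustration only.
     */
    static Map<String, Integer> cooccurrences(List<View> views, String targetItem, long windowMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (View target : views) {
            if (!target.item.equals(targetItem)) {
                continue;
            }
            Set<String> countedForThisView = new HashSet<>();
            for (View other : views) {
                boolean sameUser = other.user.equals(target.user);
                boolean inWindow = Math.abs(other.timeMillis - target.timeMillis) <= windowMillis;
                boolean otherItem = !other.item.equals(targetItem);
                if (sameUser && inWindow && otherItem && countedForThisView.add(other.item)) {
                    counts.merge(other.item, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Rebuilds the U1/U2/U3 example; the minute offsets are made up. */
    static List<View> exampleViews() {
        long min = 60L * 1000L;
        List<View> v = new ArrayList<>();
        v.add(new View("U1", "X", 0));
        v.add(new View("U1", "a", 5 * min));
        v.add(new View("U1", "b", 20 * min));
        v.add(new View("U1", "c", -30 * min));
        v.add(new View("U1", "d", 55 * min));
        v.add(new View("U2", "X", 0));
        v.add(new View("U2", "a", 10 * min));
        v.add(new View("U2", "c", 40 * min));
        v.add(new View("U2", "e", -15 * min));
        v.add(new View("U2", "f", 180 * min)); // three hours later: outside the window
        v.add(new View("U3", "X", 0));
        v.add(new View("U3", "a", 1 * min));
        v.add(new View("U3", "b", 59 * min));
        return v;
    }

    public static void main(String[] args) {
        System.out.println(cooccurrences(exampleViews(), "X", ONE_HOUR_MILLIS));
    }
}
```

With these inputs, a is counted 3 times, b and c twice, d and e once, and f not at all because it falls outside the one-hour window.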
This is pretty much how I would like to compute the similarities. Going through Mahout in Action and other online resources, I wasn't able to find an implementation close enough to this. I would appreciate some pointers on how to go about it.
Thanks Mridul
-
Re: Item Recommendations - Time based
Sean Owen 2012-03-12, 16:12
You can implement your own custom ItemSimilarity that computes this metric, or anything else you can imagine. In fact there is already a bit of API in DataModel for storing and retrieving timestamps too, so this should be easy.
It's probably a bit easier said than done given the exact logic you're implementing, but that's how you'd approach it.
Sean
-
Re: Item Recommendations - Time based
Mridul Kapoor 2012-03-12, 17:12
Thanks Sean, I intend to write my own custom ItemSimilarity. What would you suggest to help it scale up? (I'll be using the MongoDBDataModel, with over 100 GB of data in the Mongo collection I intend to use.)
Mridul
-
Re: Item Recommendations - Time based
Sean Owen 2012-03-12, 17:19
Similarity computations need to be very fast. I don't know if you can pre-compute them, since they're time-dependent and, I assume, need up-to-the-second information.
You'll need to store something in memory to make this fast enough. That can make scale a problem, but I am also guessing you can perhaps get away with storing just the last N seconds of data and constantly pruning it? That might save you.
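Keeping only the last N seconds in memory and constantly pruning can be sketched with a time-ordered deque. This assumes events arrive in timestamp order; the Event class and the one-hour horizon are illustrative choices, not part of Mahout.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RecentViewWindow {

    /** A single view event; field names are illustrative. */
    static final class Event {
        final long userId;
        final long itemId;
        final long timeMillis;

        Event(long userId, long itemId, long timeMillis) {
            this.userId = userId;
            this.itemId = itemId;
            this.timeMillis = timeMillis;
        }
    }

    private final long horizonMillis;
    private final Deque<Event> events = new ArrayDeque<>();

    RecentViewWindow(long horizonMillis) {
        this.horizonMillis = horizonMillis;
    }

    /** Appends an event (assumed to arrive in time order) and prunes anything too old. */
    void add(Event e) {
        events.addLast(e);
        pruneOlderThan(e.timeMillis - horizonMillis);
    }

    /** Oldest events sit at the head of the deque, so pruning only touches the front. */
    void pruneOlderThan(long cutoffMillis) {
        while (!events.isEmpty() && events.peekFirst().timeMillis < cutoffMillis) {
            events.pollFirst();
        }
    }

    int size() {
        return events.size();
    }

    List<Event> snapshot() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        RecentViewWindow window = new RecentViewWindow(60L * 60L * 1000L); // keep one hour
        window.add(new Event(1, 100, 0L));
        window.add(new Event(1, 101, 30L * 60L * 1000L)); // 30 minutes later
        window.add(new Event(2, 102, 90L * 60L * 1000L)); // 90 minutes after the first
        System.out.println(window.size()); // prints 2: the event at time 0 was pruned
    }
}
```

Because pruning only ever removes from the head, each event is added and removed at most once, so the memory footprint tracks the chosen horizon rather than the full 100 GB history.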
Sean
-
Re: Item Recommendations - Time based
Mridul Kapoor 2012-03-12, 17:25
Yes, I can very much keep them pre-computed in a database. I intend to refresh them on a regular basis, maybe once or twice a week, so pre-computing them is not a problem. The idea is to use these pre-computed similarity values to pull up recommendations on the fly (using a most-similar sort of metric). That changes the picture a bit. On the other hand, I would really like this computation to run as fast as possible. Could leveraging Hadoop for the ItemSimilarity computation help? (Correct me if I am wrong here, and if I am right, how would I go about it?)
Thanks again!
Mridul
-
Re: Item Recommendations - Time based
Sean Owen 2012-03-12, 17:29
OK, if that's the case, put the pre-computed values in a GenericItemSimilarity and you're done.
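For reference, Mahout's GenericItemSimilarity is, to my knowledge, constructed from an Iterable of GenericItemSimilarity.ItemItemSimilarity(itemID1, itemID2, value) objects; check the Javadoc for your Mahout version. The dependency-free stand-in below only illustrates the idea of serving precomputed values from an in-memory, order-independent lookup table. The class name and the NaN-for-unknown convention are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PrecomputedSimilarityTable {

    private final Map<String, Double> table = new HashMap<>();

    /** Builds an order-independent key so (a, b) and (b, a) hit the same entry. */
    private static String key(long itemA, long itemB) {
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        return lo + ":" + hi;
    }

    void put(long itemA, long itemB, double value) {
        table.put(key(itemA, itemB), value);
    }

    /** Returns the stored value, or NaN when nothing was precomputed for the pair. */
    double itemSimilarity(long itemA, long itemB) {
        Double v = table.get(key(itemA, itemB));
        return v == null ? Double.NaN : v;
    }

    public static void main(String[] args) {
        PrecomputedSimilarityTable sims = new PrecomputedSimilarityTable();
        // Hypothetical values, e.g. loaded from the offline job's output in the datastore
        sims.put(1L, 2L, 0.8);
        sims.put(1L, 3L, 0.3);
        System.out.println(sims.itemSimilarity(2L, 1L)); // prints 0.8 (symmetric lookup)
        System.out.println(sims.itemSimilarity(2L, 3L)); // prints NaN (not precomputed)
    }
}
```

Storing each unordered pair once halves the table size compared with storing both (a, b) and (b, a).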
Hadoop most certainly does not help you compute anything 'on the fly'. It might help you precompute. Don't worry about distribution until you're sure you have a big scale problem, and that usually takes quite a bit of scale!
Sean
-
Re: Item Recommendations - Time based
Ted Dunning 2012-03-12, 17:46
To get the time-based similarity you want, you can create virtual users for each session as well as real users for longer time periods. The longer periods will have weaker statistics, so you probably won't have to weight things.
This will let you use the standard Mahout framework for everything except munching through your logs to get sessionized usage.
-
Re: Item Recommendations - Time based
Ted Dunning 2012-03-12, 17:48
Sean's comment is dead-on and your design inclinations are just fine. Hadoop can (eventually) help with the offline item similarity computation. The existing Mahout recommendation engine can do the actual item recommendation work at very high speed with an appropriate data store.
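To illustrate what pulling up recommendations on the fly from precomputed similarities involves, here is a simplified, hypothetical scoring routine: each candidate item scores the sum of its similarities to the items the user already consumed. Mahout's own item-based recommenders differ in details such as how scores are normalized, so treat this as a sketch rather than Mahout's algorithm. The neighbour values in exampleNeighbours() are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SimilarItemScorer {

    /**
     * Scores each candidate by summing its similarities to the items the user already
     * consumed; consumed items are never returned.
     * neighbours maps itemId -> (similar itemId -> similarity value).
     */
    static List<Long> recommend(Set<Long> consumed, Map<Long, Map<Long, Double>> neighbours, int topN) {
        Map<Long, Double> scores = new HashMap<>();
        for (Long seen : consumed) {
            Map<Long, Double> similar = neighbours.getOrDefault(seen, Collections.emptyMap());
            for (Map.Entry<Long, Double> e : similar.entrySet()) {
                if (!consumed.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
        }
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }

    /** Hypothetical neighbour lists standing in for the precomputed table in the datastore. */
    static Map<Long, Map<Long, Double>> exampleNeighbours() {
        Map<Long, Double> of1 = new HashMap<>();
        of1.put(2L, 0.5);
        of1.put(3L, 0.9);
        of1.put(4L, 0.2);
        Map<Long, Double> of2 = new HashMap<>();
        of2.put(4L, 0.6);
        of2.put(5L, 0.1);
        Map<Long, Map<Long, Double>> nb = new HashMap<>();
        nb.put(1L, of1);
        nb.put(2L, of2);
        return nb;
    }

    public static void main(String[] args) {
        Set<Long> consumed = new HashSet<>(Arrays.asList(1L, 2L));
        // Scores: item 3 -> 0.9, item 4 -> 0.2 + 0.6, item 5 -> 0.1; item 2 is skipped.
        System.out.println(recommend(consumed, exampleNeighbours(), 2)); // prints [3, 4]
    }
}
```

Only the neighbour lists of the user's own items are touched, which is what keeps this kind of lookup cheap enough to run per request once the similarities are precomputed.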
-
Re: Item Recommendations - Time based
Mridul Kapoor 2012-03-12, 18:02
Cool. Thanks.
So, from what I have gathered, the way forward should be as follows (again, please correct me wherever I may have misunderstood):
1. Implement my own custom ItemSimilarity, use it to precompute item-item similarity values offline, and save these in a datastore.
2. Maybe run the Recommender with a GenericItemSimilarity (using the precomputed values), expose Mahout's fast recommender as a web service, and call it from my app.
Ted, could you go into more detail about the sessions you mentioned? I didn't completely get the part about virtual users. I see a window of opportunity here, where I might need less customization and could save time by using the existing Mahout framework.
Mridul
-
Re: Item Recommendations - Time based
Ted Dunning 2012-03-12, 18:52
Actually I don't think that you will need to implement your own item similarity.
Just preprocess your input by grouping by user and sorting by time. Then break each user's sessions into separate "users" and emit the standard user,item,pref format for Mahout processing. The pref will always be 1 in this case.
This should be close to what you need. You might augment this input with the same thing except with a longer horizon for sessions.
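The preprocessing step above can be sketched as follows, assuming a new session starts whenever a user is idle for more than a fixed gap (one hour here, an arbitrary choice). A gap-based session is not identical to the ±1 hour window in the original question. The View class, the sequential virtual user ids and the exampleViews() data are hypothetical; the output lines follow the user,item,pref format with pref always 1, and ids are numeric because Taste works with long user and item ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class Sessionizer {

    /** One log record (hypothetical shape). */
    static final class View {
        final long userId;
        final long itemId;
        final long timeMillis;

        View(long userId, long itemId, long timeMillis) {
            this.userId = userId;
            this.itemId = itemId;
            this.timeMillis = timeMillis;
        }
    }

    /**
     * Groups views by user, sorts each user's views by time and starts a new virtual user
     * whenever the gap since the previous view exceeds maxGapMillis. Emits one
     * "virtualUserId,itemId,1" line per distinct item in each session.
     */
    static List<String> sessionize(List<View> views, long maxGapMillis) {
        Map<Long, List<View>> byUser = new TreeMap<>();
        for (View v : views) {
            byUser.computeIfAbsent(v.userId, k -> new ArrayList<>()).add(v);
        }
        List<String> lines = new ArrayList<>();
        long nextVirtualUser = 1L;
        for (List<View> userViews : byUser.values()) {
            userViews.sort(Comparator.comparingLong(view -> view.timeMillis));
            long virtualUser = nextVirtualUser++;
            Set<Long> itemsInSession = new HashSet<>();
            long previousTime = userViews.get(0).timeMillis;
            for (View v : userViews) {
                if (v.timeMillis - previousTime > maxGapMillis) {
                    virtualUser = nextVirtualUser++; // gap too large: start a new session
                    itemsInSession.clear();
                }
                if (itemsInSession.add(v.itemId)) {
                    lines.add(virtualUser + "," + v.itemId + ",1"); // pref is always 1
                }
                previousTime = v.timeMillis;
            }
        }
        return lines;
    }

    /** Made-up log: user 7's views are deliberately out of order to exercise the sort. */
    static List<View> exampleViews() {
        long min = 60L * 1000L;
        List<View> v = new ArrayList<>();
        v.add(new View(7, 101, 20 * min));
        v.add(new View(7, 100, 0));
        v.add(new View(7, 102, 180 * min)); // three hours later: a new session
        v.add(new View(8, 100, 0));
        v.add(new View(8, 101, 10 * min));
        return v;
    }

    public static void main(String[] args) {
        for (String line : sessionize(exampleViews(), 60L * 60L * 1000L)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines could be written to a comma-separated file of the kind FileDataModel reads. To add the longer horizon mentioned above, the same records could be emitted again with a larger gap, taking care that those virtual user ids do not collide with the per-session ones.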
Sent from my iPhone
-
Re: Item Recommendations - Time based
Mridul Kapoor 2012-03-12, 19:00
Ah, right. And then I could go on to implement TanimotoSimilarity or some other similarity, maybe.
Thanks a lot!
Mridul
-
Re: Item Recommendations - Time based
Christoph Hermann 2012-03-12, 19:17
Hi,
Your approach is very similar to what I have implemented, apart from the fact that I did not limit the time frame but used it as a quality measure (smaller time difference = better value for recommendation). It works quite well; the more data, the better. My use case was lecture material downloads, where time co-occurrence is based on the fact that most students behave like all the other students.
You might want to read: http://algo.informatik.uni-freiburg.de/mitarbeiter/hermann/files/aace-ed-media-2010-word-final.pdf
Regards,
Christoph
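The variant described above replaces the hard one-hour cutoff with a weight that grows as the time difference shrinks. The exact weighting is described in the linked paper and is not reproduced here; this sketch assumes an exponential decay exp(-Δt/τ) with a hypothetical time constant τ, and scores a single user's history. Summing the score over all users would give an aggregate item-item value.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeDecayedCooccurrence {

    /** One view in a single user's history (hypothetical shape). */
    static final class View {
        final long itemId;
        final long timeMillis;

        View(long itemId, long timeMillis) {
            this.itemId = itemId;
            this.timeMillis = timeMillis;
        }
    }

    /** Assumed decay: 1.0 at zero time difference, shrinking by a factor of e every tauMillis. */
    static double weight(long deltaMillis, long tauMillis) {
        return Math.exp(-(double) Math.abs(deltaMillis) / tauMillis);
    }

    /** Sums the weights over every (view of itemA, view of itemB) pair in one user's history. */
    static double score(List<View> userViews, long itemA, long itemB, long tauMillis) {
        double total = 0.0;
        for (View x : userViews) {
            if (x.itemId != itemA) {
                continue;
            }
            for (View y : userViews) {
                if (y.itemId == itemB) {
                    total += weight(y.timeMillis - x.timeMillis, tauMillis);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long min = 60L * 1000L;
        List<View> history = new ArrayList<>();
        history.add(new View(1, 0));
        history.add(new View(2, 10 * min));
        history.add(new View(3, 300 * min));
        long tau = 60 * min;
        System.out.println(score(history, 1, 2, tau)); // ten minutes apart: about 0.85
        System.out.println(score(history, 1, 3, tau)); // five hours apart: about 0.007
    }
}
```

Unlike the hard window, no co-occurrence is discarded outright; distant pairs simply contribute very little.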
-
Re: Item Recommendations - Time based
Ted Dunning 2012-03-12, 22:59
I would generally recommend using the LLR similarity.
But if you have an itch, scratch it. I do think we have a Tanimoto similarity already, possibly under a slightly different name.
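For the LLR similarity recommended above, the log-likelihood ratio can be computed from a 2x2 contingency table of counts (with sessions as "users", k11 would be the number of sessions containing both items). This sketch follows the entropy-based formulation used in Mahout's LogLikelihood utility as I recall it, and the 1 - 1/(1 + LLR) mapping to a similarity mirrors what LogLikelihoodSimilarity does, to my understanding; verify both against the Mahout source. The counts in main() are made up.

```java
public class LogLikelihoodSketch {

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** "Unnormalized" entropy of a set of counts: xLogX(sum) minus the sum of xLogX(count). */
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /**
     * LLR (G^2) for a 2x2 table: k11 = both items, k12 = A without B,
     * k21 = B without A, k22 = neither.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guard against round-off making the result slightly negative
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /** Maps an LLR in [0, inf) onto a similarity in [0, 1). */
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Hypothetical counts over 20 sessions
        System.out.println(logLikelihoodRatio(10, 0, 0, 10)); // perfectly associated: 40 ln 2
        System.out.println(logLikelihoodRatio(5, 5, 5, 5));   // independent: about 0
    }
}
```

Unlike a raw co-occurrence count, the LLR also accounts for how often each item appears on its own, so very popular items do not dominate every neighbour list.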
-
Re: Item Recommendations - Time based
Sean Owen 2012-03-12, 23:03
(It's out there as TanimotoCoefficientSimilarity -- not named JaccardSimilarity or anything.)
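The Tanimoto coefficient is the size of the intersection divided by the size of the union of the two items' consumer sets. This sketch computes it over plain Java sets of user (or session) ids; it illustrates the formula and is not the TanimotoCoefficientSimilarity implementation itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    /**
     * Tanimoto coefficient (Jaccard index) of two items, each represented by the set of
     * users (or sessions) that consumed it: |A and B| / |A or B|. NaN if both sets are empty.
     */
    static double tanimoto(Set<Long> consumersOfA, Set<Long> consumersOfB) {
        Set<Long> intersection = new HashSet<>(consumersOfA);
        intersection.retainAll(consumersOfB);
        int union = consumersOfA.size() + consumersOfB.size() - intersection.size();
        return union == 0 ? Double.NaN : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Set<Long> a = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> b = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(a, b)); // prints 0.5: 2 shared out of 4 distinct
    }
}
```

The union size is derived from the two set sizes and the intersection, so no second set needs to be materialized.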