Way Cool
2012-06-21, 21:16
Sebastian Schelter
2012-06-21, 21:26
Sean Owen
2012-06-21, 21:29
Way Cool
2012-06-21, 22:26
Sean Owen
2012-06-21, 22:55
Saikat Kanjilal
2012-06-22, 01:53
Ted Dunning
2012-06-22, 04:33
Sebastian Schelter
2012-06-22, 05:43
Way Cool
2012-06-22, 05:57
Sean Owen
2012-06-22, 07:48
-
Performance issue with Item-based Recommendation and User-based Recommendation
Way Cool 2012-06-21, 21:16
Hi, guys,

For item-based recommendation, I pre-calculated the item similarities on Hadoop for each algorithm, which generated 20m rows each. The problem now is that I can't load them into memory via MySQLJDBCInMemoryItemSimilarity with 4GB of memory. I tried MySQLJDBCItemSimilarity, but it is far too slow. What are the alternatives?

For user-based recommendation, I can't load a data model of 100m lines into memory with FileDataModel. It ran out of memory after 20m lines. JDBCDataModel has the same problem as above: it is far too slow. Has anyone precalculated the user similarities and then recommended items to a user?

Has anyone had similar issues before?

Thanks,
YG
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Sebastian Schelter 2012-06-21, 21:26
Hi Way Cool,

How many users and items do you have, and how many similar items per item do you store? And what's your scenario? Being limited to 4GB on a production machine seems a little odd.

--sebastian
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Sean Owen 2012-06-21, 21:29
I would suggest pruning similarities near 0, and then treating missing similarities as 0 later at runtime. It may take a bit of coding, but you should be able to throw away a lot without compromising much of the result.
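A minimal sketch of the pruning idea above, in plain Java with no Mahout dependency. The class name `PrunedSimilarityStore` and the threshold parameter are hypothetical and chosen only for illustration; the sketch shows how pairs below a minimum magnitude can be dropped at load time and then read back as 0 at lookup time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: keeps only item pairs whose
// similarity magnitude reaches a threshold and reports 0.0 for all others.
final class PrunedSimilarityStore {
    private final Map<Long, Map<Long, Double>> rows = new HashMap<>();
    private final double minAbsSimilarity;

    PrunedSimilarityStore(double minAbsSimilarity) {
        this.minAbsSimilarity = minAbsSimilarity;
    }

    // Stores the pair only if |similarity| >= threshold; returns whether it was kept.
    boolean put(long itemA, long itemB, double similarity) {
        if (Math.abs(similarity) < minAbsSimilarity) {
            return false;
        }
        // Similarity is symmetric, so each pair is stored once under (smaller, larger).
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        rows.computeIfAbsent(lo, key -> new HashMap<>()).put(hi, similarity);
        return true;
    }

    // A pair that was pruned or never seen is treated as similarity 0.0.
    double get(long itemA, long itemB) {
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        Map<Long, Double> row = rows.get(lo);
        return row == null ? 0.0 : row.getOrDefault(hi, 0.0);
    }
}
```

Boxed `HashMap` entries are memory-hungry, so at the scale discussed in this thread a primitive-keyed map would likely be needed; the sketch only illustrates the lookup logic.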
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Way Cool 2012-06-21, 22:26
Thanks, guys, for your quick response.

We have a couple of million items and 40 million users (including anonymous users). Up to 50 similar items were generated per item.

I will try a minimum similarity. Is there any documentation for it, or a parameter defined in the itemsimilarity job?

What about user-based recommendation? Any ideas how we can make that work without loading everything into memory?

Thanks.
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Sean Owen 2012-06-21, 22:55
OK, you're already pruning a fair bit then, in the sense that you keep the top 50 similarities (by absolute value) per item. More is probably not productive as you're already keeping only a small fraction of all of them.

(100M pairs at ~20 bytes per pair should fit in about 2GB of heap. That's a lot of the 4GB you have available, but it seems like it ought to just about fit. Are you giving Java enough heap? Here are my general default settings for this kind of app, which are applicable here too: http://myrrix.com/documentation-serving-layer/)

You just have a load of items. Any process that scales as the square of the number of items is going to hurt when you get to millions of them. A process based on user-user similarity, when there are 40M users, is only going to be much worse.

Consider not pre-computing all these pairs. Instead, compute them and cache them in real time, and use the CandidateItemStrategy to significantly reduce the number of item-item similarities you need to look at. That may mitigate the fact that you don't have them all in memory.

You can throw more hardware at this, if you're willing to move to a completely batch-oriented Hadoop-based computation. You won't be limited by RAM, but it will be an offline process.

I am a big fan of matrix-factorization-based approaches at the moment, since you can run most of the computation offline whenever you like but still make approximate updates in real time. These sorts of things scale only linearly with the number of items and users, and not even with the size of the preference input. I think you may have to shoot for this kind of hybrid system in the end to do updates in real time.
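The "compute and cache in real time" suggestion can be sketched as a bounded least-recently-used cache in front of whatever computes a similarity on demand. This is plain Java rather than Mahout code: `CachingSimilarity`, `PairSimilarity`, and `maxEntries` are hypothetical names, and the sketch assumes single-threaded access.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: computes similarities on demand
// and keeps only the most recently used ones in memory.
final class CachingSimilarity {
    // Stand-in for any on-demand similarity computation.
    interface PairSimilarity {
        double compute(long itemA, long itemB);
    }

    private final PairSimilarity delegate;
    private final Map<String, Double> cache;

    CachingSimilarity(PairSimilarity delegate, int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the entry that was touched longest ago.
        this.cache = new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Not thread-safe; a concurrent server would need synchronization.
    double similarity(long itemA, long itemB) {
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        String key = lo + ":" + hi;
        Double cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        double value = delegate.compute(lo, hi);
        cache.put(key, value);
        return value;
    }
}
```

Memory use is then capped by `maxEntries` rather than by the total number of pairs, at the price of recomputing pairs that have been evicted.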
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Saikat Kanjilal 2012-06-22, 01:53
I'm using the Hadoop-based ItemSimilarity, but I will be preloading the results into Cassandra and using that as the real-time data output. I will let you know how it goes.
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Ted Dunning 2012-06-22, 04:33
Of course, after pruning to 50 similarities per item, the size is linear in the number of items. The cost to get there may be quadratic, but the final size should be linear.
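A rough sketch of why top-k pruning makes the stored size linear: each item keeps at most k similarities, so the total is items × k pairs. The code below is plain Java; `TopKPruning` is a hypothetical name, and the figure of 2 million items is an assumption standing in for "a couple of million". With 50 similarities per item and the ~20 bytes per pair mentioned earlier in the thread, that gives 100M pairs and about 2GB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Mahout.
final class TopKPruning {
    // Keeps the k similarities with the largest magnitude, strongest first.
    static List<Double> topKByAbsoluteValue(double[] similarities, int k) {
        // Min-heap on |similarity|: the weakest kept value sits at the head.
        PriorityQueue<Double> heap = new PriorityQueue<Double>(
                (x, y) -> Double.compare(Math.abs(x), Math.abs(y)));
        for (double s : similarities) {
            heap.add(s);
            if (heap.size() > k) {
                heap.poll(); // drop the weakest so at most k remain
            }
        }
        List<Double> kept = new ArrayList<>(heap);
        kept.sort((x, y) -> Double.compare(Math.abs(y), Math.abs(x)));
        return kept;
    }

    // Stored size grows linearly with the number of items once k is fixed.
    static long estimatedBytes(long items, int perItem, int bytesPerPair) {
        return items * perItem * bytesPerPair;
    }

    public static void main(String[] args) {
        System.out.println(topKByAbsoluteValue(new double[] {0.1, -0.9, 0.5, 0.3}, 2));
        System.out.println(estimatedBytes(2_000_000L, 50, 20)); // assumed 2M items
    }
}
```

Doubling the number of items doubles the estimate, whereas the full item-item matrix would quadruple.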
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Sebastian Schelter 2012-06-22, 05:43
What exactly is your use case, that you have millions of items but only 4GB of RAM on the server? Curious :)
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Way Cool 2012-06-22, 05:57
4GB can fit one or two types of item similarities, but I have a couple more based on different similarity measures. For user-user similarity, I don't think we can compute and cache them at runtime because of the high memory consumption. As you know, the data model (preferences) alone can't fit in 4GB of memory.

I will try SVD and ALS. Are they good for both user-based and item-based recommendations?

Thanks.
-
Re: Performance issue with Item-based Recommendation and User-based Recommendation
Sean Owen 2012-06-22, 07:48
SVD and ALS aren't "user-based" or "item-based". They don't operate by computing similarities, which is the good news. I think the ALS model is more appropriate. The SVD is more sophisticated (complex and hard to compute), arguably "overkill" for what recommenders need to do, and doesn't deal with sparse input as well.
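To illustrate why a factorization model avoids similarity tables altogether, here is a minimal sketch of how such a model is typically used once trained: each user and each item has a short vector of latent factors, and a predicted preference is their dot product. This is plain Java and not the Mahout API; `FactorModel` is a hypothetical name, the rank of 10 is an assumed value, and the training step (ALS itself) is not shown.

```java
// Hypothetical helper, not part of Mahout: scoring with already-trained factors.
final class FactorModel {
    // Predicted preference is the dot product of the two factor vectors.
    static double score(float[] userFactors, float[] itemFactors) {
        double sum = 0.0;
        for (int i = 0; i < userFactors.length; i++) {
            sum += userFactors[i] * itemFactors[i];
        }
        return sum;
    }

    // Number of stored values: one vector of length `rank` per user and per item,
    // so storage grows linearly with users + items.
    static long storedValues(long users, long items, int rank) {
        return (users + items) * rank;
    }

    public static void main(String[] args) {
        System.out.println(score(new float[] {1f, 2f}, new float[] {3f, 4f}));
        // 40M users and an assumed 2M items at an assumed rank of 10.
        System.out.println(storedValues(40_000_000L, 2_000_000L, 10));
    }
}
```

No pairwise user-user or item-item table is stored at all; only the per-user and per-item vectors are.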