-
Mahout Performance Issues with Item Based Recommender
Jonathan Nassau 2012-07-24, 19:46
I’m writing an item-based recommender and am hitting an extreme speed bottleneck in “doGetCandidateItems”. I’m open to *any* suggestions on how to improve the speed, because right now it’s taking over an hour, which makes no sense. Even if I use a .txt file as my dataModel, it is very slow.
The table has all the preferences (1000) for every single user (115000), and there is an average of only 50 prefs per user.
The full table will obviously have more than this, but that’s the very small subset I’m using.
I want to recommend items and call recommendedBecause() on each item to give some detail about each recommendation, but with speed like this I can’t even do that. Thanks so much; the code is below.
Jonathan
MySQLJDBCDataModel dataModel = null;
try {
    // Register the jTDS driver and configure the data source.
    Class.forName("net.sourceforge.jtds.jdbc.Driver");
    net.sourceforge.jtds.jdbcx.JtdsDataSource ds = new net.sourceforge.jtds.jdbcx.JtdsDataSource();
    ds.setServerName("xxxx");
    ds.setDatabaseName("xxxx");
    ds.setUser("xxxx");
    ds.setPassword(password);
    ds.setDomain("xxxx");
    // Table and column names: user ID, item ID, preference value, no timestamp column.
    dataModel = new MySQLJDBCDataModel(
        ds, "test_tbl", "user_id",
        "item_id", "preference", null);
} catch (Exception e) {
    System.out.println("can't connect");
}
// Item-item similarities precomputed into a file.
ItemSimilarity similarity = new FileItemSimilarity(new File("output/part-r-00000"));
ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
List<RecommendedItem> uRec = cachingRecommender.recommend(userid, 10);
System.out.print("Recommendations:" + uRec);
}
-
Re: Mahout Performance Issues with Item Based Recommender
Sean Owen 2012-07-24, 20:49
Unless your data set is tiny (thousands of users / items), you can't really run straight off a database. It is far too data intensive. Real-time always means "in memory" to me.
Look at the ReloadFromJDBCDataModel wrapper, which will cache the DB data in memory. This should be orders of magnitude faster. Of course, you have to have enough memory. But the scale you describe should comfortably fit on a normal server machine.
Sean
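To illustrate why the in-memory wrapper helps, here is a minimal plain-Java sketch of the load-once idea. It is not Mahout code: `PreferenceSource`, `CachedPreferences` and `sourceHitsFor` are hypothetical names invented for this example, and the tiny data set is made up. In Mahout itself the equivalent step is handing the JDBC data model to ReloadFromJDBCDataModel as its delegate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryPrefsSketch {

    /** Hypothetical stand-in for a slow, database-backed preference store. */
    interface PreferenceSource {
        Map<Long, List<Long>> loadAll(); // userID -> itemIDs, fetched in one bulk read
    }

    /** Reads everything once, then answers every lookup from memory. */
    static final class CachedPreferences {
        private final Map<Long, List<Long>> byUser;

        CachedPreferences(PreferenceSource source) {
            this.byUser = new HashMap<>(source.loadAll());
        }

        List<Long> itemsFor(long userId) {
            return byUser.getOrDefault(userId, List.of());
        }
    }

    /** Counts how often the slow source is hit while serving many lookups. */
    static int sourceHitsFor(int lookups) {
        int[] hits = {0};
        PreferenceSource slow = () -> {
            hits[0]++; // each call here would be a database round trip
            return Map.of(1L, List.of(10L, 11L), 2L, List.of(11L, 12L));
        };
        CachedPreferences cache = new CachedPreferences(slow);
        for (int i = 0; i < lookups; i++) {
            cache.itemsFor(1L + (i % 2));
        }
        return hits[0];
    }

    public static void main(String[] args) {
        System.out.println("source hits for 1000 lookups: " + sourceHitsFor(1000));
    }
}
```

The slow source is touched once no matter how many lookups follow, which is the property that makes an in-memory model so much faster than querying the database for every preference. The cost is that the whole data set must fit in memory.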
-
Re: Mahout Performance Issues with Item Based Recommender
Jonathan Nassau 2012-07-24, 21:36
Thanks so much, that sped it up enormously, to the same speed as when I just use the .txt in memory. However, it's still taking a full minute to get a recommendation for a user.
Are there any ways to speed it up more?
-
Re: Mahout Performance Issues with Item Based Recommender
Sean Owen 2012-07-24, 21:45
Hmm, that doesn't sound right. This isn't all that big for data.
Any chance you've run a profiler to see the hotspot?
My guess is that you need to set a CandidateItemsStrategy to cut down the number of items considered.
-
Re: Mahout Performance Issues with Item Based Recommender
Jonathan Nassau 2012-07-24, 22:05
Yeah, I haven't done that; I'm going to look into it now. But in case it could solve everything immediately, how would I set up a CandidateItemsStrategy in a way that would speed things up?
-
Re: Mahout Performance Issues with Item Based Recommender
Sean Owen 2012-07-25, 10:09
Look at SamplingCandidateItemsStrategy and its arguments. These are the knobs you can turn to reduce the amount of data considered. You might start with something low like 10 for each of the first 3 args.
You can set this on an ItemBasedRecommender once configured.
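As a rough illustration of what "reducing the amount of data considered" means, here is a plain-Java sketch that caps the candidate set by uniform random sampling. This is an assumption-laden simplification, not how SamplingCandidateItemsStrategy derives its limits from its factor arguments; `sampleCandidates` and its parameters are hypothetical names for this example only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class CandidateSamplingSketch {

    /**
     * Returns at most maxCandidates item IDs picked uniformly at random
     * from allCandidates. A smaller cap means fewer similarity lookups
     * per recommendation, at the price of possibly missing good items.
     */
    static List<Long> sampleCandidates(List<Long> allCandidates, int maxCandidates, Random rng) {
        if (allCandidates.size() <= maxCandidates) {
            return new ArrayList<>(allCandidates); // nothing to cut
        }
        List<Long> shuffled = new ArrayList<>(allCandidates);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, maxCandidates));
    }

    public static void main(String[] args) {
        List<Long> items = new ArrayList<>();
        for (long id = 1; id <= 1000; id++) {
            items.add(id);
        }
        List<Long> sampled = sampleCandidates(items, 10, new Random(42));
        System.out.println("considering " + sampled.size() + " of " + items.size() + " items");
    }
}
```

The trade-off is the usual one for sampling: lower caps give faster answers but noisier recommendations, so it is worth raising the values gradually until the latency is just acceptable.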
-
Re: Mahout Performance Issues with Item Based Recommender
Jonathan Nassau 2012-07-25, 18:59
I changed that by doing
ItemBasedRecommender recommender = new GenericItemBasedRecommender(
    model, similarity,
    new SamplingCandidateItemsStrategy(10, 10, 10, 115000, 500),
    new SamplingCandidateItemsStrategy(10, 10, 10, 115000, 500));
It's taking 25 seconds now, so adding the SamplingCandidateItemsStrategy cut the time in half. I don't have a profiler to dig in deeper, but I thought the log output might help, as it shows roughly where the time goes. If this helps, awesome; if you need profiler output, or can't help any further without it, I'll find a way to get one that works (the one I have is broken for newer versions of Eclipse).
12/07/25 14:52:48 WARN jdbc.AbstractJDBCDataModel: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
12/07/25 14:52:48 INFO jdbc.ReloadFromJDBCDataModel: Loading new JDBC delegate data...
12/07/25 14:53:06 INFO model.GenericDataModel: Processed 10000 users
12/07/25 14:53:06 INFO model.GenericDataModel: Processed 20000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 30000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 40000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 50000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 60000 users
12/07/25 14:53:08 INFO model.GenericDataModel: Processed 70000 users
12/07/25 14:53:08 INFO model.GenericDataModel: Processed 80000 users
12/07/25 14:53:09 INFO model.GenericDataModel: Processed 90000 users
12/07/25 14:53:09 INFO model.GenericDataModel: Processed 100000 users
12/07/25 14:53:10 INFO model.GenericDataModel: Processed 110000 users
12/07/25 14:53:10 INFO model.GenericDataModel: Processed 115481 users
12/07/25 14:53:13 INFO jdbc.ReloadFromJDBCDataModel: New data loaded.
12/07/25 14:53:13 INFO file.FileItemSimilarity: Creating FileItemSimilarity for file output/part-r-00000
Wed Jul 25 14:53:16 EDT 2012 :done
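In the absence of a profiler, the timestamps in the log above can be turned into rough stage timings. The sketch below does that arithmetic in plain Java; `secondsBetween` is a hypothetical helper, and the final ":done" line has been rewritten by hand into the same timestamp layout as the other lines. It also assumes ":done" was printed after the recommendation finished, which the post does not confirm.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogTimingSketch {

    // Timestamp layout of the log lines, e.g. "12/07/25 14:52:48".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    /** Whole seconds elapsed between two log timestamps. */
    static long secondsBetween(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).getSeconds();
    }

    public static void main(String[] args) {
        String loadStart = "12/07/25 14:52:48"; // "Loading new JDBC delegate data..."
        String loadEnd = "12/07/25 14:53:13";   // "New data loaded."
        String done = "12/07/25 14:53:16";      // ":done" line, rewritten to the same layout

        System.out.println("data load:        " + secondsBetween(loadStart, loadEnd) + " s");
        System.out.println("after data load:  " + secondsBetween(loadEnd, done) + " s");
        System.out.println("total:            " + secondsBetween(loadStart, done) + " s");
    }
}
```

Under those assumptions the data load accounts for 25 of the 28 seconds shown, which suggests, though it does not prove, that most of the remaining time is spent loading the model rather than computing the recommendation. A profiler run or explicit timing around the recommend call would be needed to confirm it.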