Mahout, mail # user - Approaches for combining multiple types of item data for user-user similarity


Approaches for combining multiple types of item data for user-user similarity
Ken Krugler 2012-07-03, 22:20
Hi all,

I'm curious what approaches are recommended for generating user-user similarity when I have two (or more) distinct types of item data, each of which is fairly large.

E.g. let's say I had a set of users where I knew both (a) what books they had bought on Amazon, and (b) what YouTube videos they had watched.

For each user, I want to find the 10 most similar other users.

 - I could create two separate models (one per data type), find the nearest 30 users for each user in each model, and combine the results (maybe with weighting).
 - I could toss all of the data into one model, and use a preference value of < 1.0 for whichever type of data is less important.
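
To make the two options concrete, here is a rough sketch in plain Python. It is not Mahout code: the dict-of-dicts data layout, the use of cosine similarity, the function names, and the default weights are all assumptions picked for illustration.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_neighbors(prefs, user, n):
    """Return up to n (other_user, similarity) pairs, most similar first."""
    mine = prefs.get(user, {})
    scores = [(other, cosine(mine, items))
              for other, items in prefs.items() if other != user]
    # Drop users with no overlap at all, then sort by similarity (ties by name).
    scores = [(other, sim) for other, sim in scores if sim > 0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:n]


def combine_models(books, videos, user, w_books=1.0, w_videos=0.5, k=30, n=10):
    """Option 1: two models, k neighbors from each, merged by weighted sum."""
    combined = defaultdict(float)
    for other, sim in top_neighbors(books, user, k):
        combined[other] += w_books * sim
    for other, sim in top_neighbors(videos, user, k):
        combined[other] += w_videos * sim
    ranked = sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def single_model(books, videos, user, w_videos=0.5, n=10):
    """Option 2: one model, with video preferences scaled below 1.0."""
    merged = defaultdict(dict)
    for u, items in books.items():
        for item, value in items.items():
            merged[u][("book", item)] = value
    for u, items in videos.items():
        for item, value in items.items():
            # Assumption: videos are the less important data type.
            merged[u][("video", item)] = w_videos * value
    return top_neighbors(dict(merged), user, n)
```

With something like books = {"ann": {"b1": 1, "b2": 1}, "bob": {"b1": 1, "b2": 1}, "cat": {"b3": 1}} and videos = {"ann": {"v1": 1}, "bob": {"v2": 1}, "cat": {"v1": 1}}, both functions rank bob ahead of cat for ann, but the scores differ. One thing the sketch makes visible: with cosine similarity, scaling a preference value by w scales a shared item's contribution to the dot product by w squared, so the weight in the single-model option behaves differently from the weight applied when merging two neighbor lists.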

Any other suggestions? Input on the above two approaches?

Thanks!

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr