Mahout, mail # user - RecommenderJob uses indirection for ItemIDs


Re: RecommenderJob uses indirection for ItemIDs
Sean Owen 2011-06-12, 10:43
The keys have to be hashed so they can be used as int offsets into a vector.
Loading the mapping isn't ideal, but its size scales only with the number of
items and users.
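
A minimal sketch of the indirection described above, not taken from the Mahout source: a 64-bit ID is folded into a non-negative int so it can serve as a vector offset, and a reverse map (the "side" file in the question) recovers the original long ID afterwards. The class name IdIndexSketch and the methods idToIndex and buildIndexToId are hypothetical names chosen for illustration. Note that distinct long IDs can hash to the same int, which this sketch does not handle.

```java
import java.util.HashMap;
import java.util.Map;

public class IdIndexSketch {

    // Hypothetical helper: fold a 64-bit ID into a non-negative 32-bit index.
    // Masking with 0x7FFFFFFF clears the sign bit so the result is a valid offset.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    // Build the reverse mapping (index -> original ID). Its size grows with the
    // number of distinct IDs, not with the number of preferences.
    // Assumption: no two IDs collide; a later ID would overwrite an earlier one.
    static Map<Integer, Long> buildIndexToId(long[] ids) {
        Map<Integer, Long> indexToId = new HashMap<>();
        for (long id : ids) {
            indexToId.put(idToIndex(id), id);
        }
        return indexToId;
    }

    public static void main(String[] args) {
        long[] itemIds = {42L, 5_000_000_000L, 9_000_000_000_123L};
        Map<Integer, Long> indexToId = buildIndexToId(itemIds);
        for (long id : itemIds) {
            int index = idToIndex(id);
            // Round trip: long ID -> int index -> long ID
            System.out.println(id + " -> " + index + " -> " + indexToId.get(index));
        }
    }
}
```

The reverse map is what every consumer of the int-indexed vectors has to load in order to report long IDs again, which is the cost the original question raises.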
 On Jun 12, 2011 3:47 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:
> The RecommenderJob makes a "side" file which maps a fabricated integer
> index to a long ItemID. Why is this needed? Couldn't the
> RecommenderJob propagate the long ItemID directly? Note that this
> forces all instances of AggregateAndReduceRecommender to load the
> entire map. One of the Map/Reduce rules is 'nothing needs to know
> everything'.
>
> Is this a sparse/dense optimization? If so, have the distributed
> algorithms advanced enough to make this indirection unnecessary?
>
> --
> Lance Norskog
> [EMAIL PROTECTED]