-
RecommenderJob uses indirection for ItemIDs
Lance Norskog 2011-06-12, 02:47
The RecommenderJob makes a "side" file which maps a fabricated integer index to a long ItemID. Why is this needed? Couldn't the RecommenderJob propagate the long ItemID directly? Note that this forces all instances of AggregateAndReduceRecommender to load the entire map. One of the Map/Reduce rules is 'nothing needs to know everything'.
Is this a sparse/dense optimization? If so, have the distributed algorithms advanced enough to make this indirection unnecessary?
-- Lance Norskog [EMAIL PROTECTED]
-
Re: RecommenderJob uses indirection for ItemIDs
Sean Owen 2011-06-12, 10:43
The keys have to be hashed to be used as int offsets into a vector. While loading the mapping isn't ideal, it does only scale as the number of items and users.
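To illustrate the hashing described above, here is a minimal sketch of folding a long ID into a non-negative int index while keeping a reverse map, which plays the role of the "side" file. The class and method names (ItemIdIndexSketch, idToIndex, register, lookup) are hypothetical and chosen for illustration only; this is not necessarily how Mahout implements it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not Mahout's actual class. */
final class ItemIdIndexSketch {

    // Reverse mapping from int index back to the original long ID.
    // This stands in for the "side" file discussed in the thread.
    private final Map<Integer, Long> indexToId = new HashMap<>();

    /** Folds a 64-bit ID into a non-negative 32-bit index. */
    static int idToIndex(long id) {
        // Long.hashCode XORs the high and low 32 bits;
        // the mask clears the sign bit so the result can be used as an offset.
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    /** Records the ID under its index and returns that index. */
    int register(long id) {
        int index = idToIndex(id);
        // Assumption: on a collision the later ID simply overwrites the earlier one.
        indexToId.put(index, id);
        return index;
    }

    /** Returns the ID recorded for this index, or null if none was registered. */
    Long lookup(int index) {
        return indexToId.get(index);
    }

    public static void main(String[] args) {
        ItemIdIndexSketch mapping = new ItemIdIndexSketch();
        long itemId = 1234567890123L;
        int index = mapping.register(itemId);
        System.out.println(itemId + " -> " + index + " -> " + mapping.lookup(index));
    }
}
```

Note that the fold is not injective: in this sketch the IDs 1 and 1L << 32 both land on index 1, so a reverse map built this way only recovers the original ID when no two IDs collide.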
-
Re: RecommenderJob uses indirection for ItemIDs
Lance Norskog 2011-06-12, 23:26
Ah! So if it was a sparse vector it could be indexed directly. Or the mapping could be done with a hash-indexed representation as used with Lucene vectors.
-- Lance Norskog [EMAIL PROTECTED]
-
Re: RecommenderJob uses indirection for ItemIDs
Sean Owen 2011-06-12, 23:28
No, all vectors here use int to express dimension. It has nothing to do with sparseness.