Mahout, mail # user - Re: Parallel ALS-WR on very large matrix -- crashing (I think)


Re: Parallel ALS-WR on very large matrix -- crashing (I think)
Nicholas Kolegraff 2012-02-02, 18:56
OK, I took a deeper look into this, changed some parameters, and kicked off
a new job.

It seems plausible that some of the mappers didn't have enough memory --
unless I'm missing something here.
An upper bound on the memory for one dense feature matrix would be (assuming
my original parameter of 25 features):
8M * 25 features = 200M values
(multiply by 8 bytes each, assuming double-precision floating point) and we get:
1.6 billion bytes
1.6B / (1024^3) = ~1.5GB of memory needed
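
The estimate above can be reproduced with a short script. This is only a sketch: the function name is made up for illustration, and it counts raw 8-byte doubles only, so it ignores JVM object and vector overhead, which would push the real footprint higher.

```python
def dense_matrix_bytes(rows: int, features: int, bytes_per_value: int = 8) -> int:
    """Raw payload of a dense rows x features matrix (hypothetical helper).

    Counts only the stored values; JVM object overhead is not included.
    """
    return rows * features * bytes_per_value


rows = 8_000_000   # 8M rows, as in the estimate above
features = 25      # --numFeatures 25

total_bytes = dense_matrix_bytes(rows, features)
total_gib = total_bytes / 1024**3

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
```

Rerunning it with a smaller feature count shows how quickly the requirement shrinks, since the size is linear in the number of features.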

The tasktracker and datanode heap sizes were each set to only 1GB.

So I have changed the bootstrap action on EC2 as follows (this is a diff
between the original and the changes I made):
# Parameters of the array:
# [mapred.child.java.opts, mapred.tasktracker.map.tasks.maximum,
#  mapred.tasktracker.reduce.tasks.maximum]
29c29
<   "m2.2xlarge"  => ["-Xmx4096m", "6",  "2"],
---
>   "m2.2xlarge"  => ["-Xmx8192m", "3",  "2"],
# Parameters of the array (vars modified in hadoop-env.sh):
# [HADOOP_JOBTRACKER_HEAPSIZE, HADOOP_NAMENODE_HEAPSIZE,
#  HADOOP_TASKTRACKER_HEAPSIZE, HADOOP_DATANODE_HEAPSIZE]
47c47
<   "m2.2xlarge"  => ["2048", "8192", "1024", "1024"],
---
>   "m2.2xlarge"  => ["4096", "16384", "2048", "2048"]
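
One way to sanity-check the first change is to add up how much heap the task JVMs could claim if every slot were busy and each child grew to its full -Xmx. This is a rough sketch with a hypothetical helper name; it leaves out the daemon heaps and any off-heap usage, so it only bounds the task heaps.

```python
def worst_case_task_heap_mb(xmx_mb: int, map_slots: int, reduce_slots: int) -> int:
    """Heap claimed if every map and reduce slot runs a child JVM at full -Xmx.

    Hypothetical helper: ignores daemon heaps and non-heap JVM memory.
    """
    return xmx_mb * (map_slots + reduce_slots)


# Values taken from the diff above: ["-Xmx4096m", "6", "2"] -> ["-Xmx8192m", "3", "2"]
before_mb = worst_case_task_heap_mb(4096, map_slots=6, reduce_slots=2)
after_mb = worst_case_task_heap_mb(8192, map_slots=3, reduce_slots=2)

print(f"before: {before_mb} MB, after: {after_mb} MB")
```

Halving the map slots while doubling the heap keeps the map-side total the same, but the reduce slots now also get the larger heap, so the combined worst case goes up.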

On Thu, Feb 2, 2012 at 8:40 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:

> Hmm, are you sure that the mappers have enough memory? You can set that
> via -Dmapred.child.java.opts=-Xmx[some number]m
>
> --sebastian
>
> On 02.02.2012 17:37, Nicholas Kolegraff wrote:
> > Sounds good. Thanks Sebastian
> >
> > The interesting thing is -- I tried sampling the matrix down one time to
> > about 10% of the non-zeros -- and it worked with no problem.
> >
> > On Thu, Feb 2, 2012 at 8:31 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:
> >
> >> Your parameters look good, except if you have binary data, you should
> >> set --implicitFeedback=true. You could also set numFeatures to a very
> >> small value (like 5) just to see if that helps.
> >>
> >> The mappers load one of the feature matrices into memory, and these are
> >> dense (#items x #features entries or #users x #features entries). Are you
> >> sure that the mappers have enough memory for that?
> >>
> >> It's really strange that you have problems with such small data. I
> >> tested this with Netflix (> 100M non-zeros) on a few machines and it
> >> worked quite well.
> >>
> >> --sebastian
> >>
> >>
> >>
> >> On 02.02.2012 17:25, Nicholas Kolegraff wrote:
> >>> I will up the ante with the timeout and report back -- thanks all for
> >>> the suggestions
> >>>
> >>> Hey, Sebastian -- Here are the arguments I am using:
> >>> --input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065
> >>> When the mapper loads the matrix into memory it only loads the actual
> >>> non-zero data, correct?
> >>>
> >>> Hey Ted -- I messed up on the sparsity.  Turns out there are only 70M
> >>> non-zero elements.
> >>>
> >>> Oh, and, I only have binary data -- I wasn't sure of the implications
> >>> with ALS-WR on binary data -- I couldn't find anything to suggest
> >>> otherwise.
> >>> I am using data of the format user,item,1
> >>> I have read about probabilistic factorization -- which works with binary
> >>> data -- and perhaps naively, thought ALS-WR was similar so what-the-heck
> >>> :-)
> >>>
> >>> I'd love nothing more than to share the data, however, I'd probably get
> >>> in some trouble :-)
> >>> Perhaps I could generate a matrix with a similar distribution? -- I'll
> >>> have to check on that and see if it is ok #bureaucracy
> >>>
> >>> Stay tuned...
> >>>
> >>> On Thu, Feb 2, 2012 at 1:47 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Nicholas,
> >>>>
> >>>> can you give us the detailed arguments you start the job with? I'd
> >>>> especially be interested in the number of features (--numFeatures) you
> >>>> use. Do you use the job with implicit feedback data
> >>>> (--implicitFeedback=true)?
> >>>>
> >>>> The memory requirements of the job are the following:
> >>>>
> >>>> In each iteration either the item-features matrix (items x features)