Mahout, mail # user - Re: Parallel ALS-WR on very large matrix -- crashing (I think)


Re: Parallel ALS-WR on very large matrix -- crashing (I think)
Ken Krugler 2012-02-02, 19:25
Hi Nicholas,

On Feb 2, 2012, at 10:56am, Nicholas Kolegraff wrote:

> Ok, I took a bit deeper look into this having changed some parameters and
> kicked off the new job..
>
> Seems plausible that I didn't have enough memory for some of the mappers --
> unless I'm missing something here.
> An upper bound on the memory would be (assuming my original parameter of 25
> features)
> 8Mil * 25 Features = 200Mil
> (multiply by 8 bytes assuming double precision floating point) and we get:
> 1.6billion
> 1.6B / (1024^3) = ~1.5GB memory needed
>
> The tasktracker heapsize and datanode heap sizes were only set to: 1GB

The memory you need for this task is based on the mapred.child.java.opts setting (the -Xmx setting), not what's allocated for the NameNode, JobTracker, DataNode or TaskTracker.

In fact increasing the DataNode & TaskTracker sizes removes memory that could/should be used by the child JVMs that the TaskTracker creates to run your map & reduce tasks.
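
To make that trade-off concrete, here is a rough sketch (not taken from the actual bootstrap script) that adds up the worst-case heap a worker node could be asked for under the two configurations quoted below. The function name is hypothetical, and it assumes every map and reduce slot runs a child JVM at its full -Xmx at the same time, alongside only the TaskTracker and DataNode daemons.

```python
def node_heap_demand_mb(child_xmx_mb, map_slots, reduce_slots,
                        tasktracker_mb, datanode_mb):
    """Worst-case heap demand (MB) on one worker node.

    Assumption: all slots are busy at once and each child JVM grows
    to its full -Xmx; non-heap JVM overhead is ignored.
    """
    child_total = child_xmx_mb * (map_slots + reduce_slots)
    return child_total + tasktracker_mb + datanode_mb


# Values copied from the m2.2xlarge lines of the quoted diff.
original = node_heap_demand_mb(4096, 6, 2, 1024, 1024)
modified = node_heap_demand_mb(8192, 3, 2, 2048, 2048)

print("original settings:", original, "MB")
print("modified settings:", modified, "MB")
print("difference:", modified - original, "MB")
```

Under these assumptions the modified settings ask for about 10 GB more per node than the original ones, so it is worth comparing both totals against the instance's physical RAM before raising the daemon heaps.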

Currently it looks like you have 4GB allocated for m2.2xlarge tasks, which should be sufficient given your analysis above.
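
As a quick check of the arithmetic in the quoted analysis, the sketch below recomputes the size of one dense feature matrix and compares it with the child JVM heap. The helper names are hypothetical, the 8 million rows and 25 features come from the quoted message, and only the raw 8-byte doubles are counted.

```python
def dense_matrix_bytes(rows, features, bytes_per_entry=8):
    """Raw payload of a dense rows x features matrix of doubles."""
    return rows * features * bytes_per_entry


def to_gib(num_bytes):
    """Convert bytes to GiB (1024^3 bytes)."""
    return num_bytes / 1024 ** 3


# Figures from the quoted estimate: 8M rows, 25 features.
needed = dense_matrix_bytes(8_000_000, 25)
heap_gib = 4096 / 1024  # -Xmx4096m expressed in GiB

print(f"feature matrix: {to_gib(needed):.2f} GiB")
print(f"child heap:     {heap_gib:.2f} GiB")
print("fits in heap:", to_gib(needed) < heap_gib)
```

This counts only the raw values; actual JVM usage would be somewhat higher once object overhead and the task's other working data are included, so treat the figure as a rough estimate.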

-- Ken

>
> So I have changed the bootstrap action on EC2 as follows (this is a diff
> between the original and the changes I made)
> # Parameters of the array:
> # [mapred.child.java.opts, mapred.tasktracker.map.tasks.maximum,
> mapred.tasktracker.reduce.tasks.maximum]
> 29c29
> <   "m2.2xlarge"  => ["-Xmx4096m", "6",  "2"],
> ---
>>  "m2.2xlarge"  => ["-Xmx8192m", "3",  "2"],
> # Parameters of the array (Vars modified in hadoop-env.sh)
> # [HADOOP_JOBTRACKER_HEAPSIZE, HADOOP_NAMENODE_HEAPSIZE,
> HADOOP_TASKTRACKER_HEAPSIZE, HADOOP_DATANODE_HEAPSIZE]
> 47c47
> <   "m2.2xlarge"  => ["2048", "8192", "1024", "1024"],
> ---
>>  "m2.2xlarge"  => ["4096", "16384", "2048", "2048"]
>
>
>
> On Thu, Feb 2, 2012 at 8:40 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:
>
>> Hmm, are you sure that the mappers have enough memory? You can set that
>> via -Dmapred.child.java.opts=-Xmx[some number]m
>>
>> --sebastian
>>
>> On 02.02.2012 17:37, Nicholas Kolegraff wrote:
>>> Sounds good. Thanks Sebastian
>>>
>>> The interesting thing is -- I tried to sample the matrix down one time to
>>> about 10% of non-zeros -- and worked no problem.
>>>
>>> On Thu, Feb 2, 2012 at 8:31 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:
>>>
>>>> Your parameters look good, except if you have binary data, you should
>>>> set --implicitFeedback=true. You could also set numFeatures to a very
>>>> small value (like 5) just to see if that helps.
>>>>
>>>> The mappers load one of the feature matrices into memory which are dense
>>>> (#items x #features entries or #users x #features entries). Are you sure
>>>> that the mappers have enough memory for that?
>>>>
>>>> It's really strange that you have problems with such small data, I
>>>> tested this with Netflix (> 100M non-zeros) on a few machines and it
>>>> worked quite well.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>>
>>>> On 02.02.2012 17:25, Nicholas Kolegraff wrote:
>>>>> I will up the ante with the time out and report back -- thanks all for the
>>>>> suggestions
>>>>>
>>>>> Hey, Sebastian -- Here are the arguments I am using:
>>>>> --input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065
>>>>> When the mapper loads the matrix into memory it only loads the actual
>>>>> non-zero data, correct?
>>>>>
>>>>> Hey Ted -- I messed up on the sparsity.  Turns out there are only 70M
>>>>> non-zero elements.
>>>>>
>>>>> Oh, and, I only have binary data -- I wasn't sure of the implications with
>>>>> ALS-WR on binary data -- I couldn't find anything to suggest otherwise.
>>>>> I am using data of the format user,item,1
>>>>> I have read about probabilistic factorization -- which works with binary
>>>>> data -- and perhaps naively, thought ALS-WR was similar so what-the-heck :-)
>>>>>
>>>>> I'd love nothing more than to share the data, however, I'd probably get in
>>>>> some trouble :-)
>>
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr