Mahout, mail # user - Re: Parallel ALS-WR on very large matrix -- crashing (I think)


Re: Parallel ALS-WR on very large matrix -- crashing (I think)
Nicholas Kolegraff 2012-02-02, 16:25
I will increase the task timeout and report back -- thanks all for the
suggestions.

Hey, Sebastian -- Here are the arguments I am using:
--input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065
When the mapper loads the matrix into memory it only loads the actual
non-zero data, correct?
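
Going by Sebastian's description quoted further down, what each mapper holds in memory is one dense feature matrix (users x features or items x features), while the user-item matrix itself is read row-wise. A rough sketch of the raw size of that feature matrix follows. It assumes 8-byte doubles and ignores JVM and container overhead, so the real footprint will be higher. The row count used in `main` is hypothetical, since the thread does not give the number of users or items; only `numFeatures = 25` comes from the arguments above.

```java
// Back-of-the-envelope estimate of the raw memory needed to hold one dense
// feature matrix (rows x numFeatures doubles). Object headers, vector
// wrappers, and hash-map overhead are deliberately ignored.
final class FeatureMatrixMemory {

    private static final long BYTES_PER_DOUBLE = 8L;

    static long estimateBytes(long rows, int numFeatures) {
        if (rows < 0 || numFeatures < 0) {
            throw new IllegalArgumentException("rows and numFeatures must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        long cells = Math.multiplyExact(rows, (long) numFeatures);
        return Math.multiplyExact(cells, BYTES_PER_DOUBLE);
    }

    public static void main(String[] args) {
        long hypotheticalRows = 10_000_000L; // assumption, not a figure from the thread
        int numFeatures = 25;                // from --numFeatures 25
        long bytes = estimateBytes(hypotheticalRows, numFeatures);
        System.out.printf("%d rows x %d features: about %.2f GB raw%n",
                hypotheticalRows, numFeatures, bytes / 1e9);
    }
}
```

Comparing this figure against the heap available to each map task gives a quick sanity check before blaming the timeout alone.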

Hey Ted -- I messed up on the sparsity.  Turns out there are only 70M
non-zero elements.

Oh, and I only have binary data -- I wasn't sure of the implications of
running ALS-WR on binary data, and I couldn't find anything saying it would
not work. I am using data of the format user,item,1.
I have read about probabilistic factorization -- which works with binary
data -- and perhaps naively, thought ALS-WR was similar so what-the-heck :-)

I'd love nothing more than to share the data, however, I'd probably get in
some trouble :-)
Perhaps I could generate a matrix with a similar distribution? -- I'll have
to check on that and see if it is ok #bureaucracy

Stay tuned...

On Thu, Feb 2, 2012 at 1:47 AM, Sebastian Schelter <[EMAIL PROTECTED]> wrote:

> Nicholas,
>
> can you give us the detailed arguments you start the job with? I'd
> especially be interested in the number of features (--numFeatures) you
> use. Do you use the job with implicit feedback data
> (--implicitFeedback=true)?
>
> The memory requirements of the job are the following:
>
> In each iteration either the item-features matrix (items x features) or
> the user-features matrix (users x features) is loaded into the memory of
> each mapper. Then the original user-item matrix (or its transpose) is
> read row-wise by the mappers and they recompute the features via
> AlternatingLeastSquaresSolver/ImplicitFeedbackAlternatingLeastSquaresSolver.
>
> --sebastian
>
>
> On 02.02.2012 09:53, Sean Owen wrote:
> > I have seen this happen in "normal" operation when the sorting on the
> > mapper is taking a long long time, because the output is large. You can
> > tell it to increase the timeout.  If this is what is happening, you won't
> > have a chance to update a counter as a keep-alive ping, but yes that is
> > generally right otherwise. If this is the case it's that a mapper is
> > outputting a whole lot of info, perhaps 'too much'. I don't know for sure,
> > just another guess for the pile.
> >
> > On Thu, Feb 2, 2012 at 1:44 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> >> Status reporting happens automatically when output is generated.  In a long
> >> computation, it is good form to occasionally update a counter or otherwise
> >> indicate that the computation is still progressing.
> >>
> >> On Wed, Feb 1, 2012 at 5:23 PM, Nicholas Kolegraff <[EMAIL PROTECTED]> wrote:
> >>
> >>> Do you know if it should still report status in the midst of a complex
> >>> task?  Seems questionable that it wouldn't just send a friendly hello?
> >>>
> >>>
> >>
> >
>
>
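
Ted's point about occasionally signalling progress during a long computation can be sketched as below. This is only an illustration: `ProgressReporter` is a hypothetical stand-in for whatever hook the framework provides (in a Hadoop mapper that would typically be a call such as `context.progress()` or a counter increment), and the `work` callback is a placeholder for one unit of the real computation. As Sean notes, this does not help if the time is being spent in the framework's sort of the map output rather than in user code.

```java
// Minimal keep-alive pattern: do the work in small units and ping the
// framework at a fixed interval so the task is not considered dead.
final class KeepAliveSketch {

    /** Hypothetical stand-in for the framework's progress hook. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Runs `steps` units of work, calling reporter.progress() after every
     * `interval` completed units. Returns how many pings were sent.
     */
    static int runWithKeepAlive(int steps, int interval, Runnable work, ProgressReporter reporter) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int pings = 0;
        for (int i = 1; i <= steps; i++) {
            work.run(); // one unit of the long-running computation
            if (i % interval == 0) {
                reporter.progress(); // tell the framework we are still alive
                pings++;
            }
        }
        return pings;
    }

    public static void main(String[] args) {
        int pings = runWithKeepAlive(1_000, 100,
                () -> Math.sqrt(42.0),                  // placeholder work
                () -> System.out.println("still working"));
        System.out.println("pings sent: " + pings);
    }
}
```

The interval is a trade-off: it should be small enough that the gap between pings stays well under the task timeout, but large enough that reporting does not dominate the work.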