|
|
-
need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-11, 13:49
I am working out the details of a fully scalable MR implementation of the stochastic decomposition and would appreciate some reviews. See https://issues.apache.org/jira/browse/MAHOUT-376 for details.
-
Re: need review for stochastic matrix decomposition doc
Dmitriy Lyubimov 2010-04-12, 04:55
On Sun, Apr 11, 2010 at 6:49 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > I am working out the details of a fully scalable MR implementation of the > stochastic decomposition and would appreciate some reviews. > > See https://issues.apache.org/jira/browse/MAHOUT-376 for details. > The first step feels a little different from our previous discussion. I remember asking questions like how we get the Q and you seem to have implied we just take AOmega instead of orthonormalizing it (i was wondering where that step would go). For reference, i am attaching summary of our previous discussion which i took liberty of slightly reworking. ( i wanted to test it out using Pig but still did not quite manage to squeeze it into my schedule). Could you please elaborate on qr() proceduer there? If you mean QR decompostion, then Q is commonly thought of as a result of Gramm-Schmidt procedure which is iterative (and hence kinda tough to do without considering other rows of AOmega)
-
Re: need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-12, 05:54
The qr step that I am talking about is, indeed, QR decomposition and would typically be implemented with a sequential local implementation of Gram-Schmidt. The win in the outline that I gave was that I was proposing decomposing independent blocks of the A Omega product rather than the whole thing at once. This block decomposition *is* parallelizable, but it isn't really a QR decomposition of the entire matrix. It does, however, provide an orthonormal basis of A Omega. Since we don't care about the R part of the decomposition anyway, that seems good enough. (let me know if there is somewhere that I could describe the process better to make clear that I am suggesting decomposition of indepdendent blocks) On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote: > > > On Sun, Apr 11, 2010 at 6:49 AM, Ted Dunning <[EMAIL PROTECTED]>wrote: > >> I am working out the details of a fully scalable MR implementation of the >> stochastic decomposition and would appreciate some reviews. >> >> See https://issues.apache.org/jira/browse/MAHOUT-376 for details. >> > > > The first step feels a little different from our previous discussion. I > remember asking questions like how we get the Q and you seem to have implied > we just take AOmega instead of orthonormalizing it (i was wondering where > that step would go). > > For reference, i am attaching summary of our previous discussion which i > took liberty of slightly reworking. ( i wanted to test it out using Pig but > still did not quite manage to squeeze it into my schedule). > > Could you please elaborate on qr() proceduer there? If you mean QR > decompostion, then Q is commonly thought of as a result of Gramm-Schmidt > procedure which is iterative (and hence kinda tough to do without > considering other rows of AOmega) > > >
-
Re: need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-12, 05:54
Can you attach that to the JIRA? Attachments are stripped from the mailing list, I think.
On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote:
> For reference, i am attaching summary of our previous discussion which i > took liberty of slightly reworking.
-
Re: need review for stochastic matrix decomposition doc
Dmitriy Lyubimov 2010-04-12, 06:24
np, if you see any value in it.
On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Can you attach that to the JIRA? Attachments are stripped from the mailing > list, I think. > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> > wrote: > > > For reference, i am attaching summary of our previous discussion which i > > took liberty of slightly reworking. >
-
Re: need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-12, 16:32
I will be attaching Dmitry's pdf.
The basic difference is that I have gone back to a form closer to the original paper to avoid computing an ill-conditioned SVD.
Dmitriy also said:
I guess at this point i don't understand the details of block QR. I guess > i'll need to read up on block QR. I guess you could indeed go into a little > more into details of that qr() call and perhaps give geometry of individual > Yi, Qi and Ri matrices in step 1 ( i assume they should have s x (k+p), s x > ?, ? x (k+p) where s is the block height .) >
There isn't anything magic about the block QR and I doubt you will find any information on the variant that I am using. The basic idea is just that I am doing QR decompositions on row-wise blocks of A Omega.
On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> > Can you attach that to the JIRA? Attachments are stripped from the mailing > list, I think. > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]>wrote: > >> For reference, i am attaching summary of our previous discussion which i >> took liberty of slightly reworking. > > >
-
Re: need review for stochastic matrix decomposition doc
Dmitriy Lyubimov 2010-04-13, 15:42
Why can't we say s=(k+p) and accumulate (k+p) x (k+p) A*Omega rows at one time in the mapper and make sure we orthogonalize (k+p) rows of Q in one block (by running a stock QR)? That way orthogonalization of the final Q will be quaranteed (although columns of Q would be scaled at approx [m/(k+p)] )?
-Dmitriy
On Mon, Apr 12, 2010 at 9:32 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> I will be attaching Dmitry's pdf. > > The basic difference is that I have gone back to a form closer to the > original paper to avoid computing an ill-conditioned SVD. > > Dmitriy also said: > > I guess at this point i don't understand the details of block QR. I guess > > i'll need to read up on block QR. I guess you could indeed go into a > little > > more into details of that qr() call and perhaps give geometry of > individual > > Yi, Qi and Ri matrices in step 1 ( i assume they should have s x (k+p), s > x > > ?, ? x (k+p) where s is the block height .) > > > > There isn't anything magic about the block QR and I doubt you will find any > information on the variant that I am using. The basic idea is just that I > am doing QR decompositions on row-wise blocks of A Omega. > > On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > > > Can you attach that to the JIRA? Attachments are stripped from the > mailing > > list, I think. > > > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED] > >wrote: > > > >> For reference, i am attaching summary of our previous discussion which > i > >> took liberty of slightly reworking. > > > > > > >
-
Re: need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-13, 16:41
What you are suggesting is close to what I am doing.
There are two differences,
1) the number of rows does depend on k+p, but not directly. The number of rows should be as large as practical to make the operations more efficient.
2) Oops. You are right. The columns will need to be renormalized by the n (the number of blocks)
On Tue, Apr 13, 2010 at 8:42 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote:
> Why can't we say s=(k+p) and accumulate (k+p) x (k+p) A*Omega rows at one > time in the mapper and make sure we orthogonalize (k+p) rows of Q in one > block (by running a stock QR)? That way orthogonalization of the final Q > will be quaranteed (although columns of Q would be scaled at approx > [m/(k+p)] )? > > -Dmitriy > > On Mon, Apr 12, 2010 at 9:32 AM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > I will be attaching Dmitry's pdf. > > > > The basic difference is that I have gone back to a form closer to the > > original paper to avoid computing an ill-conditioned SVD. > > > > Dmitriy also said: > > > > I guess at this point i don't understand the details of block QR. I guess > > > i'll need to read up on block QR. I guess you could indeed go into a > > little > > > more into details of that qr() call and perhaps give geometry of > > individual > > > Yi, Qi and Ri matrices in step 1 ( i assume they should have s x (k+p), > s > > x > > > ?, ? x (k+p) where s is the block height .) > > > > > > > There isn't anything magic about the block QR and I doubt you will find > any > > information on the variant that I am using. The basic idea is just that > I > > am doing QR decompositions on row-wise blocks of A Omega. > > > > On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > > > > > > Can you attach that to the JIRA? Attachments are stripped from the > > mailing > > > list, I think. > > > > > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED] > > >wrote: > > > > > >> For reference, i am attaching summary of our previous discussion > which > > i > > >> took liberty of slightly reworking. > > > > > > > > > > > >
-
Re: need review for stochastic matrix decomposition doc
Dmitriy Lyubimov 2010-04-13, 17:14
I see . to minimize loss of orthogonality. Brilliant.
On Tue, Apr 13, 2010 at 9:41 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> What you are suggesting is close to what I am doing. > > There are two differences, > > 1) the number of rows does depend on k+p, but not directly. The number of > rows should be as large as practical to make the operations more efficient. > > 2) Oops. You are right. The columns will need to be renormalized by the n > (the number of blocks) > > On Tue, Apr 13, 2010 at 8:42 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> > wrote: > > > Why can't we say s=(k+p) and accumulate (k+p) x (k+p) A*Omega rows at one > > time in the mapper and make sure we orthogonalize (k+p) rows of Q in one > > block (by running a stock QR)? That way orthogonalization of the final Q > > will be quaranteed (although columns of Q would be scaled at approx > > [m/(k+p)] )? > > > > -Dmitriy > > > > On Mon, Apr 12, 2010 at 9:32 AM, Ted Dunning <[EMAIL PROTECTED]> > > wrote: > > > > > I will be attaching Dmitry's pdf. > > > > > > The basic difference is that I have gone back to a form closer to the > > > original paper to avoid computing an ill-conditioned SVD. > > > > > > Dmitriy also said: > > > > > > I guess at this point i don't understand the details of block QR. I > guess > > > > i'll need to read up on block QR. I guess you could indeed go into a > > > little > > > > more into details of that qr() call and perhaps give geometry of > > > individual > > > > Yi, Qi and Ri matrices in step 1 ( i assume they should have s x > (k+p), > > s > > > x > > > > ?, ? x (k+p) where s is the block height .) > > > > > > > > > > There isn't anything magic about the block QR and I doubt you will find > > any > > > information on the variant that I am using. The basic idea is just > that > > I > > > am doing QR decompositions on row-wise blocks of A Omega. > > > > > > On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > > > Can you attach that to the JIRA? Attachments are stripped from the > > > mailing > > > > list, I think. > > > > > > > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov <[EMAIL PROTECTED] > > > >wrote: > > > > > > > >> For reference, i am attaching summary of our previous discussion > > which > > > i > > > >> took liberty of slightly reworking. > > > > > > > > > > > > > > > > > >
-
Re: need review for stochastic matrix decomposition doc
Ted Dunning 2010-04-13, 17:22
I have been testing the in-core version of this without the blocking and the results have been very impressive.
On Tue, Apr 13, 2010 at 10:14 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]>wrote:
> I see . to minimize loss of orthogonality. Brilliant. > > On Tue, Apr 13, 2010 at 9:41 AM, Ted Dunning <[EMAIL PROTECTED]> > wrote: > > > What you are suggesting is close to what I am doing. > > > > There are two differences, > > > > 1) the number of rows does depend on k+p, but not directly. The number > of > > rows should be as large as practical to make the operations more > efficient. > > > > 2) Oops. You are right. The columns will need to be renormalized by the > n > > (the number of blocks) > > > > On Tue, Apr 13, 2010 at 8:42 AM, Dmitriy Lyubimov <[EMAIL PROTECTED]> > > wrote: > > > > > Why can't we say s=(k+p) and accumulate (k+p) x (k+p) A*Omega rows at > one > > > time in the mapper and make sure we orthogonalize (k+p) rows of Q in > one > > > block (by running a stock QR)? That way orthogonalization of the final > Q > > > will be quaranteed (although columns of Q would be scaled at approx > > > [m/(k+p)] )? > > > > > > -Dmitriy > > > > > > On Mon, Apr 12, 2010 at 9:32 AM, Ted Dunning <[EMAIL PROTECTED]> > > > wrote: > > > > > > > I will be attaching Dmitry's pdf. > > > > > > > > The basic difference is that I have gone back to a form closer to the > > > > original paper to avoid computing an ill-conditioned SVD. > > > > > > > > Dmitriy also said: > > > > > > > > I guess at this point i don't understand the details of block QR. I > > guess > > > > > i'll need to read up on block QR. I guess you could indeed go into > a > > > > little > > > > > more into details of that qr() call and perhaps give geometry of > > > > individual > > > > > Yi, Qi and Ri matrices in step 1 ( i assume they should have s x > > (k+p), > > > s > > > > x > > > > > ?, ? x (k+p) where s is the block height .) > > > > > > > > > > > > > There isn't anything magic about the block QR and I doubt you will > find > > > any > > > > information on the variant that I am using. The basic idea is just > > that > > > I > > > > am doing QR decompositions on row-wise blocks of A Omega. > > > > > > > > On Sun, Apr 11, 2010 at 10:54 PM, Ted Dunning <[EMAIL PROTECTED] > > > > > > wrote: > > > > > > > > > > > > > > Can you attach that to the JIRA? Attachments are stripped from the > > > > mailing > > > > > list, I think. > > > > > > > > > > On Sun, Apr 11, 2010 at 9:55 PM, Dmitriy Lyubimov < > [EMAIL PROTECTED] > > > > >wrote: > > > > > > > > > >> For reference, i am attaching summary of our previous discussion > > > which > > > > i > > > > >> took liberty of slightly reworking. > > > > > > > > > > > > > > > > > > > > > > > > >
|
|