|
|
-
Mahout LanczosSolver explanation
Aniruddha Basak 2012-08-01, 01:00
Hi, I am working on Spectral Kmeans which involves an eigen-decomposition step using Lanczos. As I did not get exact similar results as expected, I tried to understand the implementation. I have one question about "LanczosSolver .java" : In the "solve" method, while building the "tridiagonal matrix" there is a step just after the multiplication job (performed on Hadoop as TimesSquaredJob) as ------------ nextVector.assign(new Scale(1.0 / state.getScaleFactor())); ----------- I could not understand why this scaling is performed?
( When I compared the results on a small matrix to an equivalent Matlab script, I found the results are exactly similar WITHOUT this scaling. Including this scaling makes the results different from the Matlab results. ) Thanks, Aniruddha
-
Re: Mahout LanczosSolver explanation
Ted Dunning 2012-08-01, 01:24
WHy are you using Lanczos? Why not use something more recent?
On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <[EMAIL PROTECTED]>wrote:
> Hi, > I am working on Spectral Kmeans which involves an eigen-decomposition step > using Lanczos. As I did not get exact similar results as expected, I tried > to understand the > implementation. > I have one question about "LanczosSolver .java" : > In the "solve" method, while building the "tridiagonal matrix" there is a > step just > after the multiplication job (performed on Hadoop as TimesSquaredJob) as > ------------ > nextVector.assign(new Scale(1.0 / state.getScaleFactor())); > ----------- > I could not understand why this scaling is performed? > > ( When I compared the results on a small matrix to an equivalent Matlab > script, > I found the results are exactly similar WITHOUT this scaling. Including > this scaling > makes the results different from the Matlab results. ) > > > Thanks, > Aniruddha > >
-
RE: Mahout LanczosSolver explanation
Aniruddha Basak 2012-08-01, 01:28
I did not choose Lanczos. It found it being used as a part of Spectral-Kmeans implemented in Mahout. Can u please recommend some better alternative of Lanczos.
Thanks, Aniruddha -----Original Message----- From: Ted Dunning [mailto:[EMAIL PROTECTED]] Sent: Tuesday, July 31, 2012 6:24 PM To: [EMAIL PROTECTED] Cc: Jake Mannix Subject: Re: Mahout LanczosSolver explanation
WHy are you using Lanczos? Why not use something more recent?
On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <[EMAIL PROTECTED]>wrote:
> Hi, > I am working on Spectral Kmeans which involves an eigen-decomposition > step using Lanczos. As I did not get exact similar results as > expected, I tried to understand the implementation. > I have one question about "LanczosSolver .java" : > In the "solve" method, while building the "tridiagonal matrix" there > is a step just after the multiplication job (performed on Hadoop as > TimesSquaredJob) as > ------------ > nextVector.assign(new Scale(1.0 / state.getScaleFactor())); > ----------- > I could not understand why this scaling is performed? > > ( When I compared the results on a small matrix to an equivalent > Matlab script, I found the results are exactly similar WITHOUT this > scaling. Including this scaling makes the results different from the > Matlab results. ) > > > Thanks, > Aniruddha > >
-
Re: Mahout LanczosSolver explanation
Ted Dunning 2012-08-01, 01:34
Stochastic SVD and alternating least squares would both be better, I think.
On Tue, Jul 31, 2012 at 7:28 PM, Aniruddha Basak <[EMAIL PROTECTED]>wrote:
> I did not choose Lanczos. It found it being used as a part of > Spectral-Kmeans > implemented in Mahout. > Can u please recommend some better alternative of Lanczos. > > Thanks, > Aniruddha > > > -----Original Message----- > From: Ted Dunning [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, July 31, 2012 6:24 PM > To: [EMAIL PROTECTED] > Cc: Jake Mannix > Subject: Re: Mahout LanczosSolver explanation > > WHy are you using Lanczos? Why not use something more recent? > > On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <[EMAIL PROTECTED] > >wrote: > > > Hi, > > I am working on Spectral Kmeans which involves an eigen-decomposition > > step using Lanczos. As I did not get exact similar results as > > expected, I tried to understand the implementation. > > I have one question about "LanczosSolver .java" : > > In the "solve" method, while building the "tridiagonal matrix" there > > is a step just after the multiplication job (performed on Hadoop as > > TimesSquaredJob) as > > ------------ > > nextVector.assign(new Scale(1.0 / state.getScaleFactor())); > > ----------- > > I could not understand why this scaling is performed? > > > > ( When I compared the results on a small matrix to an equivalent > > Matlab script, I found the results are exactly similar WITHOUT this > > scaling. Including this scaling makes the results different from the > > Matlab results. ) > > > > > > Thanks, > > Aniruddha > > > > >
-
Re: Mahout LanczosSolver explanation
Jake Mannix 2012-08-01, 04:02
On Tue, Jul 31, 2012 at 6:00 PM, Aniruddha Basak <[EMAIL PROTECTED]>wrote:
> Hi,**** > > I am working on Spectral Kmeans which involves an eigen-decomposition step > **** > > using Lanczos. As I did not get exact similar results as expected, I tried > to understand the**** > > implementation.**** > > I have one question about “LanczosSolver .java” :**** > > In the “solve” method, while building the “tridiagonal matrix” there is a > step just**** > > after the multiplication job (performed on Hadoop as TimesSquaredJob) as** > ** > > ------------**** > > nextVector.assign(*new* Scale(1.0 / state.getScaleFactor()));**** > > -----------**** > > I could not understand why this scaling is performed? >
It's to help with floating point overflow issues in naive Lanczos implementations: each time you multiply the basis vectors by the matrix^2, it grows roughly like the largest singular of the matrix (squared). Do this a fifty or sixty times and you blow over 10^308 and you're screwed. Scaling the basis vectors down after each iteration is equivalent to normalizing the input matrix to have smaller singular values (uniformly), and turns floating point overflow problems into underflow problems. Thankfully, when you have entries in your basis vectors which are close to zero, it's ok to have them be *actually* zero. Unless you do it to all of them, naturally.
Either way, uniform scaling doesn't change the output singular vectors, so the result is unchanged if you don't end up running into floating point precision issues. > **** > > **(When I compared the results on a small matrix to an equivalent Matlab > script, > > ** > > I found the results are exactly similar WITHOUT this scaling. Including > this scaling**** > > makes the results different from the Matlab results. )**** > > ** ** > > ** ** > > Thanks,**** > > Aniruddha**** > > ** ** >
--
-jake
|
|