Mahout, mail # user - Singular Value Decomposition does not return correct eigenvalues and -vectors


Re: Singular Value Decomposition does not return correct eigenvalues and -vectors
Ted Dunning 2011-09-29, 12:36
As well, the stochastic projection algorithms in general have little to
recommend them for full rank decompositions.  It may help a tiny bit to
start with a QR decomposition, but the internals of an in-memory
implementation should handle that.

If you are doing a partial decomposition, however, the random projection
stuff is great for in-memory decompositions as well as out-of-core or
map-reduce implementations.
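
A minimal NumPy sketch of the random projection idea discussed in this thread, for illustration only. It is not Mahout's SSVD implementation; the function name `randomized_svd` and its arguments are assumptions made for this example, with `k` as the requested rank and `p` as the oversampling parameter. It shows that a thin decomposition recovers the leading singular values of a low-rank input, and that k = full rank with p = 0 reduces to an ordinary dense SVD of the same size.

```python
import numpy as np

def randomized_svd(a, k, p=0, seed=0):
    """Approximate rank-k SVD via random projection (hypothetical helper)."""
    m, n = a.shape
    if k + p > min(m, n):
        raise ValueError("k + p must not exceed min(m, n)")
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n, k + p))  # random test matrix
    y = a @ omega                            # sample the range of a
    q, _ = np.linalg.qr(y)                   # orthonormal basis, m x (k + p)
    b = q.T @ a                              # small (k + p) x n problem
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_b                              # map back to the original space
    return u[:, :k], s[:k], vt[:k, :]        # keep only the first k triplets

rng = np.random.default_rng(42)

# Thin case: a 200 x 50 matrix of exact rank 5, decomposed with k=5, p=10.
a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
u, s, vt = randomized_svd(a, k=5, p=10)
exact = np.linalg.svd(a, compute_uv=False)[:5]

# Full-rank case: a 3 x 3 matrix with k=3, p=0. The projection spans the
# whole space, so the work is the same as a plain dense SVD.
small = rng.standard_normal((3, 3))
_, s_small, _ = randomized_svd(small, k=3, p=0)
exact_small = np.linalg.svd(small, compute_uv=False)
```

In the thin case the inner SVD runs on a (k + p) x n matrix, which is where the savings come from; in the full-rank case that matrix is as large as the input, so nothing is gained.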

On Thu, Sep 29, 2011 at 12:08 PM, Dmitriy Lyubimov <[EMAIL PROTECTED]> wrote:

> Also, like Ted said, using k = full rank is the same as running a regular
> in-core method, both memory- and compute-wise.
>
> This method is for computing a thin SVD (k + p not to exceed perhaps 1000
> for practical purposes) on otherwise really large inputs.
>
> Please beware of misuse.
>
> -Dmitriy
> On Sep 28, 2011 1:32 PM, "Markus Holtermann" <[EMAIL PROTECTED]>
> wrote:
> > Hey guys.
> >
> > Thank you for all the information about Singular Value Decomposition.
> > The Lanczos algorithm seems to be a bad choice for small matrices. But
> > the Stochastic SVD with k = full rank and p = 0 (thanks Dmitriy
> > Lyubimov for implementing that) works fine.
> >
> > So far, Markus
> >
> > On 09/23/2011 08:46 PM, Dmitriy Lyubimov wrote:
> >> I already fixed full rank (p=0) on the trunk. It was just an
> >> invalid assertion, the algorithm isn't limiting that. So k=3 p=0
> >> should be ok now in the trunk.
> >>
> >> On Sep 23, 2011 8:34 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
> >>> Markus,
> >>>
> >>> Try testing on a 20x20 matrix if you want to use p>0. The issue
> >>> is that this is an approximation algorithm that works for
> >>> reasonably high dimension.
> >>> 3 is not reasonably high. 20 is probably marginal.
> >>>
> >>> On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov
> >>> <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> oh, ok, apparently you need to use p>0.
> >>>>
> >>>> but then there's a problem that there's a k+p <= m (input height)
> >>>> requirement so I guess this is a corner case i did not account
> >>>> for.
> >>>>
> >>>> you can use k=2 and p=1; the caveat is that even though 3
> >>>> singular values will be computed, only 2 of them will be saved.
> >>>> this solver always assumes "thin" decomposition requirements,
> >>>> although the distinction is purely technical; it is only a matter
> >>>> of a patch to enable p=0.
> >>>>
> >>>> It is only a case because your input is so small. In practice,
> >>>> input is much "longer" than k+p rows so it hasn't come up as an
> >>>> issue. Point is, it will not do full rank decomposition with
> >>>> small matrices; but then, you don't want to use it with small
> >>>> matrices :)
> >>>>
> >>>> although I can engineer a patch to allow p=0 and full rank
> >>>> decompositions for short wide matrices if it is that
> >>>> important.
> >>>>
> >>>> -dmitriy
> >>>>
> >>>> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
> >>>> <[EMAIL PROTECTED]> wrote:
> >>>>> Thank you for all your responses.
> >>>>>
> >>>>> ref. Dan Brickley:
> >>>>> ------------------
> >>>>> hopefully you did dream ;-)
> >>>>>
> >>>>> ref. Dmitriy Lyubimov:
> >>>>> ----------------------
> >>>>> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
> >>>>> IllegalArgumentException. You can find the traceback at
> >>>>> http://paste.pocoo.org/show/481168/ .
> >>>>>
> >>>>> ref. Ted Dunning:
> >>>>> -----------------
> >>>>> I am running the M/R version of SVD in local mode. I didn't
> >>>>> install Hadoop except what is coming via `mvn install`. If I
> >>>>> understand the code correctly, the `--inMemory` argument is only
> >>>>> relevant for the "EigenVerificationJob" -- I didn't run that.
> >>>>>
> >>>>> Here are the latest results for the calculations as described
> >>>>> in my previous mail:
> >>>>>
> >>>>> For 1: Key class: class org.apache.hadoop.io.IntWritable
> >>>>> Value Class: class org.apache.mahout.math.VectorWritable Key: