Mahout, mail # user - RecommenderJob and NaN


Re: RecommenderJob and NaN
Ted Dunning 2011-10-13, 20:14
Usage within AWS is a neighborly thing to do.

But yes, Amazon donates this bandwidth.

On Thu, Oct 13, 2011 at 8:11 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:

> Is the Apache public download bandwidth donated by Amazon? Or should we try
> to keep usage within AWS?
>
> On Thu, Oct 13, 2011 at 3:47 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> >
> > On Oct 13, 2011, at 4:01 AM, Sebastian Schelter wrote:
> >
> > > Grant,
> > >
> > > Can you share a few more details about the results? Do you get any
> > > exceptions, or do you just get no results?
> >
> > No results.
> >
> > >
> > > Using the NaNs inside the similarity matrix vectors has been included in
> > > the job for a very long time and should not cause any problems. As Sean
> > > already mentioned we have unit tests with toy data that should catch the
> > > very obvious errors in this code.
> >
> > Yeah, I don't know what happened.  I know I was getting results as recently
> > as two weeks ago.  I will try rolling back to an earlier commit.
> >
> > >
> > > Can you share the dataset? I can do a testrun on my research cluster.
> >
> > I already have earlier in this thread.  There is a small set via the link
> > below or you can use the ASF email public dataset on Amazon or any subset of
> > it.
> >
> >
> > >
> > > --sebastian
> > >
> > > On 13.10.2011 08:37, Sean Owen wrote:
> > >> RecommenderJob? The unit tests run it all the time.
> > >> There should not be any glitches with static variables -- don't think
> > >> there are any.
> > >>
> > >> On Thu, Oct 13, 2011 at 7:33 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> > >>> Is this job working well for anyone now?
> > >>> When was the last time this job worked for someone?
> > >>>
> > >>> On Wed, Oct 12, 2011 at 11:30 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Both local and on EC2
> > >>>>
> > >>>> On Oct 12, 2011, at 2:10 PM, Ken Krugler wrote:
> > >>>>
> > >>>>> Hi Grant,
> > >>>>>
> > >>>>> Just curious, are you running this locally or distributed?
> > >>>>>
> > >>>>> I'd run into a similar issue, though in a completely different algorithm
> > >>>>> (Jimmy Lin's PageRank implementation) due to the use of a static variable.
> > >>>>>
> > >>>>> When running locally, this wasn't getting cleared between loops, and thus
> > >>>>> I got wonky results.
> > >>>>>
> > >>>>> The same thing would have happened with JVM reuse enabled.
> > >>>>>
> > >>>>> -- Ken
> > >>>>>
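Ken's static-variable point can be illustrated with a minimal Java sketch. The names below (`StaticStateSketch`, `leakyRun`, `cleanRun`) are hypothetical and do not come from Mahout or from the PageRank code he mentions. The sketch only assumes that a step accumulates into a static field and is invoked more than once in the same JVM, as can happen in local mode or with JVM reuse enabled.

```java
class StaticStateSketch {

    // Static field: shared by every invocation in the same JVM and never reset.
    private static double total = 0.0;

    // Hypothetical step that accumulates into the static field.
    // A second call starts from whatever the previous call left behind.
    static double leakyRun(double[] values) {
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Same computation with a local accumulator: every call starts from zero.
    static double cleanRun(double[] values) {
        double localTotal = 0.0;
        for (double v : values) {
            localTotal += v;
        }
        return localTotal;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 2.0};
        // Two "iterations" in one JVM: the leaky version carries state over.
        System.out.println(leakyRun(input));
        System.out.println(leakyRun(input));
        // The clean version returns the same value on every call.
        System.out.println(cleanRun(input));
        System.out.println(cleanRun(input));
    }
}
```

In a fresh JVM per task the static field starts at zero each time, which is why this kind of bug can stay hidden on a cluster and only show up locally or once JVMs are reused.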
> > >>>>> On Oct 12, 2011, at 3:28pm, Grant Ingersoll wrote:
> > >>>>>
> > >>>>>> Digging some more:
> > >>>>>>
> > >>>>>> In AggregateAndRecommend, around line 143, I have, for userId 0, a
> > >>>>>> simColumn of:
> > >>>>>>
> > >>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:0.9566912651062012,263374:0.9566912651062012,263376:NaN}
> > >>>>>>
> > >>>>>> Which then becomes the numerator and the denom.
> > >>>>>>
> > >>>>>> Looping, my next simCol is:
> > >>>>>>
> > >>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:NaN,263374:0.9566912651062012,263376:0.9566912651062012}
> > >>>>>>
> > >>>>>> and then
> > >>>>>>
> > >>>>>> {22966:0.9566912651062012,81901:0.9566912651062012,263375:0.9566912651062012,263374:NaN,263376:0.9566912651062012}
> > >>>>>>
> > >>>>>> ...
> > >>>>>>
> > >>>>>> Each time, those are getting added into the numerators/denoms value,
> > >>>>>> such that by the time we are done looping (line 161), we have:
> > >>>>>> numerators: {22966:NaN,81901:NaN,263376:NaN,263375:NaN,263374:NaN}
> > >>>>>> denoms: {22966:NaN,81901:NaN,263376:NaN,263375:NaN,263374:NaN}
> > >>>>>>
> > >>>>>> numberOfSimilarItemsUsed: {81901:5.0,22966:5.0,263376:5.0,263375:5.0,263374:5.0}
> > >>>>>>
> > >>>>>> Not sure how to interpret this, as I haven't dug into the math here
> > >>>>>> yet or figured out where those NaNs are coming from originally.
> > >>>>>>
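The effect Grant describes is consistent with IEEE 754 arithmetic: any sum that includes a NaN is itself NaN, so a single NaN per column is enough to turn every slot of a running total into NaN once each slot has been hit. The following is a minimal sketch with toy values, not the data from the thread. `NanAccumulationSketch` and its methods are hypothetical names, and the NaN guard shown is only one possible way to treat such entries, not necessarily what Mahout does.

```java
import java.util.Arrays;

class NanAccumulationSketch {

    // Element-wise running sum over columns, with no special handling of NaN.
    static double[] sumColumns(double[][] columns) {
        double[] totals = new double[columns[0].length];
        for (double[] column : columns) {
            for (int i = 0; i < column.length; i++) {
                totals[i] += column[i]; // once a slot is NaN it stays NaN
            }
        }
        return totals;
    }

    // Same sum, but NaN entries are skipped (an assumed guard, for illustration).
    static double[] sumColumnsSkippingNaN(double[][] columns) {
        double[] totals = new double[columns[0].length];
        for (double[] column : columns) {
            for (int i = 0; i < column.length; i++) {
                if (!Double.isNaN(column[i])) {
                    totals[i] += column[i];
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        // Toy columns: each has exactly one NaN, in a different position.
        double[][] columns = {
            {nan, 0.5, 0.5},
            {0.5, nan, 0.5},
            {0.5, 0.5, nan},
        };
        System.out.println(Arrays.toString(sumColumns(columns)));            // [NaN, NaN, NaN]
        System.out.println(Arrays.toString(sumColumnsSkippingNaN(columns))); // [1.0, 1.0, 1.0]
    }
}
```

If every slot of the numerators and denominators ends up NaN in this way, any score computed from them is NaN as well, which would fit with the job producing no usable results.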
> > >>>>>> On Oct 11, 2011, at 2:55 PM, Grant Ingersoll wrote: