Mahout, mail # dev - Dirichlet - NormalModel.pdf() calculation problem


Dirichlet - NormalModel.pdf() calculation problem
Derek O'Callaghan 2010-09-21, 17:57
Hi Jeff,

I mentioned this issue in my last mail to the CDbw thread, but I thought
I'd create a separate thread for it as it's a different problem
(although similar).

When s0 is 1, NormalModel.computeParameters() will set stdDev to
Double.MIN_VALUE. However, this causes a problem in subsequent calls to
pdf() from DirichletState.adjustedProbability(). In such a case, the
statement "double sd2 = stdDev * stdDev;" underflows and sets sd2 to 0,
which causes pdf() to return NaN. This means that the call to
UncommonDistribution.rMultinom() will return 0, and so (I think) all
subsequent points will be assigned to cluster 0.
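
To make the failure mode concrete, here is a minimal standalone sketch. It
is not the Mahout code: UnderflowDemo and its one-dimensional pdf() are
hypothetical stand-ins that only reuse the names stdDev, sd2, ex and
SQRT2PI from the snippets in this mail, and the exponent formula is an
assumption about how a normal density is typically computed. Under that
assumption, NaN appears when the point coincides with the mean (0/0),
while any other point yields 0.

```java
public class UnderflowDemo {
    static final double SQRT2PI = Math.sqrt(2.0 * Math.PI);

    // Hypothetical 1-D stand-in for a normal pdf; not the actual NormalModel.pdf().
    static double pdf(double x, double mean, double stdDev) {
        double sd2 = stdDev * stdDev;               // underflows to 0.0 for Double.MIN_VALUE
        double d = x - mean;
        double ex = Math.exp(-(d * d) / (2 * sd2)); // 0/0 gives NaN when x == mean
        return ex / (stdDev * SQRT2PI);
    }

    public static void main(String[] args) {
        double stdDev = Double.MIN_VALUE;
        System.out.println(stdDev * stdDev);        // 0.0
        System.out.println(pdf(1.0, 1.0, stdDev));  // NaN (point equals mean)
        System.out.println(pdf(2.0, 1.0, stdDev));  // 0.0 (exp(-Infinity) is 0)
    }
}
```

Double.MIN_VALUE is the smallest positive subnormal double, so its square
is not representable and rounds to zero; that is the only reason sd2 ends
up as 0 here.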

FYI, I was able to work around this by changing the following in
NormalModel.pdf():

return ex / (stdDev * SQRT2PI);

to:

double pdf = ex / (stdDev * SQRT2PI);
if (Double.isNaN(pdf)) {
  pdf = 0.0;
}
return pdf;

As you mentioned in the other thread,
AbstractCluster.computeParameters() will also set the radius to
Double.MIN_VALUE when s0 is 1, although I'm not sure if that's used
anywhere that'll cause a similar problem as in pdf() above.

Derek