: In the BM25 case, scores would decrease in some situations with very
: high TF values because of floating point issues, e.g. so
: score(freq=100,000) would be unexpectedly less than
: score(freq=99,999), all other things being equal. There may be other
: ways to re-arrange the code to avoid this problem, feel free to open
: an issue if you can optimize the code better while still behaving
: properly!
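To make the quoted floating-point issue concrete, here is a minimal, hypothetical Java sketch. It is not Lucene's BM25Similarity code, and the class and method names are made up for illustration. It evaluates the textbook BM25 term-frequency component tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)) once in float and once in double. It assumes k1 = 1.2 and b = 0.75 (Lucene's defaults) and dl == avgdl, then counts how often the score drops when freq goes up by one.

```java
// Hypothetical illustration, not Lucene's implementation: the textbook BM25
// term-frequency component evaluated in float vs. double.
public class Bm25Monotonicity {
    // Assumed parameters: Lucene's default k1/b, and a document of average length.
    static final double K1 = 1.2;
    static final double B = 0.75;
    static final double DL_OVER_AVGDL = 1.0;

    // tf * (k1 + 1) / (tf + norm): constants are rounded to float once,
    // then the per-freq arithmetic runs entirely in float.
    static float tfFloat(int freq) {
        float norm = (float) (K1 * (1 - B + B * DL_OVER_AVGDL));
        float f = freq; // exact for freq < 2^24
        return f * (float) (K1 + 1) / (f + norm);
    }

    // Same formula with every intermediate kept in double.
    static double tfDouble(int freq) {
        double norm = K1 * (1 - B + B * DL_OVER_AVGDL);
        return freq * (K1 + 1) / (freq + norm);
    }

    // Counts freq values in (from, to] whose score is strictly lower
    // than the score for freq - 1.
    static int countDecreases(int from, int to, boolean useFloat) {
        int decreases = 0;
        double prev = useFloat ? tfFloat(from) : tfDouble(from);
        for (int freq = from + 1; freq <= to; freq++) {
            double cur = useFloat ? tfFloat(freq) : tfDouble(freq);
            if (cur < prev) {
                decreases++;
            }
            prev = cur;
        }
        return decreases;
    }

    public static void main(String[] args) {
        System.out.println("float decreases:  " + countDecreases(1, 200_000, true));
        System.out.println("double decreases: " + countDecreases(1, 200_000, false));
    }
}
```

Near freq=100,000 the true gap between neighbouring scores (roughly 1e-10) is far smaller than one float ulp at around 2.2 (roughly 2.4e-7), so rounding in the intermediate float products can flip the order. In double, that gap is still many orders of magnitude larger than the accumulated rounding error, so the double count should stay at zero over this range.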

I don't have any idea how to optimize the current code, and I am
completely willing to believe the changes in LUCENE-7997 are an
improvement in terms of correctness -- which is certainly more important
than performance. I just wanted to point out, before anyone starts
ripping their hair out trying to explain it, that Alan's observation
that LUCENE-8018 was the only commit around the time the performance
graphs dip wasn't accurate.

If you think the float/double math in LUCENE-7997 might explain the change
in Mike's graphs, then maybe Mike can annotate them to record that?

(Wild spitballing idea: would it be worthwhile to offer an
"ImpreciseBM25Similarity" that used floats instead of doubles for people
who want to eke out every last bit of performance -- provided it was
heavily documented with caveats regarding inaccurate scores due to
rounding errors?)
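If such a class were offered, the caveats could be quantified before documenting them. Below is a rough, hypothetical Java sketch. Nothing here is an existing Lucene class and the names are invented. The formula is the textbook BM25 tf component with assumed k1 = 1.2 and b = 0.75, and the sketch measures the worst relative drift of a float-only computation against a double reference.

```java
// Hypothetical sketch (not a Lucene class): how far a float-only BM25
// term-frequency component drifts from a double reference.
public class ImpreciseBm25Error {
    // Assumed parameters (Lucene's defaults), in both precisions.
    static final float K1F = 1.2f, BF = 0.75f;
    static final double K1 = 1.2, B = 0.75;

    // Float-only variant a hypothetical "ImpreciseBM25Similarity" might use.
    static float tfFloat(int freq, float dlRatio) {
        float f = freq; // exact for freq < 2^24
        float norm = K1F * (1 - BF + BF * dlRatio);
        return f * (K1F + 1) / (f + norm);
    }

    // Double reference using the same formula.
    static double tfDouble(int freq, double dlRatio) {
        double norm = K1 * (1 - B + B * dlRatio);
        return freq * (K1 + 1) / (freq + norm);
    }

    // Largest relative difference between the two for freq in [1, maxFreq].
    static double maxRelativeError(int maxFreq, float dlRatio) {
        double worst = 0;
        for (int freq = 1; freq <= maxFreq; freq++) {
            double ref = tfDouble(freq, dlRatio);
            double err = Math.abs(tfFloat(freq, dlRatio) - ref) / ref;
            worst = Math.max(worst, err);
        }
        return worst;
    }

    public static void main(String[] args) {
        // dl/avgdl ratios are arbitrary sample values.
        for (float dlRatio : new float[] {0.5f, 1f, 2f}) {
            System.out.printf("dl/avgdl=%.1f  max relative error=%.3e%n",
                    dlRatio, maxRelativeError(200_000, dlRatio));
        }
    }
}
```

A small relative error would not by itself rule out the ordering problem quoted above: two scores whose true difference is smaller than the drift can still swap places. That is the caveat such documentation would need to spell out.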
-Hoss
http://www.lucidworks.com/
