Chiming in on this old thread with a summary of our findings:

- when we added one multi-JVM server to the cluster, Elasticsearch treated each JVM as a "regular" node, and we ended up with 4x the data on this new server.
- to combat this, we used the watermark setting to prevent too many shards from being allocated to the new node.
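
For anyone landing here later, this is a rough sketch of what such a settings change could look like. It assumes the watermark in question is Elasticsearch's disk-based allocation watermark (`cluster.routing.allocation.disk.watermark.low` / `.high`). The helper name `build_watermark_settings` and the 70%/80% thresholds are made up for illustration and are not the values we used. The script only builds the JSON body you would send to the cluster settings endpoint (`PUT _cluster/settings`); it does not talk to a cluster.

```python
import json


def build_watermark_settings(low="70%", high="80%", persistent=True):
    """Build a request body for the cluster update settings API.

    The thresholds are illustrative placeholders, not recommendations.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            # Below this disk usage, new shards may still be allocated to a node.
            "cluster.routing.allocation.disk.watermark.low": low,
            # Above this disk usage, Elasticsearch tries to move shards away.
            "cluster.routing.allocation.disk.watermark.high": high,
        }
    }


body = build_watermark_settings()
print(json.dumps(body, indent=2))
```

Note that the watermark is evaluated per node, so with several JVMs sharing one disk each node sees the same disk usage. Check the behaviour on your own version before relying on it.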
