: We are using Solr 7.1.0 to index a database of addresses.  We have found
: that our index size increases massively when we add one extra field to
: the index, even though that field is stored and not indexed, and doesn’t

what about docValues?

: When we run an index load without the problematic field present, the
: Solr index size is 5.5GB.  When we add the field into the index, the
: size grows to 13.3GB.  The field itself is a maximum of 46 characters in
: length and on average is 19 characters. We have ~14,000,000 rows in
: total to index of which only ~200,000 have this field present at all
: (i.e. not null in database).  Given that we don’t want to index the
: field, only store it I would have thought (perhaps naively) that the
: storage increase would be approximately 200,000 * 19 = 3.8M bytes =
: 3.6MB rather than the 7.5GB we are seeing.

If the field has docValues enabled, then there will be some overhead for
every doc in the index -- even the ones that don't have a value in this
field.  (Although I'd still be very surprised if it accounted for 7GB.)
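
For a rough sense of scale, here is the arithmetic on the figures quoted
above as a small shell sketch.  It assumes decimal units (1GB = 1000MB)
and one byte per character, neither of which is stated in the original
message, and the variable names are only for illustration.  Note that the
two quoted index sizes differ by 7.8GB.

```shell
#!/bin/sh
# Figures quoted in the original message.
total_docs=14000000        # rows indexed
docs_with_value=200000     # rows where buildingName is not null
avg_len=19                 # average value length in characters

# Naive stored-only estimate (assumes 1 byte per character, no overhead).
expected_bytes=$((docs_with_value * avg_len))

# Observed growth in MB: 13.3GB - 5.5GB (assumes 1GB = 1000MB).
growth_mb=$((13300 - 5500))

# Observed growth averaged over every doc in the index, in bytes.
bytes_per_doc=$((growth_mb * 1000000 / total_docs))

echo "expected: ${expected_bytes} bytes"
echo "observed: ${growth_mb} MB, about ${bytes_per_doc} bytes per doc"
```

Averaged over all docs, the growth comes to several hundred bytes per
doc, which is a lot to attribute to one short, mostly empty field.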

: - The problematic field is created through the API as follows:
:
:   curl -X POST -H 'Content-type:application/json' --data-binary '{
:     "add-field":{
:       "name":"buildingName",
:       "type":"string",
:       "stored":true,
:       "indexed":false
:     }
:   }' http://localhost:8983/solr/address/schema

...that request causes the field to inherit any (non-overridden)
settings from the fieldType "string" -- and in the 7.1 _default
configset, "string" is defined with docValues="true", so the field gets
docValues even though the request only asks for stored.
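
One way to take docValues out of the picture is to override it explicitly
on the field.  The sketch below uses the Schema API "replace-field"
command against the same collection URL as the original request.  The
explicit "docValues":false is the only change from the original
definition and is an assumption to test, not a confirmed fix; docs that
are already indexed would need to be reindexed before the index size
reflects the change.

```shell
#!/bin/sh
# Same definition as the original add-field request, plus an explicit
# docValues override (the override is the assumption being tested here).
payload=$(cat <<'EOF'
{
  "replace-field":{
    "name":"buildingName",
    "type":"string",
    "stored":true,
    "indexed":false,
    "docValues":false
  }
}
EOF
)

# Post to the collection from the original message; report instead of
# failing silently if Solr isn't reachable.
curl -sS -X POST -H 'Content-type:application/json' \
  --data-binary "$payload" \
  http://localhost:8983/solr/address/schema \
  || echo "request failed: is Solr running on localhost:8983?"
```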

You can see *all* properties set on a field -- regardless of whether they
are set on the fieldType or are implicit hardcoded defaults in the
implementation of the fieldType -- via the 'showDefaults=true' Schema API
option.

Consider these API examples from the techproducts demo...

$ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"cat",
    "type":"string",
    "multiValued":true,
    "indexed":true,
    "stored":true}}

$ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat?showDefaults=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"cat",
    "type":"string",
    "indexed":true,
    "stored":true,
    "docValues":false,
    "termVectors":false,
    "termPositions":false,
    "termOffsets":false,
    "termPayloads":false,
    "omitNorms":true,
    "omitTermFreqAndPositions":true,
    "omitPositions":false,
    "storeOffsetsWithPositions":false,
    "multiValued":true,
    "large":false,
    "sortMissingLast":true,
    "required":false,
    "tokenized":false,
    "useDocValuesAsStored":true}}
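
The same check against the field from the original message would look
something like the following.  The collection name "address" and the
field name "buildingName" are taken from the add-field request quoted
above; the grep is only a convenience that picks the docValues line out
of the indented JSON response.

```shell
#!/bin/sh
url='http://localhost:8983/solr/address/schema/fields/buildingName?showDefaults=true'

# Show only the effective docValues setting for the field.
curl -sS "$url" | grep '"docValues"' \
  || echo "no docValues line returned: is Solr running on localhost:8983?"
```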

-Hoss
http://www.lucidworks.com/