Re: Solr v3.5.0 - numFound changes when paging through results on 8-shard cluster
Chris Hostetter 2012-06-19, 21:40
: Confirming that there are no active records being written, the "numFound"
: value is decreasing as we page through the results.
1) check that the "clones" of each shard are in fact identical (just look
at the index files on each machine and make sure they are the same).
2) distributed searching relies heavily on using a uniqueKey, and can
behave oddly if documents with identical keys exist in multiple shards.
If I remember correctly, what you are describing sounds like one of the
things that can happen if you violate the uniqueKey rule across different
shards when indexing.
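A quick way to test point 2 is to collect the uniqueKey values from each shard separately and look for keys that show up in more than one of them. The sketch below assumes you have already exported those values somehow (for example by querying each shard directly, without the shards parameter). The function name and the sample IDs are made up for illustration.

```python
from collections import defaultdict


def find_cross_shard_duplicates(ids_by_shard):
    """Return {key: sorted shard names} for every key found in 2+ shards.

    ids_by_shard maps a shard name to an iterable of uniqueKey values
    exported from that shard alone (hypothetical input format).
    """
    shards_by_key = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for key in ids:
            shards_by_key[key].add(shard)
    # Keep only the keys that live in more than one shard.
    return {
        key: sorted(shards)
        for key, shards in shards_by_key.items()
        if len(shards) > 1
    }


# Made-up sample data: one key was indexed into three shards.
sample_ids = {
    "shard1": ["a", "b", "9876543"],
    "shard2": ["c", "9876543"],
    "shard3": ["d", "9876543"],
}
duplicates = find_cross_shard_duplicates(sample_ids)
print(duplicates)
```

If this returns anything other than an empty dict for your real shards, the uniqueKey rule is being violated somewhere in the indexing process.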
I *think* what you are seeing is that in the distributed request for
page #1 the coordinator sums up the numFound from all shards, and merges
results 1-$rows according to the sort; likewise for pages 2 & 3. When you
get to page #4, it suddenly sees that doc#9876543 is included in the
responses from 3 different shards, so it subtracts 2 from the numFound, and
so on as you page farther through the results. The more documents with
duplicate uniqueKeys it finds in the results as it pages through, the lower
the cumulative numFound gets.
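To make that concrete, here is a toy model of the merge step as described above. It is not Solr's actual code, just a sketch under these assumptions: each shard returns its top start+rows documents, the coordinator starts from the summed per-shard counts, and it subtracts 1 for every repeated uniqueKey it sees in the shard responses. The function name, document IDs, and scores are all invented.

```python
def distributed_page(shards, start, rows):
    """Simulate one distributed request; return (num_found, page_ids).

    shards is a list of per-shard result lists, each a list of
    (doc_id, score) tuples already sorted by score, best first.
    """
    # The coordinator begins with the sum of every shard's own count.
    num_found = sum(len(docs) for docs in shards)
    seen = set()
    merged = []
    for docs in shards:
        # Each shard only reports its top start+rows documents.
        for doc_id, score in docs[:start + rows]:
            if doc_id in seen:
                # Duplicate uniqueKey from another shard: drop it and
                # correct the running total.
                num_found -= 1
                continue
            seen.add(doc_id)
            merged.append((score, doc_id))
    merged.sort(key=lambda pair: -pair[0])
    return num_found, [doc_id for _, doc_id in merged[start:start + rows]]


# Three made-up shards; "dup1" and "dup2" were indexed into all of them.
shards = [
    [("a1", 100), ("a2", 90), ("a3", 80), ("dup1", 50),
     ("a4", 40), ("a5", 30), ("dup2", 10)],
    [("b1", 99), ("b2", 89), ("b3", 79), ("dup1", 50),
     ("b4", 39), ("b5", 29), ("dup2", 10)],
    [("c1", 98), ("c2", 88), ("c3", 78), ("dup1", 50),
     ("c4", 38), ("c5", 28), ("dup2", 10)],
]

rows = 3
counts = [distributed_page(shards, page * rows, rows)[0] for page in range(4)]
print(counts)
```

With this invented data the reported count goes 21, 19, 17, 17 over the first four pages: it only drops once the requested window is deep enough for the duplicated documents to appear in the shard responses, and it levels off once every duplicate has been seen.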
: For example,
: Page1 - numFound = 3683
: Page2 - numFound = 3683
: Page3 - numFound = 3683
: Page4 - numFound = 2866
: Page5 - numFound = 2419
: Page5 - numFound = 1898
: Page6 - numFound = 1898
: PageN - numFound = 1898