Re: Question about LUCENE-3097 - Post Group Faceting
Martijn v Groningen 2011-08-06, 13:27
The facet result for field productType will show the following count:
BOOK: 1
DVD: 0

So yes, because of post group faceting you'll miss the second facet value (DVD). This is basically the same example I described in LUCENE-3097. I've also described three ways of calculating facet counts in combination with grouping. The third way, which I've named matrix counts (field value & group value combination), would give the result that you expect. However, this isn't implemented yet. In Solr this would require changes in the FacetComponent.

I hope this explains it a bit!

Martijn

On 5 August 2011 16:28, Joshua Harness <[EMAIL PROTECTED]> wrote:

> Martin -
>
> Thanks for the reply. I understand your answer about the segments.
> However, I'm still cloudy about faceting with respect to the group head.
> Perhaps an example will clarify my confusion. Suppose I have 3 order
> documents with the following data:
>
> orderNumber: 1
> customerNumber: 1
> totalInCents: 1500
> productType: 'BOOK'
>
> orderNumber: 2
> customerNumber: 1
> totalInCents: 500
> productType: 'BOOK'
>
> orderNumber: 3
> customerNumber: 1
> totalInCents: 1000
> productType: 'DVD'
>
> Imagine I perform a search for items greater than or equal to 1000
> cents grouped by customer number. I would expect to get order numbers 1 and
> 3 back grouped underneath customer id. Let's assume that order number 1 is
> considered the most relevant document (in your scenario). Will the post
> group faceting miss that I actually have two facet values for productType:
> BOOK and DVD?
>
> Thanks!
>
> Josh
>
> On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen <[EMAIL PROTECTED]> wrote:
>
>> Hi Josh,
>>
>> For post grouping the documents don't need to reside in the same segment.
>> Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
>> can collect the most relevant document for each group (GroupHead). This
>> collector can produce an int[] or a FixedBitSet that can be used during
>> faceting to produce post group facets (patch in SOLR-2665 uses this).
>> During faceting only the group heads are known. Because of this, field
>> values that are different in documents less relevant than the most
>> relevant document of a group aren't taken into account. This is the same
>> as in the example described in the description of LUCENE-3097.
>> Hope this helps!
>>
>> Martijn
>>
>> On 4 August 2011 22:59, Joshua Harness <[EMAIL PROTECTED]> wrote:
>>
>>> Hello -
>>>
>>> Please let me know if this question is more appropriate for the user
>>> list. I had assumed the developer list was more appropriate since the ticket
>>> is still open. I was analyzing the comments on LUCENE-3097
>>> <https://issues.apache.org/jira/browse/LUCENE-3097> and had a couple of questions.
>>>
>>> A comment
>>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953>
>>> started a small thread that mentioned that all documents in a given group
>>> would need to be contiguous and in the same segment. Also - a statement was
>>> made that 'The app would have to ensure this'. I was unclear on the result of
>>> this conversation. It sounded like maybe this could have turned out to not
>>> be the case. What is the status of this? Does my application have to ensure
>>> all the documents in the group are in the same segment? How would one
>>> accomplish this?
>>>
>>> Another comment
>>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297>
>>> mentioned that 'we pick only the head doc...as long as the head doc is
>>> guaranteed to have the same value for field X, it safe to use that doc to
>>> represent the entire group for facet counting'. Does this mean that there
>>> is a restriction placed on me that the head document must have field values
>>> that match the rest of the documents in the same group? Or is this simply an

Kind regards,
Martijn van Groningen
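
For readers following the three-order example in this thread, the difference between post group facet counts and the "matrix counts" idea can be sketched in a few lines of plain Python. This is only an illustration: it does not use Lucene's or Solr's API, the function names post_group_facets and matrix_facets are made up here, the matrix version is one reading of the "field value & group value combination" description (which the thread says is not implemented), and relevance order is assumed to be the list order so that order 1 is the group head.

```python
from collections import Counter

# The three order documents from the example in this thread.
orders = [
    {"orderNumber": 1, "customerNumber": 1, "totalInCents": 1500, "productType": "BOOK"},
    {"orderNumber": 2, "customerNumber": 1, "totalInCents": 500, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "totalInCents": 1000, "productType": "DVD"},
]

# Query: totalInCents >= 1000. Assumption: list order stands in for
# relevance, so order 1 ranks above order 3.
matches = [doc for doc in orders if doc["totalInCents"] >= 1000]


def post_group_facets(docs, group_field, facet_field):
    """Count facet values over group heads only (first match per group)."""
    heads = {}
    for doc in docs:
        # Keep only the first (most relevant) document seen for each group.
        heads.setdefault(doc[group_field], doc)
    return Counter(doc[facet_field] for doc in heads.values())


def matrix_facets(docs, group_field, facet_field):
    """Count each distinct (facet value, group value) combination once."""
    pairs = {(doc[facet_field], doc[group_field]) for doc in docs}
    return Counter(facet for facet, _group in pairs)


post_group = post_group_facets(matches, "customerNumber", "productType")
matrix = matrix_facets(matches, "customerNumber", "productType")

print(sorted(post_group.items()))  # [('BOOK', 1)]
print(sorted(matrix.items()))      # [('BOOK', 1), ('DVD', 1)]
```

With group heads only, DVD never reaches the facet counter because order 3 is not the head of customer 1's group. Counting distinct (productType, customerNumber) pairs sees both BOOK and DVD once each.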