Lucene, mail # dev - revisit naming for grouping/join?


Re: revisit naming for grouping/join?
Chris Hostetter 2011-07-04, 19:38

: Maybe modules/nesteddocuments (I think that's more descriptive than
: subdocuments)?

either way ... subdocuments has the advantage of being a shorter directory
name.  

i kinda wonder about first impressions and the etymology of "nested" ...
it makes me think of bird nests and Russian dolls, neither of which
really conveys the point: nesting in birds is about protecting/incubating
and is only a single layer; while Russian nesting dolls are singular
wrappers around wrappers around wrappers.

subdocuments seems like it might be better because it conveys more of a
hierarchical nature (to me anyway).

: How about NestedDocumentQuery?  And NestedDocumentCollector?
:
: See, you can use NestedDocumentQuery but collect it with any ordinary
: collector if you don't care about the "nesting" (ie, you are only
: interested in matches in the parent document space).  The
: NestedDocumentCollector also collects all the nested docs matching
: each parent hit.

Hmmm...

My suggestion of ParentDocumentQuery was based on the understanding that
the simplest use case was...

  Query inner = getSomethingThatMatchesSomeChildDocs();
  Filter parents = someFilterThatMatchesAllKnownParentDocs();
  Query outer = new ParentDocumentQuery(inner, parents);
  TopDocs results = searcher.search(outer);

...and in this case "results" will contain the parents of the child
documents that match inner.  is that correct?
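
To spell out what i mean by "the parents of the child documents that match inner", here is a rough plain-Java sketch. It is not Lucene code: the class and method names are made up for illustration, and it assumes (purely for the example) that docs are indexed in blocks with the children first and their parent as the last doc ID of the block, and that the matching child IDs arrive in increasing order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustration only; none of these names are Lucene APIs.
class ParentJoinSketch {

    // Assumed layout: each block is [child, child, ..., parent], so the
    // parent of a child is the first parent doc ID at or after it.
    // Returns -1 if no parent follows the child.
    static int parentOf(int childDoc, BitSet parents) {
        return parents.nextSetBit(childDoc);
    }

    // Maps matching child doc IDs (sorted ascending, assumed to contain no
    // parent IDs) to the distinct parent doc IDs, in doc ID order.
    static List<Integer> matchingParents(int[] matchingChildren, BitSet parents) {
        List<Integer> result = new ArrayList<>();
        for (int child : matchingChildren) {
            int parent = parentOf(child, parents);
            if (parent < 0) {
                continue; // child without a parent: skipped in this sketch
            }
            // Children of one block are adjacent, so comparing against the
            // last recorded parent is enough to avoid duplicates.
            if (result.isEmpty() || result.get(result.size() - 1) != parent) {
                result.add(parent);
            }
        }
        return result;
    }
}
```

With parents at doc IDs 3 and 6 and matching children 0, 2 and 4, the first two children map to parent 3 and the third to parent 6, so the "results" in this toy model are just the two parents.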

if so, then independent of the Collector, "ParentDocumentQuery" (or
ParentDocumentQueryWrapper) still seems like it makes the most sense.

For the Collector, i realize now that i totally misunderstood its API --
for some reason i thought it would wrap another Collector and proxy to the
inner collector only the parents, independently collecting/recording the
groups of parent->children info which could be asked for later.  

"ChildDocumentsCollector" definitely doesn't make sense -- it's not
just collecting children, it's collecting Groups made up of parents
and children ... GroupCollector is obviously too general though ... i
would toss out "ParentChildrenTopGroupCollector" to make it clear that:
  a) what you can get out of it are instances of TopGroups
  b) the Groups consist of Parents and Children

...but that may be trying to convey too much in a classname.  
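
As a rough picture of the "Groups made up of parents and children" idea, here is a plain-Java sketch. It is not the actual collector and does not use Lucene's TopGroups: the class and method names are hypothetical, and it assumes for illustration that each block of docs lists the children first with the parent as the last doc ID of the block.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only; none of these names are Lucene APIs.
class ParentChildGroupsSketch {

    // Groups matching child doc IDs under their parent doc ID. Under the
    // assumed block layout, a child's parent is the first parent doc ID at
    // or after the child.
    static Map<Integer, List<Integer>> groupByParent(int[] matchingChildren, BitSet parents) {
        // LinkedHashMap keeps groups in the order parents are first seen.
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int child : matchingChildren) {
            int parent = parents.nextSetBit(child);
            if (parent < 0) {
                continue; // no parent after this child: skipped in this sketch
            }
            groups.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return groups;
    }
}
```

Each map entry plays the role of one group: the key is the parent hit and the value is the list of its matching children, which is the information a name like "ParentChildrenTopGroupCollector" is trying to advertise.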

I certainly wouldn't complain about NestedDocumentCollector or
SubDocumentCollector if people like those better.
-Hoss
