-
BlockGroupingCollector, not always getting first document
Grzegorz Tańczyk 2012-03-08, 09:30
Hello,
I am using BlockGroupingCollector for the first time and I have a small problem with it. The indexing code is pretty much a copy of the one from the docs. Searching looks like this:

    Filter groupEndFilter = new CachingWrapperFilter(
        new QueryWrapperFilter(new TermQuery(new Term("last", "true"))));
    ...
    BlockGroupingCollector c =
        new BlockGroupingCollector(SORT_SCORE, offset + n, false, groupEndFilter);
    searcher.search(query, filter, c);
    TopGroups groups = c.getTopGroups(SORT_ID, offset, 0, 1, true);
    if (groups != null) {
        results.total_hits = groups.totalGroupCount.intValue();
        for (int i = 0; i < groups.groups.length; i++)
            if (groups.groups[i].totalHits > 0)
                results.add(getResult(searcher, groups.groups[i].scoreDocs[0]));
    }

So I want to get the top groups for a given query, with the documents in each group sorted by their IDs. For some reason I don't always get the first document of a group: roughly every 10th group in the search results does not have the document with the lowest ID in the first position of scoreDocs.

The ID is a numeric field. Sorting groups by field values works fine. The documents are also sorted by their IDs during indexing, and I'm adding them as a block.

What am I doing wrong?

--
Regards,
Grzegorz
-
Re: BlockGroupingCollector, not always getting first document
Michael McCandless 2012-03-08, 11:12
Hmm... that doesn't sound good.
Is the issue repeatable once it happens? And, when it happens, can you verify that the index is correct (e.g., the missing doc is retrievable by non-grouped searches)? This way we can isolate the issue to the search side.

Can you boil it down to a small test case?

Mike McCandless
http://blog.mikemccandless.com
-
Re: Re: BlockGroupingCollector, not always getting first document
Grzegorz Tańczyk 2012-03-08, 12:22
Hello,
Thanks for the reply. I can find the first document of a group using a non-grouping search.

To be sure about this, I deleted the index and indexed only the first 100 groups, which gives around 2300 documents, and I see the problem in at least half of the groups. Finding the first documents with a normal search is no problem.

I first noticed this problem when I had indexed a few thousand groups. When I index everything (15k groups, which means around 200k documents, committing every 500 groups), the problem goes away, or at least I can't find any group with a non-first document in scoreDocs[0]. I have been reindexing since the morning, and I will reindex once again to be sure about this.

I'm not a Lucene internals expert, but maybe this problem is somehow connected to segment merging?

Some additional info:

I'm using Lucene 3.5.0.

Sort:

    public final static Sort SORT_ID = new Sort(new SortField("id_n", SortField.INT));

Adding the field to the document:

    doc.add(new NumericField("id_n", Store.NO, true).setIntValue(rs.getInt(1)));

(I checked how it works with Store.YES; it didn't change anything.)

I also call searcher.setDefaultFieldSortScoring(true, true) before the grouping search.

Calling optimize() also didn't help (but I wouldn't use this method anyway, even if it were the solution to this problem ;-) ).

The index writer config has default settings.

For now I'm using a workaround, but I'm looking forward to finding a solution to this problem.
-
Re: Re: BlockGroupingCollector, not always getting first document
Michael McCandless 2012-03-09, 11:06
On Thu, Mar 8, 2012 at 7:22 AM, Grzegorz Tańczyk <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Thanks for reply, I can find first document from group using non grouping
> search.

OK, so the index seems ok.

> To be sure about this I deleted index and indexed only first 100 groups
> which gives around 2300 documents and I see the problem on at least half of
> groups. No problem in finding first documents normally.
> I noticed this problem first when I had indexed few thousands groups.

Hmm.

> When I index everything(15k groups, which means around 200k documents,
> commit every 500 groups) the problem is no more or at least I can't find any
> group with non first document in scoreDocs[0]. I'm reindexing it since
> morning, I will reindex it once again to be sure about this one.

Weird that the full index doesn't show the issue but the partial index does.

> I'm not Lucene internals expert, but maybe this problem is somehow connected
> to segment merging?

Well, a simple way to test this is to set NoMergePolicy on the IndexWriterConfig.

> Some additional info:
>
> I'm using Lucene 3.5.0.
>
> Sort:
> public final static Sort SORT_ID = new Sort(new SortField("id_n",
> SortField.INT));
>
> Adding field to document:
> doc.add(new NumericField("id_n", Store.NO, true).setIntValue(rs.getInt(1)));
>
> (I checked how it works with Store.YES, it didn't change anything.)
>
> I also call searcher.setDefaultFieldSortScoring(true, true) before grouping
> search.

If you don't call this, is the issue still there?

> Calling optimize() also didn't help(but anyway I wouldn't use this method
> even if it was the solution for this problem ;-) )

OK. Did calling optimize() change which docs were missing...?

> Index writer config has default settings.

Are you doing any deleteDocuments or updateDocument calls?

> For now I'm using workaround, but I'm looking forward to finding solution of
> this problem.

Wait, what's the workaround?

I noticed you pass maxDocsPerGroup=1; if you increase that (e.g. to 10) does it change the bug...?

Is it possible to boil this down to a small test case?

Mike McCandless
http://blog.mikemccandless.com
-
Re: Re: Re: BlockGroupingCollector, not always getting first document
Grzegorz Tańczyk 2012-03-09, 13:52
Hello,
I found the problem, and it was my misunderstanding. I didn't get the first document in every group because some of the head documents didn't match the given query. I had wrongly assumed that I could sort across all documents within a group.

--
Regards,
Grzegorz
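The resolution above can be illustrated with a small plain-Java simulation. This is only a sketch and not Lucene code: `GroupHeadSketch`, `Doc`, `lowestMatchingIdPerGroup`, and the sample data are hypothetical names invented for illustration. It assumes, as the message describes, that only documents matching the query take part in the within-group sort, so the head of a group is the lowest-ID matching document rather than the lowest-ID document in the block.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the behavior described in the thread.
// Nothing here is Lucene API; every name is hypothetical.
class GroupHeadSketch {

    // One indexed document: its group, its numeric ID, and whether it matches the query.
    static final class Doc {
        final int group;
        final int id;
        final boolean matchesQuery;

        Doc(int group, int id, boolean matchesQuery) {
            this.group = group;
            this.id = id;
            this.matchesQuery = matchesQuery;
        }
    }

    // For each group, return the lowest ID among the documents that match the query.
    // Groups without any matching document do not appear in the result.
    static Map<Integer, Integer> lowestMatchingIdPerGroup(List<Doc> docs) {
        Map<Integer, Integer> heads = new LinkedHashMap<>();
        for (Doc d : docs) {
            if (!d.matchesQuery) {
                continue; // assumption: non-matching documents are never collected
            }
            heads.merge(d.group, d.id, Math::min);
        }
        return heads;
    }

    // Two blocks of documents, each added in ascending ID order.
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        // Group 1: the document with the lowest ID (10) matches the query.
        docs.add(new Doc(1, 10, true));
        docs.add(new Doc(1, 11, true));
        // Group 2: the document with the lowest ID (20) does not match the query.
        docs.add(new Doc(2, 20, false));
        docs.add(new Doc(2, 21, true));
        docs.add(new Doc(2, 22, true));
        return docs;
    }

    public static void main(String[] args) {
        // prints {1=10, 2=21}
        System.out.println(lowestMatchingIdPerGroup(sampleDocs()));
    }
}
```

Group 2 reports 21 rather than 20 because document 20 does not match the query, which is the same symptom as a group whose scoreDocs[0] is not the document with the lowest ID.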
-
Re: Re: Re: BlockGroupingCollector, not always getting first document
Michael McCandless 2012-03-09, 17:19
Phew, thanks for bringing closure!
Mike McCandless
http://blog.mikemccandless.com