What does "out of order" mean?
Alexander Veit 2009-11-27, 10:51
Hi,
The documentation of org.apache.lucene.search.Collector uses the obscure term "out of order". What does "order" mean? The natural order of document IDs, a scoring order, or some other order?
-- Cheers, Alex
---------------------------------------------------------------------
Re: What does "out of order" mean?
Michael McCandless 2009-11-27, 11:07
It refers to the order in which the docIDs are delivered to your Collector.
"Normally" they are always delivered in increasing order.
However, some queries (well, currently only certain BooleanQuery cases) can achieve substantial search speedup if they are allowed to deliver docIDs to your collector out of order. In this case, docs are processed in batches (chunks of 1024 docIDs at once), and within a batch you may receive docIDs out of order.
Many collectors don't mind getting docIDs out of order, and so it's important to return "true" from your acceptsDocsOutOfOrder method so Lucene can allow BooleanQuery to run faster.
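To make the batching concrete, here is a toy model in plain Java. It is only a sketch and does not use Lucene's classes: DocCollector, BitSetCollector and deliverInWindows are hypothetical stand-ins (the method name mirrors Collector.acceptsDocsOutOfOrder() from the 2.9/3.0 API), and the real BooleanQuery scorer is more involved. It assumes two clauses whose matches are emitted clause by clause inside each window of 1024 docIDs, which is enough to show docIDs arriving out of order within a window but never across windows.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a collector; NOT Lucene's Collector class.
interface DocCollector {
    void collect(int docId);
    boolean acceptsDocsOutOfOrder();
}

// Records matching docIDs in a bit set, so arrival order is irrelevant.
class BitSetCollector implements DocCollector {
    final BitSet hits = new BitSet();
    final List<Integer> arrivalOrder = new ArrayList<>();

    @Override
    public void collect(int docId) {
        hits.set(docId);
        arrivalOrder.add(docId); // kept only to make the order visible
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // a bit set does not care about order
    }
}

class OutOfOrderDemo {
    static final int WINDOW = 1024; // batch size mentioned in the thread

    // Toy model (an assumption, not Lucene's algorithm): for each window,
    // emit clause A's matches, then clause B's matches not already emitted.
    static void deliverInWindows(int[] clauseA, int[] clauseB, int maxDoc, DocCollector c) {
        for (int start = 0; start < maxDoc; start += WINDOW) {
            int end = Math.min(start + WINDOW, maxDoc);
            BitSet seen = new BitSet();
            for (int doc : clauseA) {
                if (doc >= start && doc < end) {
                    seen.set(doc);
                    c.collect(doc);
                }
            }
            for (int doc : clauseB) {
                if (doc >= start && doc < end && !seen.get(doc)) {
                    c.collect(doc);
                }
            }
        }
    }

    static BitSetCollector run() {
        BitSetCollector c = new BitSetCollector();
        deliverInWindows(new int[] {5, 900, 1500}, new int[] {3, 900, 1030}, 2048, c);
        return c;
    }

    public static void main(String[] args) {
        BitSetCollector c = run();
        System.out.println("arrival order: " + c.arrivalOrder);
        System.out.println("hits: " + c.hits);
    }
}
```

In this run the docIDs arrive as 5, 900, 3, 1500, 1030: doc 3 comes after 900 inside the first window, while everything in the second window comes after everything in the first. The bit set ends up with the same five hits either way, which is why such a collector can safely answer "true".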
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-11-27, 13:13
On Friday 27 November 2009 12:07:07 Michael McCandless wrote:
> Re: What does "out of order" mean?
>
> It refers to the order in which the docIDs are delivered to your
> Collector.
>
> "Normally" they are always delivered in increasing order.
>
> However, some queries (well, currently only certain BooleanQuery
> cases) can achieve substantial search speedup if they are allowed to
> deliver docIDs to your collector out of order. In this case, docs
> are processed in batches (chunks of 1024 docIDs at once), and within
> a batch you may receive docIDs out of order.
>
> Many collectors don't mind getting docIDs out of order, and so it's
> important to return "true" from your acceptsDocsOutOfOrder method so
> Lucene can allow BooleanQuery to run faster.
Can this paragraph go to the docs? Maybe I missed it, but I stumbled upon "out of order" and "in order" several times and wasn't sure what will be the consequence of the decision. Not even sure what will be the "don't care" case.
I like "don't care" options like "Version.LUCENE_CURRENT" very much. It allows the library to do the best if I don't care.
Stefan
---------------------------------------------------------------------
Re: What does "out of order" mean?
Michael McCandless 2009-11-27, 13:49
On Fri, Nov 27, 2009 at 8:13 AM, Stefan Trcek <[EMAIL PROTECTED]> wrote:
> On Friday 27 November 2009 12:07:07 Michael McCandless wrote:
>> Re: What does "out of order" mean?
>>
>> It refers to the order in which the docIDs are delivered to your
>> Collector.
>>
>> "Normally" they are always delivered in increasing order.
>>
>> However, some queries (well, currently only certain BooleanQuery
>> cases) can achieve substantial search speedup if they are allowed to
>> deliver docIDs to your collector out of order. In this case, docs
>> are processed in batches (chunks of 1024 docIDs at once), and within
>> a batch you may receive docIDs out of order.
>>
>> Many collectors don't mind getting docIDs out of order, and so it's
>> important to return "true" from your acceptsDocsOutOfOrder method so
>> Lucene can allow BooleanQuery to run faster.
>
> Can this paragraph go to the docs?
OK I just committed this to the javadocs. Thanks!
> Maybe I missed it, but I stumbled upon "out of order" and "in order"
> several times and wasn't sure what will be the consequence of the
> decision. Not even sure what will be the "don't care" case.
>
> I like "don't care" options like "Version.LUCENE_CURRENT" very much.
> It allows the library to do the best if I don't care.
Right, this is in general an important effect of the Version.LUCENE_CURRENT option -- you give Lucene the freedom to 1) fix bugs from past versions and 2) improve defaults for settings for better out-of-the-box performance.
But if precise back compat is important to your app, so important that you want newer versions of Lucene to emulate the bugs of past releases, then you set the Version to a specific release (eg Version.LUCENE_24).
For this particular setting (in- vs out-of-order docIDs during collection), Lucene's core collectors (that sort by relevance score, and by field values) are carefully picked depending on whether the query itself would like to score docIDs out of order. We do this because there's a small performance gain for these collectors if they know the docIDs will arrive in order.
So the "don't care" equivalent here is to use IndexSearcher's normal search APIs (ie, we don't use Version to switch this on or off).
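One plausible source of the small in-order gain mentioned above is tie handling in a top-N collector. The following is a hedged sketch in plain Java, not Lucene's own code: TopNSketch, Hit and run are hypothetical names, and it assumes hits are ranked by score with ties going to the lower docID. When docIDs are known to arrive in increasing order, a hit that merely ties the weakest queued score can never win, so one comparison is enough; with out-of-order delivery the docID has to be checked as well.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a top-N-by-score collector; NOT Lucene's own code.
class TopNSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lowest score, and among equal scores the highest docID.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int n; // assumed to be >= 1
    private final boolean docsInOrder;
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WEAKEST_FIRST);

    TopNSketch(int n, boolean docsInOrder) {
        this.n = n;
        this.docsInOrder = docsInOrder;
    }

    void collect(int doc, float score) {
        if (queue.size() < n) {
            queue.add(new Hit(doc, score));
            return;
        }
        Hit weakest = queue.peek();
        if (docsInOrder) {
            // Increasing docIDs: a tie on score cannot beat the weakest hit.
            if (score <= weakest.score) return;
        } else {
            // Any order: a tie still wins if its docID is lower.
            if (score < weakest.score) return;
            if (score == weakest.score && doc > weakest.doc) return;
        }
        queue.poll();
        queue.add(new Hit(doc, score));
    }

    // Returns the collected docIDs, best hit first.
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WEAKEST_FIRST.reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }

    static List<Integer> run(int n, boolean docsInOrder, int[] docs, float[] scores) {
        TopNSketch c = new TopNSketch(n, docsInOrder);
        for (int i = 0; i < docs.length; i++) c.collect(docs[i], scores[i]);
        return c.topDocs();
    }

    public static void main(String[] args) {
        // Same five hits, once in docID order and once shuffled.
        System.out.println(run(2, true,
                new int[] {1, 2, 3, 4, 5}, new float[] {1f, 2f, 1f, 2f, 2f}));
        System.out.println(run(2, false,
                new int[] {4, 1, 5, 3, 2}, new float[] {2f, 1f, 2f, 1f, 2f}));
    }
}
```

Both calls in main yield docs 2 and 4. Feeding the shuffled input to the in-order variant would keep docs 4 and 5 instead, which illustrates why the collector has to be matched to how the query delivers docIDs, and why letting IndexSearcher's normal search methods make that choice is the safe default.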
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-11-27, 15:21
On Friday 27 November 2009 14:49:07 Michael McCandless wrote:
> So the "don't care" equivalent here is to use IndexSearcher's normal
> search APIs (ie, we don't use Version to switch this on or off).
Thanks for the hint. For an unknown reason I once fell into the "search(query, filter, collector)" method. I see that I can do that simpler with "search(Query, Filter, int, Sort)".
Stefan
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-11-30, 11:16
On Friday 27 November 2009 14:49:07 Michael McCandless wrote:
> So the "don't care" equivalent here is to use IndexSearcher's normal
> search APIs (ie, we don't use Version to switch this on or off).
Hmm - Searcher/IndexSearcher's search methods are "Low level", "Expert", "Expert + low level" or return a TopDocs/TopFieldDocs object, which itself claims to be "Expert". I appreciate the labeling but I guess the road to go is somewhat hidden.
Stefan
---------------------------------------------------------------------
Re: What does "out of order" mean?
Michael McCandless 2009-11-30, 13:24
I agree, it's silly we label things like TopDocs/TopFieldDocs as expert -- they are no longer for "low level" APIs (or, perhaps since we've removed the "high level" API (= Hits), what remains should no longer be considered low level).
Do you wanna cough up a patch to correct these?
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-11-30, 17:22
On Monday 30 November 2009 14:24:20 Michael McCandless wrote:
> I agree, it's silly we label things like TopDocs/TopFieldDocs as
> expert -- they are no longer for "low level" APIs (or, perhaps since
> we've removed the "high level" API (= Hits), what remains should no
> longer be considered low level).
>
> Do you wanna cough up a patch to correct these?
I would, but I was not successful in getting the svn repo some months ago. I would have to ask the sys admin to open a door through the firewall for any svn repo. I gave up due to
$ nmap -p3690 svn.apache.org
PORT STATE SERVICE
3690/tcp filtered unknown
But I got the git repo at http://git.apache.org/lucene.git/
That works out of the box. So the remaining hurdle is to create the patches. If there is no way to accept git patches (see attachment, they are somewhat different) I'd try to set up the git-svn bridge locally, just to create the patches.
Stefan
Re: What does "out of order" mean?
Michael McCandless 2009-11-30, 17:42
I was able to apply that git patch just fine -- so I think it'll work?
Thanks!
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-12-01, 09:52
On Monday 30 November 2009 18:42:50 Michael McCandless wrote:
> I was able to apply that git patch just fine -- so I think it'll
> work?
Good to hear it works that simple. This patch completes the task. It is a "two file" patch, so if this will work too, I'm confident.
Stefan
Re: What does "out of order" mean?
Michael McCandless 2009-12-01, 10:07
OK -- none of IndexSearcher's search methods needed tweaking? Just TopDocs/TopFieldDocs?
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Stefan Trcek 2009-12-01, 10:31
On Tuesday 01 December 2009 11:07:41 Michael McCandless wrote:
> OK -- none of IndexSearcher's search methods needed tweaking? Just
> TopDocs/TopFieldDocs?
Yes, you can use these methods in Searcher, they are sufficient:
TopDocs Searcher.search(Query query, Filter filter, int n)
TopFieldDocs Searcher.search(Query query, Filter filter, int n, Sort sort)
Stefan
---------------------------------------------------------------------
Re: What does "out of order" mean?
Michael McCandless 2009-12-01, 11:33
Super, thanks. I'll commit your patch, fixing javadocs for TopDocs/TopFieldDocs.
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Michael McCandless 2009-12-01, 13:15
OK, I committed this, plus further removals of "expert" from TopDocs, to trunk (future 3.1) and to the 2.9 and 3.0 branches.
Thanks!
Mike
---------------------------------------------------------------------
Re: What does "out of order" mean?
Nick Burch 2009-11-30, 17:51
On Mon, Nov 30, 2009 at 12:22 PM, Stefan Trcek <[EMAIL PROTECTED]> wrote:
> I'd do, but was not successful to get the svn repo some months ago. I
> have to claim the sys admin for any svn repo to open a door through the
> firewall. Gave up due to
>
> $ nmap -p3690 svn.apache.org
> PORT STATE SERVICE
> 3690/tcp filtered unknown
Apache svn doesn't use the svnserve protocol, it uses plain old HTTP (or HTTPS for committers), so you only need port 80 access, and that should be open everywhere.
You can get the svn url, and the appropriate commandline, from: http://lucene.apache.org/java/docs/developer-resources.html
Nick
Re: What does "out of order" mean?
Stefan Trcek 2009-12-01, 14:11
On Monday 30 November 2009 18:51:34 Nick Burch wrote:
> On Mon, Nov 30, 2009 at 12:22 PM, Stefan Trcek <[EMAIL PROTECTED]> wrote:
> > I'd do, but was not successful to get the svn repo some months ago.
> > I have to claim the sys admin for any svn repo to open a door
> > through the firewall. Gave up due to
> >
> > $ nmap -p3690 svn.apache.org
> > PORT STATE SERVICE
> > 3690/tcp filtered unknown
>
> Apache svn doesn't use the svnserve protocol, it uses plain old HTTP
> (or HTTPS for committers), so you only need port 80 access, and that
> should be open everywhere.
>
> You can get the svn url, and the appropriate commandline, from:
> http://lucene.apache.org/java/docs/developer-resources.html
Thanks, I got it and it works. I just talked to a colleague who is familiar with svn: the patch format of git and svn seems to be the same. As I've never used svn but use git regularly, I can stay with git.
Stefan
---------------------------------------------------------------------