|
Artem Chereisky
2010-01-13, 23:16
Andrew C. Smith
2010-01-14, 01:09
Digy
2010-01-14, 17:21
Nicholas Paldino [.NET/C#...
2010-01-14, 18:09
Digy
2010-01-14, 19:44
Nicholas Paldino [.NET/C#...
2010-01-14, 20:24
Granroth, Neal V.
2010-01-14, 20:37
Nicholas Paldino [.NET/C#...
2010-01-14, 21:35
Digy
2010-01-14, 22:16
Nicholas Paldino [.NET/C#...
2010-01-14, 23:10
Franklin Simmons
2010-01-14, 23:32
Michael Garski
2010-01-15, 00:03
Ben Martz
2010-01-14, 23:54
Michael Garski
2010-01-14, 23:56
Ben Martz
2010-01-15, 00:54
Ciaran Roarty
2010-01-14, 21:16
Todd Carrico
2010-01-14, 21:18
Nicholas Paldino [.NET/C#...
2010-01-14, 18:10
Franklin Simmons
2010-01-14, 19:02
Todd Carrico
2010-01-14, 17:23
|
-
at least one doc
Artem Chereisky 2010-01-13, 23:16

Hi,

Given a boolean query and/or a filter, what is the best way to see if there is at least one matching document?

I tried a simple hit collector which sets a flag on the first call to Collect. Ideally I would want to stop collecting at that point, but I couldn't find a way of doing that.
I also tried: TopDocs docs = _searcher.Search(query, filter, 1), but it seems to iterate through all matches, as docs.totalHits is set to the actual number of matches.

So, is there a better way?

Regards,
Art
-
Re: at least one doc
Andrew C. Smith 2010-01-14, 01:09

One thing to note is that the index searcher may not show newly added documents; you will have to re-open the index searcher. Also take a look at the following document: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
-
RE: at least one doc
Digy 2010-01-14, 17:21

The formal way is to throw an exception in HitCollector.Collect to stop the iteration.

DIGY
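As a rough illustration of the exception approach described above, the sketch below shows only the control flow: the collector sets a flag on the first hit and throws, and the caller catches the signal. All type names here (SketchCollector, AnyHitCollector, StopCollectingException, AnyHitDemo) are hypothetical stand-ins, not the real Lucene.Net classes, and the "searcher" is just a loop over an in-memory list of doc ids.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a HitCollector-style callback (not the Lucene.Net class).
public abstract class SketchCollector
{
    public abstract void Collect(int doc, float score);
}

// Signal exception used only to abort the iteration.
public sealed class StopCollectingException : Exception { }

// Sets a flag on the first hit, then throws to stop the caller's loop.
public sealed class AnyHitCollector : SketchCollector
{
    public bool Found { get; private set; }
    public int Calls { get; private set; }

    public override void Collect(int doc, float score)
    {
        Calls++;
        Found = true;
        throw new StopCollectingException();
    }
}

public static class AnyHitDemo
{
    // Hypothetical stand-in for a searcher that feeds every match to the collector.
    public static void Search(IEnumerable<int> matchingDocs, SketchCollector collector)
    {
        foreach (int doc in matchingDocs)
        {
            collector.Collect(doc, 1.0f);
        }
    }

    // Returns true if at least one doc matched; reports how often Collect was called.
    public static bool HasAnyMatch(IEnumerable<int> matchingDocs, out int collectCalls)
    {
        AnyHitCollector collector = new AnyHitCollector();
        try
        {
            Search(matchingDocs, collector);
        }
        catch (StopCollectingException)
        {
            // Expected: the collector aborted the search after the first hit.
        }
        collectCalls = collector.Calls;
        return collector.Found;
    }
}
```

With this shape, Collect runs at most once per search, at the price of one throw/catch whenever a match exists.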
-
RE: at least one doc
Nicholas Paldino [.NET/C#... 2010-01-14, 18:09

Wow, that's just... horrible from a design perspective. It doesn't matter which language it's implemented in.

- Nick
-
RE: at least one doc
Digy 2010-01-14, 19:44

I have also wondered many times why HitCollector.Collect doesn't return a boolean value indicating that no more results are needed. It might give a small performance improvement.

DIGY
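To make the idea above concrete, here is a minimal sketch of what a boolean-returning collector could look like. The interface IStoppableCollector and the StoppableSearch helper are hypothetical; nothing like them exists in Lucene.Net as discussed in this thread. The search loop simply stops as soon as Collect returns false.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface, NOT part of Lucene.Net: returning false means "stop collecting".
public interface IStoppableCollector
{
    bool Collect(int doc, float score);
}

// Remembers that a hit was seen and asks the caller to stop immediately.
public sealed class FirstHitCollector : IStoppableCollector
{
    public bool Found { get; private set; }

    public bool Collect(int doc, float score)
    {
        Found = true;
        return false; // one hit is enough
    }
}

public static class StoppableSearch
{
    // Hypothetical search loop; returns how many docs were offered to the collector.
    public static int Search(IEnumerable<int> matchingDocs, IStoppableCollector collector)
    {
        int visited = 0;
        foreach (int doc in matchingDocs)
        {
            visited++;
            if (!collector.Collect(doc, 1.0f))
            {
                break; // collector signalled that no more results are needed
            }
        }
        return visited;
    }
}
```

Compared with the exception approach, the stop signal here is an ordinary return value, so no stack unwinding is involved.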
-
RE: at least one doc
Nicholas Paldino [.NET/C#... 2010-01-14, 20:24

DIGY,

In .NET it's more than small. Throwing and catching an exception (which is required here) is orders of magnitude slower than just returning a value. It has to do with the stack unwind and restoration, and I'm sure it's similar in Java.

- Nick
-
RE: at least one doc
Granroth, Neal V. 2010-01-14, 20:37

Experience shows otherwise.

- Neal
-
RE: at least one doc
Nicholas Paldino [.NET/C#... 2010-01-14, 21:35

Neal,

With all due respect, you are wrong. Here is a program which demonstrates why (.NET 3.5/C# 3.0 required, but it can easily be downconverted to 2.0 if necessary):

    using System;
    using System.Diagnostics;

    namespace ExceptionTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("TryParse (no exception) (ticks/iteration): {0:000.0000}", TestTryParse());
                Console.WriteLine("Parse (exception) (ticks/iteration): {0:000.0000}", TestParse());
            }

            const int Iterations = 10000;

            static double Test(Func<bool> test)
            {
                // Do a garbage collection, make them both start
                // from the same baseline.
                GC.Collect();

                // Create and start the stopwatch.
                Stopwatch s = Stopwatch.StartNew();

                // Iterate.
                for (int index = 0; index < Iterations; index++)
                {
                    // Perform the action.
                    test();
                }

                // Return the result. This is the elapsed ticks divided
                // by the iterations.
                return s.ElapsedTicks / (double) Iterations;
            }

            // Invalid parse string.
            const string InvalidParseString = "10240223 This will not parse";

            static double TestTryParse()
            {
                // Call the test method.
                return Test(() =>
                {
                    // The output value.
                    int val;

                    // Try and parse.
                    return Int32.TryParse(InvalidParseString, out val);
                });
            }

            static double TestParse()
            {
                // Call the test method.
                return Test(() =>
                {
                    // Wrap in a try catch.
                    try
                    {
                        // Parse; this throws for the invalid string.
                        Int32.Parse(InvalidParseString);

                        // Return true.
                        return true;
                    }
                    catch (Exception)
                    {
                        return false;
                    }
                });
            }
        }
    }

On my machine, using a return value instead of a caught exception takes ~000.5902 ticks per iteration. Throwing an exception and catching it takes ~168.4368 ticks per iteration. The return value is about 285 times faster, and you will see this on any machine (which qualifies the statement "orders of magnitude", since it is at least two orders of magnitude faster).

Your experience probably tells you that in a one-off situation it's not noticeable, and you are right, it isn't (noticeable, that is), but that doesn't mean that returning a value isn't significantly faster than throwing.

- Nick
-
RE: at least one doc
Digy 2010-01-14, 22:16

The problem is not the cost of "throwing an exception" vs. "returning a value". If you pass a MaxDoc# to HitCollector, you have the option "not to read" the docs from the index beyond MaxDoc#. The costly part of this process is "reader.Document(doc)".

The problem arises if you have millions of results: you have to iterate over all the results although you only want the top MaxDoc# docs. The real performance question is which one is better: an empty iteration repeated millions of times, or one exception (say at 100 or 500)?

DIGY
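The trade-off raised above can be measured rather than argued. The following is a minimal timing sketch under stated assumptions: EarlyExitTiming, its methods and the StopException type are hypothetical names, the "hits" are just a counted loop rather than a real index, and the numbers it produces are machine- and JIT-dependent, so no results are claimed here.

```csharp
using System;
using System.Diagnostics;

public static class EarlyExitTiming
{
    // Signal exception used only to abort the loop (hypothetical name).
    private sealed class StopException : Exception { }

    // Offers up to totalHits docs to the callback; returns how many were visited.
    private static int Iterate(int totalHits, Action<int> collect)
    {
        int visited = 0;
        try
        {
            for (int doc = 0; doc < totalHits; doc++)
            {
                visited++;
                collect(doc);
            }
        }
        catch (StopException)
        {
            // The callback asked to stop early.
        }
        return visited;
    }

    // Keeps the first `wanted` docs and ignores the rest (empty iterations).
    public static long TicksVisitAll(int totalHits, int wanted, out int visited)
    {
        int kept = 0;
        Stopwatch s = Stopwatch.StartNew();
        visited = Iterate(totalHits, doc => { if (kept < wanted) kept++; });
        s.Stop();
        return s.ElapsedTicks;
    }

    // Keeps the first `wanted` docs, then throws once to stop the iteration.
    public static long TicksStopEarly(int totalHits, int wanted, out int visited)
    {
        int kept = 0;
        Stopwatch s = Stopwatch.StartNew();
        visited = Iterate(totalHits, doc =>
        {
            if (kept >= wanted) throw new StopException();
            kept++;
        });
        s.Stop();
        return s.ElapsedTicks;
    }
}
```

Calling both methods with the same totalHits and wanted values (for example one million hits and 100 wanted docs) gives two tick counts to compare; a single run is noisy, so repeating the measurement several times is advisable.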
-
RE: at least one doc
Nicholas Paldino [.NET/C#... 2010-01-14, 23:10

DIGY,

Yes, the problem is the cost of "throwing an exception" vs. "returning a value". The implication is that if you have to throw an exception to stop the collection of results, rather than return a value that says "stop collecting" (the mechanism you said earlier you wished existed), then the design of the HitCollector interface is **wrong**.

I definitely wouldn't suggest iterating through the entire result set; throwing an exception is the best choice given what's available now, but that doesn't mean it's correct.

- Nick
-
RE: at least one doc
Franklin Simmons 2010-01-14, 23:32

A HitCollector is the wrong class to use to solve the problem. The desired result is not to collect hits, but to know if there *are* any hits.

There is nothing incorrect about the HitCollector interface; there are only infinitely many incorrect ways to use it. I have posted one (and surely not the best) way to get the desired information without suffering the overhead a HitCollector incurs. It would be nice if others would approve/disapprove/improve upon my reply - which actually addresses the OP's question - and/or provide superior solutions.
-
RE: at least one docMichael Garski 2010-01-15, 00:03
Well said Franklin - using a Collector (or the deprecated HitCollector) for this use case would be inappropriate. Your scorer suggestion is a good one that I will have to remember :)
This can also be done using a Filter. To see if there are any documents that match a filter use: DocIdSetIterator iter = filter.GetDocIdSet(reader).Iterator(); if(iter.Advance(0) == DocIdSetIterator.NO_MORE_DOCS) { // there are no documents that match the filter } The same approach could apply to a query as well using a QueryWrappingFilter. Filters can be combined logically, check out the BooleanFilter in the java contrib section. I have a port of BooleanFilter to c#, however the tests are in VS not NUnit which is why I have not posted them into JIRA just yet. Note that with using 2.9 that your filters are cached at the sub-reader level, so to use the existing filters you will have to iterate over the sub readers returned from IndexReader.GetSequentialSubReaders rather than use the top-level reader. Michael -----Original Message----- From: Franklin Simmons [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 14, 2010 3:33 PM To: [EMAIL PROTECTED] Subject: RE: at least one doc A HitCollector is the wrong class to use to solve the problem. The desired result is not to collect hits, but to know if there *are* any hits. There is nothing incorrect with the HitCollector interface, rather, only infinite incorrect ways to use it. I have posted one (and surely not the best) way to get the desired information whilst not suffering the overhead a HitCollector incurs. It would be nice if others would approve/disapprove/improve upon my reply - which actually addresses the OP's question - and/or provide superior solutions. -----Original Message----- From: Nicholas Paldino [.NET/C# MVP] [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 14, 2010 6:10 PM To: [EMAIL PROTECTED] Subject: RE: at least one doc DIGY, Yes, the problem is the cost of "throwing an exception" vs. "returning a value". 
The implication is that if you have to throw an exception to stop the collection of results, as opposed to a method that doesn't exist (which you indicated before, you wondered why it doesn't have some sort of way to indicate "stop collecting") which should return a value to indicate that you should stop collecting, then the design of the HitCollector interface is **wrong**. I definitely wouldn't suggest iterating through the entire result set; while throwing an exception is the best selection with what's available now, it doesn't mean it's correct. - Nick -----Original Message----- From: Digy [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 14, 2010 5:16 PM To: [EMAIL PROTECTED] Subject: RE: at least one doc The problem is not the cost of "throwing an exception" vs. "returning a value". If you pass a MaxDoc# to HitCollector, you have an option "not to read" the docs from the index exceeding the Maxdoc#. The costly part of this process is "reader.Document(doc)". The problem arises if you have a millions of results: You have to iterate all over the results although you want to have top MaxDox# docs. The real performance problem is, which one is better: an empty iteration of millions of times or an exception (say at 100 or 500)? DIGY From: Nicholas Paldino [.NET/C# MVP] [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 14, 2010 11:35 PM To: [EMAIL PROTECTED] Subject: RE: at least one doc Neal, With all due respect, you are wrong. 
Here is a program which demonstrates why (.NET 3.5/C# 3.0 required, but can easily be downconverted to 2.0 if necessary):

    using System;
    using System.Diagnostics;

    namespace ExceptionTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("TryParse (no exception) (ticks/iteration): {0:000.0000}", TestTryParse());
                Console.WriteLine("Parse (exception) (ticks/iteration): {0:000.0000}", TestParse());
            }

            const int Iterations = 10000;

            static double Test(Func<bool> test)
            {
                // Do a garbage collection, make them both start
                // from the same baseline.
                GC.Collect();

                // Create the stopwatch.
                Stopwatch s = new Stopwatch();
                s.Start();

                // Iterate.
                for (int index = 0; index < Iterations; index++)
                {
                    // Perform the action.
                    test();
                }

                // Return the result. This is the elapsed ticks divided
                // by the iterations.
                return s.ElapsedTicks / (double) Iterations;
            }

            // Invalid parse string.
            const string InvalidParseString = "10240223 This will not parse";

            static double TestTryParse()
            {
                // Call the test method.
                return Test(() =>
                {
                    // The output value.
                    int val;

                    // Try and parse.
                    return Int32.TryParse(InvalidParseString, out val);
                });
            }

            static double TestParse()
            {
                // Call the test method.
                return Test(() =>
                {
                    // The value.
                    int val;

                    // Wrap in a try catch.
                    try
                    {
                        // Parse.
                        val = Int32.Parse(InvalidParseString);

                        // Return true.
                        return true;
                    }
                    catch (Exception)
                    {
                        return false;
                    }
                });
            }
        }
    }

On my machine, using a return value over a caught exception takes ~000.5902 ticks per iteration. For throwing an excepti
Michael Garski 2010-01-15, 00:03
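The per-segment check described in the message above could be sketched roughly as follows. This is a minimal sketch, assuming Lucene.Net 2.9-style signatures for the calls the message names (Filter.GetDocIdSet, DocIdSet.Iterator, DocIdSetIterator.Advance, IndexReader.GetSequentialSubReaders). The names FilterProbe and HasAnyMatch are hypothetical, and the null checks are defensive assumptions rather than documented behaviour.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper; not part of Lucene.Net.
public static class FilterProbe
{
    // Returns true as soon as any sub-reader yields one matching document.
    public static bool HasAnyMatch(IndexReader topReader, Filter filter)
    {
        // Assumption: a reader with no sub-readers returns null here,
        // in which case the reader itself is checked.
        IndexReader[] subReaders = topReader.GetSequentialSubReaders();
        if (subReaders == null)
        {
            subReaders = new IndexReader[] { topReader };
        }

        foreach (IndexReader sub in subReaders)
        {
            // Asking per sub-reader lets a caching filter reuse its
            // per-segment entries, as the message points out.
            DocIdSet docIdSet = filter.GetDocIdSet(sub);
            if (docIdSet == null) continue;      // defensive assumption

            DocIdSetIterator iter = docIdSet.Iterator();
            if (iter == null) continue;          // defensive assumption

            if (iter.Advance(0) != DocIdSetIterator.NO_MORE_DOCS)
            {
                return true;                     // stop at the first match
            }
        }
        return false;
    }
}
```

The sketch does not recurse into sub-readers that themselves have sub-readers, and whether deleted documents are excluded depends on the filter in use.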
-
Re: at least one doc
Ben Martz 2010-01-14, 23:54
The use of exceptions for general code flow control is unfortunately prevalent in the Lucene Java code base and not something that we can easily get away from. I've toyed with the idea of creating a comprehensive "optimized" exception-free Lucene.Net build just to instrument the overall performance difference, but I've never had enough free time to give it a go.

Even if anyone were to complete said build, it would be of no use to the core Lucene.Net project, since it would require changes to quite a number of methods to return useful status values and would require updating with every new release of Lucene.Net.
-
RE: at least one doc
Michael Garski 2010-01-14, 23:56
Versions prior to 2.3 had it in a few places (most notably StandardAnalyzer), but in subsequent versions I am not aware of such flow control other than in QueryParser. What specifically are you referring to?

Michael
-
Re: at least one doc
Ben Martz 2010-01-15, 00:54
You're quite correct to call this out. I'm still not thinking in 2.9 terms yet, since by policy I can't use pre-release versions in my software, and I should have reserved my comments.
-
Re: at least one doc
Ciaran Roarty 2010-01-14, 21:16
I don't like the pattern - taken from Lucene - but I have never encountered a major performance problem.

C

On 14 Jan 2010, at 20:24, "Nicholas Paldino [.NET/C# MVP]" <[EMAIL PROTECTED]> wrote:

> DIGY,
>
> In .NET it's more than small. Throwing and catching an exception
> (which is required here) is orders of magnitude slower than just
> returning a value. It has to do with the stack unwind and restoration,
> and I'm sure it's similar in Java.
>
> - Nick
>
> -----Original Message-----
> From: Digy [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 14, 2010 2:44 PM
> To: [EMAIL PROTECTED]
> Subject: RE: at least one doc
>
> I also have thought many times why HitCollector.Collect doesn't
> return a boolean value indicating no more results are needed.
> Maybe, a small performance increment.
>
> DIGY
>
> -----Original Message-----
> From: Nicholas Paldino [.NET/C# MVP] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 14, 2010 8:09 PM
> To: [EMAIL PROTECTED]
> Subject: RE: at least one doc
>
> Wow, that's just... Horrible from a design perspective. Doesn't
> matter which language it's implemented in.
>
> - Nick
>
> -----Original Message-----
> From: Digy [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 14, 2010 12:22 PM
> To: [EMAIL PROTECTED]
> Subject: RE: at least one doc
>
> The formal way is throwing exception in the HitCollector.Collect to
> stop iteration.
>
> DIGY
>
> -----Original Message-----
> From: Artem Chereisky [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 14, 2010 1:16 AM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: at least one doc
>
> Hi,
>
> Given a boolean query and/or a filter, what is the best way to see
> if there is at least one matching document?
>
> I tried a simple hit collector which sets a flag on the first Collect
> method. Ideally I would want to stop collecting at that point but I
> couldn't find a way of doing that.
> I also tried: TopDocs docs = _searcher.Search(query, filter, 1), but
> it seems to iterate through all matches as docs.totalHits is set to
> the actual number of matches.
>
> So, is there a better way
>
> Regards,
> Art
-
RE: at least one doc
Todd Carrico 2010-01-14, 21:18
I have, and it comes with load. Try using a CLR profiler on an exception.

tc
-
RE: at least one doc
Nicholas Paldino [.NET/C#... 2010-01-14, 18:10
That being said, if you are going to throw an exception, you should make sure that your exception class is a specific type that only you have access to (an internal class, perhaps). That way, you can be very specific with the catch clause and not have it get grouped with other valid exception classes.

- Nick
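The advice about a private exception type can be illustrated without Lucene.Net at all. The sketch below is a stand-alone model, not the library's API: ICollectorModel, RunSearch, AnyHitCollector, and StopCollectingException are hypothetical stand-ins for HitCollector, the searcher's loop over matches, a user collector, and the stop signal. It shows the exception being caught only at the call site, so no unrelated exception is swallowed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for HitCollector: Collect has no way to say "stop".
public interface ICollectorModel
{
    void Collect(int doc, float score);
}

public static class EarlyExitDemo
{
    // Private, so no other code can throw or catch this type by accident.
    private sealed class StopCollectingException : Exception { }

    private sealed class AnyHitCollector : ICollectorModel
    {
        public int Visited;

        public void Collect(int doc, float score)
        {
            Visited++;
            throw new StopCollectingException();  // the first hit is enough
        }
    }

    // Hypothetical stand-in for the searcher's loop over matching documents.
    private static void RunSearch(IEnumerable<int> matchingDocs, ICollectorModel collector)
    {
        foreach (int doc in matchingDocs)
        {
            collector.Collect(doc, 1.0f);
        }
    }

    // Returns true if at least one document matches; visited reports how
    // many times Collect ran before the loop was abandoned.
    public static bool HasAnyHit(IEnumerable<int> matchingDocs, out int visited)
    {
        AnyHitCollector collector = new AnyHitCollector();
        bool found = false;
        try
        {
            RunSearch(matchingDocs, collector);
        }
        catch (StopCollectingException)
        {
            found = true;  // only our own signal is caught here
        }
        visited = collector.Visited;
        return found;
    }
}
```

Each early exit still pays for one throw and one catch, which is the cost debated in this thread; the Scorer and Filter approaches suggested by other posters avoid it.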
-
RE: at least one doc
Franklin Simmons 2010-01-14, 19:02
HitCollector is the wrong tool to use because you're clearly not interested in collecting hits. Try the Scorer class. For example:

    Lucene.Net.Search.Weight weight = query.Weight(searcher);
    Lucene.Net.Search.Scorer scorer = weight.Scorer(searcher.GetIndexReader());
    bool hasHits = scorer.Next();
-
Re: at least one doc
Todd Carrico 2010-01-14, 17:23
Fail... Just feels wrong.

Tc
|