RE: [Lucene.Net] Lucene Document Retrieval
Miller, Bill 2011-07-06, 12:54
Well, here's my 2 cents of performance comments...
My test index runs around 70k docs, 50 fields per doc, text size per doc maybe 4k on average, and no term vectors. Simple queries run about 50 msecs as well, but I can process 1000 results (pulling about 1k of data (max) for each doc) in 125 msecs or less. However, I store all my stored field data in one non-indexed field and retrieve it in one shot. Unfortunately I never tested with multiple stored fields, but I'm guessing there may be a big savings there. My old indexing engine (AltaVista) saved data in one 'blob' per doc as well, and I practically plopped Lucene in its place.

(just moved to 2.9.4.2)

Bill Miller, QuickWire Labs
www.quickwire.com

-----Original Message-----
From: Moray McConnachie [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 05, 2011 4:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [Lucene.Net] Lucene Document Retrieval

Interesting. Our search and filtering approach involves calling the document function on every match to a search result, so frequently 5-10k results on a search run against a store of 175k documents. This is performant; I definitely don't notice document being a slow function.

However, we deliberately store only the contents of key metadata fields (typically 1k-3k per document), and we do not store term positioning data. Does the size of each document make a difference?

I certainly second the approach which stores most of the document outside Lucene, but if you need it during filtering or the result-generation process (as in highlighting, for example) I suspect it may be highly case-dependent which approach will perform better.

Our production system version of Lucene is considerably older than this one. We're running the newer version in dev, and haven't noticed any issues, but then we haven't run load testing yet. Is there performance data available for different functions in the different versions?

M.
-------------------------------------
Moray McConnachie
Director of IT
+44 1865 261 600
Oxford Analytica
http://www.oxan.com

----- Original Message -----
From: Noel Lysaght [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 04, 2011 08:34 PM
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Subject: Re: [Lucene.Net] Lucene Document Retrieval

Hi Andrew,

We're using Lucene.NET version 2.9.2. We found Lucene excellent at indexing and searching, but found that, when using it like a database, retrieval of data that was stored while indexing was very slow.

So to optimise we stored the minimal amount of data possible: we stored only identifiers for the data we wanted to retrieve, and then used those IDs to fetch the data with a database lookup that gets everything in a single trip to the DB server. That worked out very well for us, with searching and data retrieval/gathering each taking about the same amount of time. It's a SQL2005 database that we get the data from.

I don't know if that would work in your situation; you may be dependent on Lucene for both storage and searching.

Kind Regards
Noel

-----Original Message-----
From: Andrew W.
Sent: Monday, July 04, 2011 5:07 PM
To: [EMAIL PROTECTED]
Subject: [Lucene.Net] Lucene Document Retrieval

Hi,

I have a Lucene index with 70,000 documents. The size of the index is around 300MB. I have 32 fields, but I only retrieve values for 11 fields when I display results on the page (I use FieldSelector).

The query time is great: most of the queries execute in under 50ms. However, when I loop through the results to retrieve documents (I only retrieve 10 documents at once, because this is my pagination size), the loop sometimes takes even longer than 300ms.

I applied all recommendations mentioned on that page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed to optimize query time.
I was really surprised when I noticed that the bottleneck is in the loop below, especially because the index is not that big (only 300MB) and the hits array contains only 10 elements.

Is there anything else I can do to optimize document retrieval from the Lucene index?

Here is the sample code:

    Dim FastFieldSelector As New FastFieldSelector(Fields)
    For i As Integer = 0 To hits.Length - 1
        Dim DocId As Integer = hits(i).DocId
        Dim doc As Document = reader.Document(DocId, FastFieldSelector) ' this line is taking long time
        For Each f As Field In Fields
            ...
            Dim values() As Fieldable = doc.GetFieldables(f.Name)
            ....
        Next
    Next

Thanks,
Andrew
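
For anyone reading this thread later, the ID-only approach Noel describes (keep just an identifier in the index, then fetch the display data for the whole page of hits in one database round trip) might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the SQL2005 server, and the `documents` table, its columns, and both function names are hypothetical, not taken from anyone's system in this thread.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [(1, "a", "A"), (2, "b", "B"), (3, "c", "C")],
    )
    return conn


def fetch_rows_for_hits(conn, doc_ids):
    """Fetch rows for all hit ids with one query, returned in hit (rank) order."""
    if not doc_ids:
        return []
    # One parameter placeholder per id, so the whole page is a single round trip.
    placeholders = ",".join("?" for _ in doc_ids)
    sql = f"SELECT id, title, body FROM documents WHERE id IN ({placeholders})"
    rows_by_id = {row[0]: row for row in conn.execute(sql, list(doc_ids))}
    # IN (...) does not guarantee any order, so restore the search ranking here.
    return [rows_by_id[i] for i in doc_ids if i in rows_by_id]


if __name__ == "__main__":
    conn = make_demo_db()
    # The ids as they would come back from the search, best match first.
    print(fetch_rows_for_hits(conn, [3, 1]))
```

The point of the design is that the per-hit work against the index shrinks to reading one small stored value, and the cost of gathering display data is paid once per page rather than once per document. Whether that beats a single stored blob per document, as Bill describes, likely depends on the data and would need measuring.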