Lucene, mail # dev - Re: (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)


Michael McCandless 2012-08-11, 18:58
On Sat, Aug 11, 2012 at 10:31 AM, Robert Muir <[EMAIL PROTECTED]> wrote:
> I'm having a tough time remembering what these packed ints options do
> (I thought the perf boost from allowing overhead came from upgrading
> to the next byte boundary?)

Upgrading to the next byte boundary, or using PACKED_SINGLE_BLOCK when possible.
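To make the two options concrete, here is a minimal sketch of the arithmetic behind them. It is not Lucene's PackedInts code: the class and method names are hypothetical, and it assumes "single block" means that values are packed into 64-bit blocks and no value is allowed to span two blocks, so any leftover bits in a block are padding.

```java
// Hypothetical illustration only; not the Lucene PackedInts implementation.
final class PackedOverheadSketch {

    /** Rounds a bit width up to the next multiple of 8 (the next byte boundary). */
    static int roundUpToByteBoundary(int bitsPerValue) {
        return ((bitsPerValue + 7) / 8) * 8;
    }

    /** Number of values that fit in one 64-bit block if no value may span two blocks. */
    static int valuesPerSingleBlock(int bitsPerValue) {
        return 64 / bitsPerValue;
    }

    /** Bits effectively consumed per value in the single-block layout, padding included. */
    static double singleBlockBitsPerValue(int bitsPerValue) {
        return 64.0 / valuesPerSingleBlock(bitsPerValue);
    }

    /** Relative space overhead: (bits actually used - bits required) / bits required. */
    static double overheadRatio(double actualBits, int requiredBits) {
        return (actualBits - requiredBits) / requiredBits;
    }

    public static void main(String[] args) {
        // Bit widths chosen arbitrarily for illustration.
        for (int bits : new int[] {5, 7, 12, 21}) {
            int byteAligned = roundUpToByteBoundary(bits);
            double singleBlock = singleBlockBitsPerValue(bits);
            System.out.printf(
                "required=%d byteAligned=%d (overhead %.2f) singleBlock=%.2f (overhead %.2f)%n",
                bits, byteAligned, overheadRatio(byteAligned, bits),
                singleBlock, overheadRatio(singleBlock, bits));
        }
    }
}
```

Under these assumptions the single-block layout usually wastes far fewer bits than rounding up to a whole byte, while still avoiding reads that straddle two 64-bit words.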

> Anyway: again I'm a little concerned about the wikipedia benchmark
> here for this purpose.

We should find another corpus/corpora to also test...

> For example, for structured content from databases (tiny fields), where
> the values are much smaller on average, the results could be different.
> I'm also worried that decode speed is over-emphasized in the Wikipedia
> benchmark, since all the I/O is hot.

True.
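One way to see why tiny values could shift the trade-off is to look at how the bit width of a block depends on its largest value. The sketch below assumes a FOR-style block where every value is stored with the width of the maximum; the class name, method names, and sample blocks are made up for illustration and are not taken from the benchmark.

```java
// Hypothetical illustration only; the sample data is invented.
final class BlockBitsSketch {

    /** Bits needed to represent a non-negative value (at least 1). */
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    /** Bit width of a FOR-style block: every value uses the width of the maximum. */
    static int blockBits(int[] block) {
        int max = 0;
        for (int v : block) {
            max = Math.max(max, v);
        }
        return bitsRequired(max);
    }

    /** Relative cost of padding a block's bit width up to the next byte boundary. */
    static double byteAlignOverhead(int[] block) {
        int required = blockBits(block);
        int aligned = ((required + 7) / 8) * 8;
        return (aligned - required) / (double) required;
    }

    public static void main(String[] args) {
        int[] tiny = {1, 2, 3, 1, 0, 2, 1, 3};   // e.g. values from very short fields
        int[] large = {900, 4000, 7000, 5000};   // e.g. values from long documents
        System.out.printf("tiny:  bits=%d overhead=%.2f%n", blockBits(tiny), byteAlignOverhead(tiny));
        System.out.printf("large: bits=%d overhead=%.2f%n", blockBits(large), byteAlignOverhead(large));
    }
}
```

With these made-up blocks, byte alignment costs proportionally much more space for the tiny values than for the large ones, which is the kind of difference a second corpus would need to measure.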

> So I think if it's this ambiguous for Wikipedia we should shoot for the
> most COMPACT form as a safe default.

+1

Mike McCandless

http://blog.mikemccandless.com
