Lucene, mail # dev - anyone has interests about mg4j's new integer compression algorithm?


Re: anyone has interests about mg4j's new integer compression algorithm?
Dawid Weiss 2012-07-06, 09:53
The conclusion is that 4.0 is significantly faster than 3.6 on this
benchmark, and that the benchmarking code itself had minor glitches.

Dawid

On Fri, Jul 6, 2012 at 11:47 AM, Li Li <[EMAIL PROTECTED]> wrote:
> I can understand these quotes. What's the conclusion from your communication?
>
> On Fri, Jul 6, 2012 at 4:20 PM, Dawid Weiss
> <[EMAIL PROTECTED]> wrote:
>> I've repeated Sebastiano's experiments (and so did he). A few quotes
>> from the communication.
>>
>>> The index appears to be larger now--43.1GB. Probably they have better skipping structures that take more space.
>>>
>>> From what I can see the format is the same as before--the .frq file contains document pointers and positions. So my SearchFiles class still reads documents *and* counts.
>>>
>>> But the most interesting part I've read in a blog is that now Lucene has a pluggable index format. This means that someone can actually write a QS index for Lucene and test what happens in production. That's a most interesting change!
>>
>> and:
>>
>>> Well, they did a great job:
>>>
>>> trec-40-text    unscored    terms     result:   5511    494901
>>> trec-40-text    unscored    and       result:   2193    769110
>>> trec-40-text    unscored    phrase    result:   6615    148663
>>> trec-40-text    unscored    spans     result:  12407    545090
>>>
>>> So conjunction is still better, but by a much smaller margin. The worst part is term scanning--they are now significantly faster than QS indices.
>>
>> Dawid
>>
>>
>>
>> On Sun, Jun 24, 2012 at 9:31 AM, Dawid Weiss
>> <[EMAIL PROTECTED]> wrote:
>>> FYI: I contacted Sebastiano and will get hold of the data set and
>>> benchmarks he used to repeat his experiment with current trunk
>>> (curiosity). Any hints on which configuration should be used will be
>>> welcome.
>>>
>>> Dawid
>>>
>>> On Sat, Jun 23, 2012 at 12:38 PM, Li Li <[EMAIL PROTECTED]> wrote:
>>>> http://mg4j.di.unimi.it/
>>>> http://vigna.di.unimi.it/papers.php#VigQSI
>>>>
>>>> sounds very interesting and attractive.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>
>>
>
>
