|
|
+
Li Li 2010-11-24, 10:16
+
Michael McCandless 2010-11-24, 16:57
-
Re: need some advice for modifying index format which supporting PForDelta compression/decompressionLi Li 2010-11-25, 09:07
Because we did some modification to index files, migrating to trunk
4.0 is not an easy task. I think Partial decode is possible. To speed up decode, PForDelta first decode all the arrays including exceptions without judge whether it's exception or not. There can make cpu predict next instruction correctly. Then use the linkedlist to decode exceptions. Even in decode non-exceptions, it can be stoped when we get our needed. e.g. we consider implementation in http://code.google.com/p/integer-array-compress-kit/ for decode 5 bit frames, it decodes 5 integers for a loop. which contains 5*32/5=32 encoded integers it first decodes 6 integer from val1 then 6 integer from val2(and partial bits from val1) ... So we can stop if we only need the value of the 5th integer and store the status of the decoder and if next needed is the 12th integer, we can continue from current status exception is a linked list, it's also easy to stop and resume it. But I don't know the efficiency and need doing some experiments which comparing it with vint decoder. static int fastDeCompressFor5Bit( int offSet, int[] encodedValue, int dataNum, int decodeOffset, int[] decode ){ int maxBlocks = dataNum >> 5; int rest = dataNum % 32; // block process for( int block = 0 ; block < maxBlocks ; block++ ){ int val1 = encodedValue[ offSet ++ ]; int val2 = encodedValue[ offSet ++ ]; int val3 = encodedValue[ offSet ++ ]; int val4 = encodedValue[ offSet ++ ]; int val5 = encodedValue[ offSet ++ ]; decode[ decodeOffset ++ ] = ( val1 << 27 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val1 << 22 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val1 << 17 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val1 << 12 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val1 << 7 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val1 << 2 ) >>> 27 ; decode[ decodeOffset ++ ] = ( ( val2 << 29 ) >>> 27 ) | ( val1 >>> 30 ) ; decode[ decodeOffset ++ ] = ( val2 << 24 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val2 << 19 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val2 << 14 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val2 << 9 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val2 << 4 ) >>> 27 ; decode[ decodeOffset ++ ] = ( ( val3 << 31 ) >>> 27 ) | ( val2 >>> 28 ) ; decode[ decodeOffset ++ ] = ( val3 << 26 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val3 << 21 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val3 << 16 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val3 << 11 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val3 << 6 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val3 << 1 ) >>> 27 ; decode[ decodeOffset ++ ] = ( ( val4 << 28 ) >>> 27 ) | ( val3 >>> 31 ) ; decode[ decodeOffset ++ ] = ( val4 << 23 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val4 << 18 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val4 << 13 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val4 << 8 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val4 << 3 ) >>> 27 ; decode[ decodeOffset ++ ] = ( ( val5 << 30 ) >>> 27 ) | ( val4 >>> 29 ) ; decode[ decodeOffset ++ ] = ( val5 << 25 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val5 << 20 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val5 << 15 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val5 << 10 ) >>> 27 ; decode[ decodeOffset ++ ] = ( val5 << 5 ) >>> 27 ; decode[ decodeOffset ++ ] = val5 >>> 27 ; } 2010/11/25 Michael McCandless <[EMAIL PROTECTED]>: > Are you sure you can't use trunk? Swapping in PForDelta as a codec is > much easier and we are wanting to do that, anyway (as a hybrid codec > w/ Pulsing as well), before releasing 4.0 :) > > Partial decode of PForDelta is challenging I think... because the > exceptions are a linked list, you have to walk the exceptions up > front? > > That's a nice idea to interleave the encoding of 128 docDeltas then > 128 docFreqs... in trunk's Sep codec we currently put them in separate > files. > > Mike |