Hello, I done several tests with MTF and qlfc post-processing. It seems when data are processed in a better way (before final entropy coding stage), MTF / qlfc transform can actually improve compression overall (even instead of deterioration by these transforms). And I think this could be used with DC or IF algorithms as well.
I have used a hybrid of Elias Gama code with following modifications below as the intermediate step, but I think it could be the last step as well., or part of it, which i think could be computed relatively fast.
So the post-processing looks as follows( stream1, stream2 ):
1
010
011
00100
00101
..
Well, so far this looks like a common Elias Gama Code : ) but notice all prefixes, which contain N zeros and foremost '1' character, will be encoded separately. The rest is a RLE-Bit encoding, with O(1) transformation, or modification I presented here http://encode.su/threads/546-Move-to...ll=1#post22183
This transform preceding Rle-Bit implies that RLE-Bit stream should contain less '1' symbols for more common input characters.Code:int rle0Trans[] = new int[] {1,2,3,4,5,6,7,8,9,11,10,13,12,14,15,16,17,19,23,18,21,27,20,25,24,28,26,29,22,30, 31,32,33,35,39,47,34,37,43,55,36,41,51,40,49,48,38,45,59,42,53,57,44,50,56,52,54,60,58,46,61,62, 63,64,65,67,71,79,95,66,69,75,87,111,68,73,83,103,72,81,99,80,97,96,70,77,91,119,74,85,107,76,89,115,82,101,88, 113,104,100,98,112,84,105,123,120,114,102,78,121,116,106,86,117,90,93,108,109,92,125,124,122,118,110,94,126}; if(rleBitChar <= 126) rleBitChar = rle0Trans[(int)rleBitChar - 1]; .. now do RLE-Bit encode
Some information on RLE-Bit:
http://www.juergen-abel.info/Preprin...BWT_Stages.pdf
Now compression workflow could look following:
- At first encode RLE on plain BWT output, with runs of identical characters, but basically the same way as all remaining characters in step 3. Anyway notice we need to encode both runs of identical characters AND their positions as distances against remaining characters (only in this step), which result in four separate streams (RLE, their distances and than both code prefixes & transformed RLE-Bit codes above, which I encoded by 4 separate EC models). I got some 6,358,486 from this RLE on enwik8bwt (RLE including distances).
- Second step is MTF, WFC or qlfc transform, on all remaining characters, but these now do not contain runs of identical adjacent characters, which means resulting alphabet will be one character smaller than input one.
- Proceed all MTF / qlfc output characters with post-processing code above, but notice for first character(most common one), we can code only prefix, no RLE-Bit stage needed. So now we create two streams for all characters, prefixes and transformed RLE-Bit, resulting in fully binary alphabet.
- Last step is binary entropy coding (= binary RC, Arithmetic coding).
I think this workflow could be bit different e.g. for DC or IF. but I haven't tested for these so far.
Also many of these steps can be made in parallel.
Results: I got some 20056570 on enwik8bwt with qlfc and binary rc, while following post-processing got me some 1MB improvement.
Anyway I don't have a working encoder or demo yet, I used a bunch of independent scripts and in-place code modifications. So the question is how this will behave with different data formats than text, and some changes might be needed.