LZ4a is based on LZ4X.
Improved long literal run and long matches encoding. But is no compatibility.
LZ4a is based on LZ4X.
Improved long literal run and long matches encoding. But is no compatibility.
Bulat Ziganshin (12th September 2016),comp1 (12th September 2016),encode (12th September 2016),knutsen (13th September 2016),Nania Francesco (12th September 2016),Stephan Busch (12th September 2016)
Do you mean logarithmic storage, 7-bits for content and 1 to flag whether it's the end of the number?
Bulat suggested such scheme before, I tested it (with a naive encoder) and the results were a mixed bug. Usually stronger, sometimes weaker. Slightly slower. Overall about as good, just a very slightly different point on the pareto-curve.
My encoder doesn't use 1bit flag. Max block size is 8MB, so extra bytes is 1 - 3 bytes.Do you mean logarithmic storage, 7-bits for content and 1 to flag whether it's the end of the number?
code extra byte(s)
253 : 1byte
254 : 2bytes
255 : 3bytes
I am currently testing most of your compression programs.
Can you please provide a 64-bit build of BMZFP 1.1 and BMGVC - then I can use a higher blocksize for compression.
Do you also plan to insert filters (x86, delta filter..) in your programs?
in LZ4a the progress (running bytes) does freeze after 8388608 Bytes: compression takes place until the end.
I also could not compile Xeloz 0.4.2.
My system is Windows XP/7 32bit only. I can't test 64bit programs. And I'm not yet proficient in C/C++.I am currently testing most of your compression programs.
Can you please provide a 64-bit build of BMZFP 1.1 and BMGVC - then I can use a higher blocksize for compression.
I haven't pant to do. However I want to.Do you also plan to insert filters (x86, delta filter..) in your programs?
I tested but stops short time. I don't know its reason.in LZ4a the progress (running bytes) does freeze after 8388608 Bytes: compression takes place until the end.
Xeloz 0.3.5.3 later has bug, sorry.I also could not compile Xeloz 0.4.2.
Stephan Busch (13th September 2016)
For LZ-based algorithms, you can use the ready-to-use filters here:
https://github.com/fusiyuan2010/CSC/...sc_filters.cpp
https://github.com/fusiyuan2010/CSC/.../csc_filters.h
For BWT-based algorithms, you can use the filters of Edgar Binder (not C++ but Pascal):
http://nishi.dreamhosters.com/u/filter.rar
or you look into LIBBSC
xezz (13th September 2016)
Interesting scheme. So on the 1st byte you waste 2 bits, compared bo 1 of Bulat and 0 of Yann. Then you waste nothing, while Bulat 1 bit/byte and Yann much more.
The first byte is the most critical, since sequence frequency drops fast with length, that's why in my tests Yann's variant was sometimes stronger.
So comparing to Bulat, you lose 1 byte on lengths of 64-127, win at most 1 byte on lengths of 2 MB and more. I don't think that's good.
Isn't it?Code:LZ4a length 1-15 : 1byte 16-268: 2bytes 268< : 2 + 1..3bytes original LZ4 length 1-15 : 1byte 16-270: 2bytes 270< : 2 + length/255 bytes Bulat's length 1-15 : 1byte 16-143: 2bytes 144< : 2 + log128(length) bytes
lz4:
1..255: 1 byte
256..510: 2 bytes
bulat:
1..128: 1 byte
129..16K: 2 bytes
16K..2M: 3 bytes
lz4a:
1..254: 1 byte
255..64K: 3 bytes
64K..16M: 4 bytes
your encoding is 1 byte shorter on 128..255 bytes, but 1 byte longer on 256..16K and 64K..2M, and doesn't support >16M at all
Oh, my bad, Bulat is right.
That's much better.
Check out the SQLite4 variable length integer encoding, it's pretty nice: https://sqlite.org/src4/doc/trunk/www/varint.wiki
Of course, you can adapt it depending on the distribution of the values to encode.
You can find other examples here: https://github.com/stoklund/varint
Bulat Ziganshin (13th September 2016),Stefan Atev (13th September 2016),xezz (14th September 2016)
Some years ago i made an not binary compatible lz4 codec and an lz4-like codec with 32kb window and an additional smaller format with 2 bit match length 2 bit literal length and 2kb distance(some similarities with lz5).They used bulat's encoding.
The last codec had higher compression ration but, because of a single branch, decompression was slower.
If you find them interesting i can publish code.
i think it's better call it "7+1" encoding or something like that![]()
Maybe we can call it log128 encoding.
But now as i think more about it ,encoding of smaller values is more important than larger ones as they are more frequent.
A better one may be 11xxxxxx encoding with the first two bits signalling additional bytes.
The correct name is ULEB128 https://en.wikipedia.org/wiki/LEB128
m^3 (14th September 2016)
Dear Xezz,
Would ich help to improve BWT compression by using SAIS instead of difsufsort in your BWT compressors?
According to https://github.com/y-256/libdivsufso..._Benchmarks.md it performs best in most cases.
ok, updated (added option)
Where I can find the updated code? (updated today?)
I've downloaded the 7z file from first post (updated 11/9). When I run the EXE, I get an error saying libgcc_s_dw2 lib missing.
I've recompiled the executable with -static and it works fine ( -static-libgcc should work too).
If my tests are correct, there seems to be no difference in compression when comparing difsufsort and Sais..
Sorry, LZ4a isn't updated. I mean BWT based compressor. code is bellow.Where I can find the updated code?
divsufsort is faster than sais in my test.there seems to be no difference in compression when comparing difsufsort and Sais
enwik7 result:
divsufsort 6.453s
sais 8.843s
This was yet another fancy LZ4-style compressor I've tested. It does what promised, I could swear! Somehow I've soon learned it heavily based on LZ4X, without even reading source of this part, it has been really easy to spot. Because it exposed properties verrrry similar to what I've described at http://encode.su/threads/2447-LZ4X-A...0602#post50602 - just doing somewhat better ratio, just as planned.
But damn, when I've decided to do some "real-world" tests, my crash test dummy has walked away from the test site for second time in a row, being the only survivor, as described in LZ4X thread mentioned above. That's how I've learned it based on LZ4X, which is the only thing I know to expose such properties. On side note, newer LZ4X version is a bit smarter about includes, making gcc more happy, you may want to re-take this part from newer LZ4X version.