Results 1 to 21 of 21

Thread: LZ4a

  1. #1
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts

    LZ4a

    LZ4a is based on LZ4X.
    Improved long literal run and long matches encoding. But is no compatibility.
    Attached Files Attached Files

  2. Thanks (6):

    Bulat Ziganshin (12th September 2016),comp1 (12th September 2016),encode (12th September 2016),knutsen (13th September 2016),Nania Francesco (12th September 2016),Stephan Busch (12th September 2016)

  3. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Do you mean logarithmic storage, 7-bits for content and 1 to flag whether it's the end of the number?
    Bulat suggested such scheme before, I tested it (with a naive encoder) and the results were a mixed bug. Usually stronger, sometimes weaker. Slightly slower. Overall about as good, just a very slightly different point on the pareto-curve.

  4. #3
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    Do you mean logarithmic storage, 7-bits for content and 1 to flag whether it's the end of the number?
    My encoder doesn't use 1bit flag. Max block size is 8MB, so extra bytes is 1 - 3 bytes.
    code  extra byte(s)
    253 : 1byte
    254 : 2bytes
    255 : 3bytes

  5. #4
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    I am currently testing most of your compression programs.
    Can you please provide a 64-bit build of BMZFP 1.1 and BMGVC - then I can use a higher blocksize for compression.
    Do you also plan to insert filters (x86, delta filter..) in your programs?

    in LZ4a the progress (running bytes) does freeze after 8388608 Bytes: compression takes place until the end.

    I also could not compile Xeloz 0.4.2.

  6. #5
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    I am currently testing most of your compression programs.
    Can you please provide a 64-bit build of BMZFP 1.1 and BMGVC - then I can use a higher blocksize for compression.
    My system is Windows XP/7 32bit only. I can't test 64bit programs. And I'm not yet proficient in C/C++.
    Do you also plan to insert filters (x86, delta filter..) in your programs?
    I haven't pant to do. However I want to.
    in LZ4a the progress (running bytes) does freeze after 8388608 Bytes: compression takes place until the end.
    I tested but stops short time. I don't know its reason.
    I also could not compile Xeloz 0.4.2.
    Xeloz 0.3.5.3 later has bug, sorry.

  7. Thanks:

    Stephan Busch (13th September 2016)

  8. #6
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    For LZ-based algorithms, you can use the ready-to-use filters here:
    https://github.com/fusiyuan2010/CSC/...sc_filters.cpp
    https://github.com/fusiyuan2010/CSC/.../csc_filters.h

    For BWT-based algorithms, you can use the filters of Edgar Binder (not C++ but Pascal):
    http://nishi.dreamhosters.com/u/filter.rar
    or you look into LIBBSC

  9. Thanks:

    xezz (13th September 2016)

  10. #7
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by xezz View Post
    My encoder doesn't use 1bit flag. Max block size is 8MB, so extra bytes is 1 - 3 bytes.
    code  extra byte(s)
    253 : 1byte
    254 : 2bytes
    255 : 3bytes
    Interesting scheme. So on the 1st byte you waste 2 bits, compared bo 1 of Bulat and 0 of Yann. Then you waste nothing, while Bulat 1 bit/byte and Yann much more.
    The first byte is the most critical, since sequence frequency drops fast with length, that's why in my tests Yann's variant was sometimes stronger.
    So comparing to Bulat, you lose 1 byte on lengths of 64-127, win at most 1 byte on lengths of 2 MB and more. I don't think that's good.

  11. #8
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    Code:
    LZ4a
    length
    1-15  : 1byte
    16-268: 2bytes
    268<  : 2 + 1..3bytes
    
    
    original LZ4
    length
    1-15  : 1byte
    16-270: 2bytes
    270<  : 2 + length/255 bytes
    
    
    Bulat's
    length
    1-15  : 1byte
    16-143: 2bytes
    144<  : 2 + log128(length) bytes
    Isn't it?

  12. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    775
    Thanked 687 Times in 372 Posts
    lz4:
    1..255: 1 byte
    256..510: 2 bytes

    bulat:
    1..128: 1 byte
    129..16K: 2 bytes
    16K..2M: 3 bytes

    lz4a:
    1..254: 1 byte
    255..64K: 3 bytes
    64K..16M: 4 bytes

    your encoding is 1 byte shorter on 128..255 bytes, but 1 byte longer on 256..16K and 64K..2M, and doesn't support >16M at all

  13. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Oh, my bad, Bulat is right.
    That's much better.

  14. #11
    Member
    Join Date
    May 2016
    Location
    Canada
    Posts
    6
    Thanks
    0
    Thanked 4 Times in 2 Posts
    Check out the SQLite4 variable length integer encoding, it's pretty nice: https://sqlite.org/src4/doc/trunk/www/varint.wiki

    Of course, you can adapt it depending on the distribution of the values to encode.

    You can find other examples here: https://github.com/stoklund/varint

  15. Thanks (3):

    Bulat Ziganshin (13th September 2016),Stefan Atev (13th September 2016),xezz (14th September 2016)

  16. #12
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    100
    Thanks
    37
    Thanked 28 Times in 19 Posts
    Some years ago i made an not binary compatible lz4 codec and an lz4-like codec with 32kb window and an additional smaller format with 2 bit match length 2 bit literal length and 2kb distance(some similarities with lz5).They used bulat's encoding.
    The last codec had higher compression ration but, because of a single branch, decompression was slower.
    If you find them interesting i can publish code.

  17. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    775
    Thanked 687 Times in 372 Posts
    i think it's better call it "7+1" encoding or something like that

  18. #14
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    100
    Thanks
    37
    Thanked 28 Times in 19 Posts
    Maybe we can call it log128 encoding.
    But now as i think more about it ,encoding of smaller values is more important than larger ones as they are more frequent.
    A better one may be 11xxxxxx encoding with the first two bits signalling additional bytes.

  19. #15
    Member
    Join Date
    May 2016
    Location
    Canada
    Posts
    6
    Thanks
    0
    Thanked 4 Times in 2 Posts
    Quote Originally Posted by algorithm View Post
    Maybe we can call it log128 encoding.
    The correct name is ULEB128 https://en.wikipedia.org/wiki/LEB128

  20. Thanks:

    m^3 (14th September 2016)

  21. #16
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    Dear Xezz,

    Would ich help to improve BWT compression by using SAIS instead of difsufsort in your BWT compressors?
    According to https://github.com/y-256/libdivsufso..._Benchmarks.md it performs best in most cases.

  22. #17
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    ok, updated (added option)

  23. #18
    Member
    Join Date
    Apr 2013
    Location
    IT
    Posts
    71
    Thanks
    22
    Thanked 13 Times in 12 Posts
    Where I can find the updated code? (updated today?)

    I've downloaded the 7z file from first post (updated 11/9). When I run the EXE, I get an error saying libgcc_s_dw2 lib missing.
    I've recompiled the executable with -static and it works fine ( -static-libgcc should work too).

  24. #19
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    If my tests are correct, there seems to be no difference in compression when comparing difsufsort and Sais..

  25. #20
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    Where I can find the updated code?
    Sorry, LZ4a isn't updated. I mean BWT based compressor. code is bellow.
    there seems to be no difference in compression when comparing difsufsort and Sais
    divsufsort is faster than sais in my test.
    enwik7 result:
    divsufsort 6.453s
    sais 8.843s
    Attached Files Attached Files
    • File Type: 7z bc.7z (47.4 KB, 62 views)

  26. #21
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    This was yet another fancy LZ4-style compressor I've tested. It does what promised, I could swear! Somehow I've soon learned it heavily based on LZ4X, without even reading source of this part, it has been really easy to spot. Because it exposed properties verrrry similar to what I've described at http://encode.su/threads/2447-LZ4X-A...0602#post50602 - just doing somewhat better ratio, just as planned.

    But damn, when I've decided to do some "real-world" tests, my crash test dummy has walked away from the test site for second time in a row, being the only survivor, as described in LZ4X thread mentioned above. That's how I've learned it based on LZ4X, which is the only thing I know to expose such properties. On side note, newer LZ4X version is a bit smarter about includes, making gcc more happy, you may want to re-take this part from newer LZ4X version.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •