
Thread: balz v1.04 is here!

  1. #1
    encode (The Founder)
    Join Date: May 2006 | Location: Moscow, Russia | Posts: 3,982 | Thanks: 377 | Thanked 351 Times in 139 Posts
    OK, a very special version of BALZ is here!

    Briefly what's new:
    + Enlarged window size to 4 MB, block size to 32 MB
    + Improved match finder
    + Improved parsing. The default "e" mode uses greedy parsing. The optimized "ex" mode uses advanced lazy matching with two-byte lookahead. During parsing, the encoder checks some additional conditions, such as whether the current offset is in the Rep() state, whether the current offset is good enough, etc.
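    For readers new to the terms, the difference between greedy parsing and lazy matching with lookahead can be sketched as follows. This is not BALZ's parser: the brute-force find_match helper, MIN_MATCH = 3 and the rule "defer if a strictly longer match starts within the next 1 or 2 bytes" are simplifying assumptions, and BALZ's extra Rep()-offset and offset-quality checks are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_MATCH 3   /* hypothetical minimum match length */
#define WINDOW 4096   /* hypothetical window for the brute-force search */

/* Longest earlier match for position i (brute force, illustration only).
   Returns 0 if no match reaches MIN_MATCH; *dist receives the offset. */
static size_t find_match(const unsigned char *buf, size_t n, size_t i, size_t *dist)
{
    size_t best = 0;
    size_t start = i > WINDOW ? i - WINDOW : 0;
    *dist = 0;
    for (size_t j = start; j < i; j++) {
        size_t len = 0;
        while (i + len < n && buf[j + len] == buf[i + len])
            len++;
        if (len > best) {
            best = len;
            *dist = i - j;
        }
    }
    return best >= MIN_MATCH ? best : 0;
}

/* Parses buf and returns the number of tokens (literals + matches).
   lookahead == 0: greedy parsing, take every match that is found.
   lookahead == 1 or 2: lazy matching, emit a literal instead if a strictly
   longer match starts within the next `lookahead` bytes. */
static size_t parse(const unsigned char *buf, size_t n, int lookahead)
{
    size_t i = 0, tokens = 0;
    assert(lookahead >= 0);
    while (i < n) {
        size_t dist;
        size_t len = find_match(buf, n, i, &dist);
        if (len > 0) {
            int defer = 0;
            for (int k = 1; k <= lookahead && i + (size_t)k < n; k++) {
                size_t d2;
                if (find_match(buf, n, i + (size_t)k, &d2) > len) {
                    defer = 1;
                    break;
                }
            }
            if (!defer) {          /* emit match (len, dist) */
                i += len;
                tokens++;
                continue;
            }
        }
        i++;                       /* emit a literal */
        tokens++;
    }
    return tokens;
}
```

    On the input "abcXbcdeabcde", greedy parsing takes the 3-byte match "abc" at position 8 and is left with "de", too short to match; lazy matching emits "a" as a literal and then the 4-byte match "bcde", giving 10 tokens instead of 11.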

    Enjoy!



    http://encode.su/balz/index.htm


  2. #2
    Nania Francesco (Tester)
    Join Date: May 2008 | Location: Italy | Posts: 1,565 | Thanks: 220 | Thanked 146 Times in 83 Posts
    SFC Test
    option [ex]: 13,077,423 bytes, compression 169.703 s, decompression 2.687 s
    option [e]: 13,278,748 bytes, compression 68.060 s, decompression 2.705 s

  3. #3
    Moderator
    Join Date: May 2008 | Location: Tristan da Cunha | Posts: 2,034 | Thanks: 0 | Thanked 4 Times in 4 Posts
    Thanks Ilia!

    Mirror: Download

  4. #4
    Nania Francesco (Tester)
    Ilia, I am really impressed by this BALZ LZ77 compressor. Getting below 100,000,000 bytes in the Maximum Compression MFC test is no small feat! I am curious to know whether you have added a delta pre-filter for BMP, TIFF, WAVE, etc.?

  5. #5
    encode (The Founder)
    Nope, BALZ has only the same E8/E9 transformer as LZPM (an improved version of QUAD's) and nothing else. BALZ is simply much stronger on binary data, thanks to LZ77! I hope that the new BALZ v1.04 will have MUCH higher compression on ALL test sets, including MFC, Squeeze Chart, Black_Fox's and, of course, yours!
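    For context, an E8/E9 transformer rewrites the 32-bit relative operand of x86 CALL (0xE8) and JMP (0xE9) instructions into an absolute address, so that repeated calls to the same function become identical byte strings that LZ77 can match. The sketch below is a simple unconditional variant written for illustration; it is not the LZPM/QUAD filter, whose exact conditions are not described in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
static uint32_t get32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write a 32-bit little-endian value. */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* In-place E8/E9 filter. encode != 0: relative -> absolute (add position);
   encode == 0: absolute -> relative (subtract position). Arithmetic wraps
   mod 2^32 and the opcode bytes are never changed, so the decoder visits
   exactly the same positions and the transform is fully reversible. */
static void e8e9(unsigned char *buf, size_t n, int encode)
{
    size_t i = 0;
    assert(buf != NULL || n == 0);
    while (i + 5 <= n) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {  /* CALL rel32 / JMP rel32 */
            uint32_t v = get32(buf + i + 1);
            uint32_t pos = (uint32_t)i;
            put32(buf + i + 1, encode ? v + pos : v - pos);
            i += 5;                               /* skip the operand */
        } else {
            i++;
        }
    }
}
```

    Two CALLs at positions 0 and 8 that target the same address (relative operands 0x100 and 0xF8) both end up with the operand 0x100 after encoding, and decoding restores the original bytes.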

  6. #6
    Moderator
    Quick test...

    BALZ [e]

    A10.jpg > 843,382
    AcroRd32.exe > 1,473,688
    english.dic > 872,448
    FlashMX.pdf > 3,751,136
    FP.LOG > 895,287
    MSO97.DLL > 1,916,423
    ohs.doc > 844,279
    rafale.bmp > 1,089,156
    vcfiu.hlp > 731,261
    world95.txt > 233,472

    Total = 12,650,532 bytes


    ENWIK8 > 30,279,021 bytes

    Elapsed Time: 00:45:14.517 (2714.517 Seconds)


    BALZ [ex]

    A10.jpg > 843,382
    AcroRd32.exe > 1,449,276
    english.dic > 962,560
    FlashMX.pdf > 3,738,823
    FP.LOG > 855,849
    MSO97.DLL > 1,885,008
    ohs.doc > 836,783
    rafale.bmp > 1,071,154
    vcfiu.hlp > 698,529
    world95.txt > 604,981

    Total = 12,946,345 bytes


    ENWIK8 > 29,230,841 bytes

    Elapsed Time: 02:04:27.840 (7467.840 Seconds)

  7. #7
    encode (The Founder)
    LovePimple
    Re-check the results, there is something wrong!

  8. #8
    Moderator
    Results are correct for my machine. It seems that BALZ still fails to work correctly on my old P3 @750MHz machine.

  9. #9
    encode (The Founder)

  10. #10
    Moderator
    Here are the results from the same test on my AMD Sempron 2400+ machine...

    BALZ [e]

    A10.jpg > 843,382
    AcroRd32.exe > 1,473,688
    english.dic > 1,095,449
    FlashMX.pdf > 3,751,136
    FP.LOG > 895,287
    MSO97.DLL > 1,916,423
    ohs.doc > 844,279
    rafale.bmp > 1,089,156
    vcfiu.hlp > 731,261
    world95.txt > 638,687

    Total = 13,278,748 bytes


    BALZ [ex]

    A10.jpg > 843,382
    AcroRd32.exe > 1,449,276
    english.dic > 1,093,638
    FlashMX.pdf > 3,738,823
    FP.LOG > 855,849
    MSO97.DLL > 1,885,008
    ohs.doc > 836,783
    rafale.bmp > 1,071,154
    vcfiu.hlp > 698,529
    world95.txt > 604,981

    Total = 13,077,423 bytes

    Compression of ENWIK8 is far too slow to keep retesting.

    EDIT: Here are the results for the fastest [e] setting...


    ENWIK8 > 30,279,021 bytes

    Elapsed Time: 00:42:10.449 (2530.449 Seconds)

  11. #11
    encode (The Founder)
    Thank you!

  12. #12
    Moderator

  13. #13
    encode (The Founder)
    Quote Originally Posted by LovePimple
    We had this problem before.
    Yep! It's a compiler-related problem... Anyway, BALZ is for modern PCs. P3 is for museums. For example, some time ago I went to a PC shop to buy RAM for my sampler; it uses the same RAM type as old laptops. The sellers said that this RAM type is from the P3 era and belongs in a museum. I finally found one chip and bought it at a very high price, because it's a museum piece, a very rare RAM chip...
    I just don't know what's wrong... And since it works on ALL other machines, I still think it's OK...

    As always, you may play with the Visual Studio build:
    balz104cl.zip


  14. #14
    encode (The Founder)
    Some testing results:

    textures.tar (Textures from the Doom 3 game, 604,218,368 bytes)

    PKZIP 2.50, -exx: 233,240,852 bytes
    TOR 0.4, -5: 216,888,608 bytes
    TOR 0.4, -11: 210,794,900 bytes
    CABARC 1.00, -m LZX:21: 193,234,553 bytes
    LZPM 0.15, 1: 187,609,193 bytes
    LZPM 0.15, 9: 185,038,816 bytes
    BALZ 1.04, e: 184,031,123 bytes
    BALZ 1.04, ex: 183,003,496 bytes


  15. #15
    Moderator
    Quote Originally Posted by encode
    P3 is for museums.
    I don't agree. Just because something (or someone) is old, we should not dismiss them as "museum" pieces.


    Quote Originally Posted by encode
    As always, you may play with the Visual Studio build:
    Thank You!

  16. #16
    Moderator
    BALZ 1.04 has now been added to the SFC and MFC tests.

    http://www.maximumcompression.com/

  17. #17
    encode (The Founder)
    However, the new version has lower compression, even with a 4 MB dictionary. Note that BALZ v1.03 has MINMATCH=3, while BALZ v1.04 has MINMATCH=4. The newer version seems to lose too many short matches. Maybe BALZ v1.05 will have a 1 MB dictionary, MINMATCH=3, and improved coding of the LZ output.

  18. #18
    Bulat Ziganshin (Programmer)
    Join Date: Mar 2007 | Location: Uzbekistan | Posts: 4,503 | Thanks: 741 | Thanked 665 Times in 359 Posts
    In order to make compression both fast and good, you need to use separate hash tables for short strings. Say, Tornado uses 3 separate tables: for 2-byte, 3-byte and 4+-byte strings. The first two tables are rather large and addressed directly, without chains. LZMA 4.43 used the same scheme, and current versions use a separate table for 4-byte strings and the last table only for 5+-byte strings. The same is true for RAR. Note that the size of such a table should be much larger than the max. distance for this type of match. Say, LZMA uses one million entries for searching 4-byte strings, while their distances are probably limited to something like 50-200 KB.
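    The multi-table scheme described above can be sketched like this: a directly indexed table for 2-byte strings, a single-slot hashed table for 3-byte strings (no chains), and a hash-chained table for 4+-byte strings. All sizes, the multiplicative hash, the distance limits for short matches and the chain depth are hypothetical values chosen for illustration, not Tornado's, LZMA's or RAR's actual parameters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define H3_BITS 16      /* hypothetical: 64K slots for 3-byte strings */
#define H4_BITS 16      /* hypothetical: 64K chain heads for 4+-byte strings */
#define MAX_DIST2 256   /* hypothetical distance limits for short matches */
#define MAX_DIST3 4096
#define MAX_CHAIN 64    /* hypothetical chain depth limit */

/* Positions are stored as pos + 1, so 0 means "empty". */
typedef struct {
    uint32_t t2[1u << 16];           /* one slot per possible 2-byte string */
    uint32_t t3[1u << H3_BITS];      /* latest position per 3-byte hash */
    uint32_t head4[1u << H4_BITS];   /* chain heads for 4-byte hashes */
    uint32_t *prev;                  /* prev[i]: older position in i's chain */
} Finder;

static uint32_t hash3(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
    return (v * 2654435761u) >> (32 - H3_BITS);
}

static uint32_t hash4(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (v * 2654435761u) >> (32 - H4_BITS);
}

/* Match length between earlier position a and current position b. */
static size_t match_len(const unsigned char *buf, size_t n, size_t a, size_t b)
{
    size_t len = 0;
    while (b + len < n && buf[a + len] == buf[b + len])
        len++;
    return len;
}

/* Longest match at i over all three tables (minimum length 2), or 0. */
static size_t find(const Finder *f, const unsigned char *buf, size_t n,
                   size_t i, size_t *dist)
{
    size_t best = 0;
    *dist = 0;
    if (i + 2 <= n) {
        uint32_t c = f->t2[buf[i] << 8 | buf[i + 1]];
        if (c && i - (c - 1) <= MAX_DIST2) {
            best = match_len(buf, n, c - 1, i);   /* direct index: len >= 2 */
            *dist = i - (c - 1);
        }
    }
    if (i + 3 <= n) {
        uint32_t c = f->t3[hash3(buf + i)];
        if (c && i - (c - 1) <= MAX_DIST3) {
            size_t len = match_len(buf, n, c - 1, i);  /* verify: hash may collide */
            if (len >= 3 && len > best) { best = len; *dist = i - (c - 1); }
        }
    }
    if (i + 4 <= n) {
        uint32_t c = f->head4[hash4(buf + i)];
        for (int depth = 0; c && depth < MAX_CHAIN; depth++) {
            size_t len = match_len(buf, n, c - 1, i);
            if (len >= 4 && len > best) { best = len; *dist = i - (c - 1); }
            c = f->prev[c - 1];
        }
    }
    return best;
}

/* Registers position i in all tables (call after find() for position i). */
static void insert(Finder *f, const unsigned char *buf, size_t n, size_t i)
{
    uint32_t pos1 = (uint32_t)i + 1;
    if (i + 2 <= n) f->t2[buf[i] << 8 | buf[i + 1]] = pos1;
    if (i + 3 <= n) f->t3[hash3(buf + i)] = pos1;
    if (i + 4 <= n) {
        uint32_t h = hash4(buf + i);
        f->prev[i] = f->head4[h];
        f->head4[h] = pos1;
    }
}

static Finder *finder_new(size_t n)
{
    Finder *f = calloc(1, sizeof *f);
    if (!f) return NULL;
    f->prev = calloc(n ? n : 1, sizeof *f->prev);
    if (!f->prev) { free(f); return NULL; }
    return f;
}

static void finder_free(Finder *f)
{
    if (f) { free(f->prev); free(f); }
}
```

    Because the 2-byte table is indexed directly, a 2-byte match can never be lost to a hash collision; only the 3-byte and 4-byte lookups need to verify the candidate.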

  19. #19
    encode (The Founder)
    Yep, I will try to implement such multi-level hashing in BALZ.

  20. #20
    encode (The Founder)
    I just carefully tested such a thing with BALZ. Well, it works! However, I'm in no hurry to add it.
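    Code:
    // 3-byte strings: one slot per hash in the upper half of head[], no chain
    int pos=head[HSIZE+gethash3(i)];
    if (pos) {
      // search for short string
    }
    // 4+-byte strings: walk the hash chain
    // (BALZ caps the chain length at 8192; the cap is omitted here)
    pos=head[gethash4(i)];
    while (pos) {
      // do a hash chained search
      pos=prev[pos];
    }
    // ...
    // update: the 3-byte slot keeps only the latest position,
    // the 4-byte hash links position i into its chain
    head[HSIZE+gethash3(i)]=i;
    int h=gethash4(i);
    prev[i]=head[h];
    head[h]=i;
    // ...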
    Even with a large HSIZE, the match finder doesn't find all short strings. At the same time, this scheme may allow a slightly deeper search, since we limit the hash chain length to 8192, and a 4-byte hash is better than a 3-byte one. Will do more tests...

  21. #21
    encode (The Founder)
    Another cool idea, which works, is to allow MINMATCH-length matches only at the Rep() (recent) offset. In this case we can encode such a match WITHOUT an offset, and MINMATCH can then safely be as low as 2.
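    A minimal sketch of that idea, assuming a single rep offset and three token kinds: a rep match needs no offset, so it may be as short as 2 bytes, while an ordinary match must reach 4 bytes (v1.04's MINMATCH) to pay for its offset. The token names, thresholds and the tie-break rule "an ordinary match must be strictly longer than the rep match" are hypothetical; BALZ's real token format and decision logic are not described in the thread.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_REP_MATCH 2   /* hypothetical: no offset to pay for */
#define MIN_MATCH 4       /* hypothetical: ordinary matches also code an offset */

typedef enum { TOK_LITERAL, TOK_REP_MATCH, TOK_MATCH } TokenKind;

typedef struct {
    TokenKind kind;
    size_t len;    /* bytes covered by the token */
    size_t dist;   /* offset; only meaningful for TOK_MATCH */
} Token;

/* Length of the match at position i using offset dist (0 if dist is invalid). */
static size_t match_at(const unsigned char *buf, size_t n, size_t i, size_t dist)
{
    size_t len = 0;
    if (dist == 0 || dist > i)
        return 0;
    while (i + len < n && buf[i - dist + len] == buf[i + len])
        len++;
    return len;
}

/* Chooses the token at position i. `rep` is the most recent offset;
   found_len/found_dist is what the regular match finder reported. */
static Token choose_token(const unsigned char *buf, size_t n, size_t i, size_t rep,
                          size_t found_len, size_t found_dist)
{
    Token t = { TOK_LITERAL, 1, 0 };
    size_t rep_len;
    assert(i < n);
    rep_len = match_at(buf, n, i, rep);   /* no hash lookup needed */
    if (rep_len >= MIN_REP_MATCH) {
        t.kind = TOK_REP_MATCH;
        t.len = rep_len;
    }
    /* An ordinary match must reach MIN_MATCH and beat the rep match. */
    if (found_len >= MIN_MATCH && found_len > rep_len) {
        t.kind = TOK_MATCH;
        t.len = found_len;
        t.dist = found_dist;
    }
    return t;
}
```

    In "abcdabQQ" at position 4, offset 4 gives a 2-byte rep match ("ab"), which this scheme can code without an offset, while a 2-byte match at any other offset would simply be emitted as literals.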

  22. #22
    Bulat Ziganshin (Programmer)
    Quote Originally Posted by encode
    Even with a large HSIZE, the match finder doesn't find all short strings.
    HSIZE is the size of the 4-byte hash here

    As I said before, with 3-byte strings whose offsets are limited to 4096, LZMA used 64K entries.

    BTW, Kadach wrote that it's better to check repdists first, before searching the hash tables. Look at LZMA for implementation details.

  23. #23
    encode (The Founder)
    Quote Originally Posted by Bulat Ziganshin
    HSIZE is the size of the 4-byte hash here
    ...and of the 3-byte hash as well...

    Quote Originally Posted by Bulat Ziganshin
    BTW, Kadach wrote that it's better to check repdists first, before searching the hash tables. Look at LZMA for implementation details.
    Will look again at Kadach.

  24. #24
    encode (The Founder)
    New BALZ v1.05 is coming! What's new:
    + New match finder: HC5 - i.e. hash chains with 3-5-byte hashing!
    + Slightly improved parsing
    All in all, new version is *MUCH* faster and has higher compression, in some cases the compression improvement is really huge!

    It will be released within one or two weeks...

  25. #25
    encode (The Founder)
    Still, S&S parsing rules! I have compared various parsing schemes with S&S many times, and S&S is the best: even with a smaller dictionary it achieves higher compression than, say, lazy matching with 2-byte lookahead. Maybe I should combine the new match finder (HC5) with such parsing...
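    Assuming "S&S" refers to Storer-Szymanski-style parsing, i.e. choosing at every position between a literal and every available match length so that the total cost of the whole parse is minimal, a minimal sketch is a backward dynamic-programming pass. The fixed costs (9 bits per literal, 24 bits per match) and the brute-force longest_match helper are hypothetical; a real encoder would take costs from its entropy coder and needs a fast match finder, because every position must be searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_MATCH 3
#define LIT_COST 9      /* hypothetical cost of a literal, in bits */
#define MATCH_COST 24   /* hypothetical cost of any match, in bits */

/* Longest match at i against all earlier positions (brute force, illustration only). */
static size_t longest_match(const unsigned char *buf, size_t n, size_t i)
{
    size_t best = 0;
    for (size_t j = 0; j < i; j++) {
        size_t len = 0;
        while (i + len < n && buf[j + len] == buf[i + len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Minimum-cost parse. step[i] receives the length of the token chosen at i
   (1 = literal); the parse is read by walking i += step[i] from i = 0.
   Returns the total cost in bits, or (size_t)-1 on allocation failure. */
static size_t optimal_parse(const unsigned char *buf, size_t n, size_t *step)
{
    size_t *cost = malloc((n + 1) * sizeof *cost);   /* cost[i]: best cost of buf[i..n) */
    assert(step != NULL || n == 0);
    if (!cost)
        return (size_t)-1;
    cost[n] = 0;
    for (size_t i = n; i-- > 0; ) {
        size_t max;
        cost[i] = cost[i + 1] + LIT_COST;            /* option 1: literal */
        step[i] = 1;
        max = longest_match(buf, n, i);
        /* option 2: any prefix of the longest match is also a valid match */
        for (size_t len = MIN_MATCH; len <= max; len++) {
            size_t c = cost[i + len] + MATCH_COST;
            if (c < cost[i]) {
                cost[i] = c;
                step[i] = len;
            }
        }
    }
    {
        size_t total = cost[0];
        free(cost);
        return total;
    }
}
```

    On "abcXbcdeabcde" this parse chooses a literal at position 8 followed by the 4-byte match "bcde" (105 bits in total), whereas greedy parsing with the same costs takes "abc" first and spends 114 bits.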

  26. #26
    Moderator
    Quote Originally Posted by encode
    All in all, new version is *MUCH* faster and has higher compression, in some cases the compression improvement is really huge!

  27. #27
    encode (The Founder)
    Quote Originally Posted by encode
    Maybe I should combine the new match finder (HC5) with such parsing...
    Tested the idea...
    Well, even this match finder is extremely slow with SS parsing. That means that with this kind of parsing we should use a binary tree or something similar; maybe we could build a tree, as some LZW implementations do, and traverse this structure instead of searching the buffer directly.
    Anyway, for now, let's assume that BALZ is a fast LZ77 encoder. The new BALZ v1.05 with the "e" option is fast enough indeed.

  28. #28
    Moderator
    As long as it's still '*MUCH* faster' than previous versions!

  29. #29
    encode (The Founder)
    I can't get SS parsing's performance out of my head. For example, BALZ with a 1 MB window and SS parsing may beat BALZ with a 4 MB window and lazy matching with 2-byte lookahead. That said, with SS parsing the encoder is dead slow (at least 18x slower than my special lazy matching). Well, at least I can see how much "air" the current scheme leaves. Note that on some files the large dictionary does make sense: there, even an SS-based encoder with a smaller dictionary may not be able to compete with its larger-dictionary brother using a much simpler parsing strategy.

    Anyway, SS is still far from optimal; as I said, in some cases, like 'canterbury.tar', lazy matching provides significantly higher compression than SS. I tested LZMA with optimal and simple parsing schemes and I can see how 'real' optimal parsing may help: with the same settings (dictionary size, match finder and, most importantly, a simple parsing strategy) LZMA and BALZ are close together, of course, as they both use LZ77.

    In conclusion, I will release what I currently have, and then we will see... something... Anyway, BALZ v1.05 is something special, believe me...

