Page 2 of 3
Results 31 to 60 of 62

Thread: balz v1.00 - new LZ77 encoder is here!

  1. #31
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    Are you sure?? On both my PC and my laptop decompression went OK!
    Yes! I will try the test again after a reboot and post the results later.


    Here is a link to the compressed ENWIK8 file.

    http://rapidshare.com/files/98061712/enwik8.balz.html

  2. #32
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Can't confirm either... However, I can confirm that compression on your PC went bad at some point - BALZ 1.02 compresses enwik8 to 29.2 MB -> 5 MB would be a new Hutter Prize record by a HUGE margin. Your corrupted archive also decompresses to 100,000,121 bytes here.

    EDIT: I really HATE those kitten fonts at rapidshare! Why are they doing that to us?

    EDIT2: Fed the original ENWIK8 and LovePimple's decompressed one to a file comparator - from byte 16775499 (inclusive) onward everything is corrupted. I believe this is the size of one block.
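
    For anyone who wants to repeat this kind of check without a dedicated comparator, here is a minimal sketch in Python. It is not the tool used above; the function name `first_mismatch` is made up for illustration, and it reports a 0-based offset, which may differ by one from what a given comparator prints.

```python
def first_mismatch(path_a, path_b, chunk_size=1 << 20):
    """Return the 0-based offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact byte inside the differing chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: the files differ in length.
                return offset + min(len(a), len(b))
            if not a:  # both files ended at the same point
                return None
            offset += len(a)
```

    If one file is simply longer than the other, the returned offset is the length of the shorter file.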

  3. #33
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Test failed again, but compressed size was 24.4 MB (25,622,851 bytes) this time. Decompressed file is once again longer than the original.

    BALZ does not report any errors during compression or decompression of the file.

  4. #34
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,583
    Thanks
    234
    Thanked 160 Times in 90 Posts
    Intel Core 2 Duo E6600: test OK.
    I repeated the test on ENWIK8 with my MOC test program, and the decompressed file is identical to the original.

  5. #35
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    The compressed size of ENWIK8 should be 30,634,726 bytes. CRC32 of the compressed file: 28DC2D3C
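
    A quick way to check an archive against these two values is sketched below. It assumes the CRC32 quoted here is the standard CRC-32 used by zlib and ZIP, which the post does not state; `file_crc32` and the file name in the usage note are hypothetical.

```python
import os
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """Return (size_in_bytes, CRC-32 as 8 uppercase hex digits) for a file."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
    return os.path.getsize(path), f"{crc & 0xFFFFFFFF:08X}"
```

    Under that assumption, `file_crc32("enwik8.balz")` on a good archive should return `(30634726, "28DC2D3C")`.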

    Quote Originally Posted by Black_Fox
    16775499 (including) everything is corrupted, I believe this is size of one block.
    The block size is exactly 16,777,216 bytes (i.e. 16 MB). In this case the 5 MB file is the first 16775498 bytes of ENWIK8 in compressed form - after that, the decompressor produces garbage...

    Quote Originally Posted by LovePimple
    Test failed again, but compressed size was 24.4 MB (25,622,851 bytes) this time. Decompressed file is once again longer than the original.

    BALZ does not report any errors during compression or decompression of the file.
    The compressor and decompressor have no error checking - the entire program is so simple that there is nothing to go wrong... A larger decompressed file is possible only if you try to decompress a corrupted file, since the original file size is kept in a small file header.

    The link to a valid compressed ENWIK8:
    ENWIK8.balz (29 MB)

    Try to decompress it on your machine!

    I think this may be an OS/hardware-related problem... Maybe some compiler optimizations are not compatible with such an old CPU... Get a Core 2 Duo!


  6. #36
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Analyzed the 5 MB file provided by LovePimple. Well, it looks like BALZ crashes on LovePimple's machine after compressing the first block. I tried to reproduce this - killing BALZ just after compression of the first 16 MB completes, without the arithmetic encoder flush and other stuff - and I got exactly the same file as LovePimple provided! Later he says that he got a 24 MB file; I think the 24 MB file is a compressed ENWIK8 without the last block. In conclusion, BALZ crashes on block boundaries on his CPU.

    The question - does BALZ display the 'done' message after compressing? If it does, BALZ somehow fails within the encode() function. Otherwise, BALZ simply crashes on his machine...
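
    The block arithmetic behind this reading can be checked with a short sketch. The sizes are the ones quoted in this thread plus the known enwik8 size of 10^8 bytes; `block_layout` is a hypothetical helper, and the "one missing block" interpretation remains a guess.

```python
BLOCK_SIZE = 16_777_216      # block size stated in post #35
ENWIK8_SIZE = 100_000_000    # enwik8 is 10^8 bytes

def block_layout(total_size, block_size):
    """Return (number_of_blocks, size_of_last_block) for fixed-size blocks."""
    full, rest = divmod(total_size, block_size)
    if rest == 0:
        return full, (block_size if full else 0)
    return full + 1, rest

blocks, last_block = block_layout(ENWIK8_SIZE, BLOCK_SIZE)
print(blocks, last_block)        # 6 16113920

good_archive = 30_634_726        # expected compressed size (post #35)
short_archive = 25_622_851       # LovePimple's second attempt (post #33)
print(good_archive - short_archive)  # 5011875
```

    So enwik8 splits into five full blocks and a sixth block of about 16.1 MB, and the second archive is about 5.0 MB short, which is roughly what one compressed block occupies.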


  7. #37
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    balz 1.02 was OK on LTCB. I verified decompression for both files.
    enwik8=30,634,726
    enwik9=268,552,062 (6 hours to compress, 1 minute to decompress)
    http://cs.fit.edu/~mmahoney/compression/text.html

  8. #38
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,583
    Thanks
    234
    Thanked 160 Times in 90 Posts
    Thanks Matt! Please test RZM, it is an incredible compressor! Hi!

  9. #39
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    The link to a valid compressed ENWIK8:
    ENWIK8.balz (29 MB)

    Try to decompress it on your machine!
    Decompresses in about 30 seconds. Byte check (FC) reports the decompressed file to be exactly the same as the original.


    Quote Originally Posted by encode
    The question - does BALZ display the 'done' message after compressing?
    YES!

  10. #40
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Quote Originally Posted by Matt Mahoney
    balz 1.02 was OK on LTCB. I verified decompression for both files.
    enwik8=30,634,726
    enwik9=268,552,062 (6 hours to compress, 1 minute to decompress)
    http://cs.fit.edu/~mmahoney/compression/text.html
    Thank you, Matt!

    LovePimple
    A special DEBUG release for you:
    balz102d.zip (50 KB)
    - please report the compressor's output!


  11. #41
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    LovePimple
    A special DEBUG release for you:
    balz102d.zip (50 KB)
    - please report the compressor's output!
    Will run test now!

  12. #42
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    balz v1.02DEBUG by encode
    sizeof(head)=67108864
    sizeof(prev)=67108864
    sizeof(path)=201326592
    size=100000000
    ==================================================
    n=16777216
    pass 0 - exe transformation...
    exetransform(): no 0x4550 detected
    memset(head) - OK
    memset(path) - OK
    pass 1 - finding all matches...
    (optimizing 16384k block...)
    pass 2 - constructing path...
    smallest approx. cost=53408815
    pass 3 - encoding LZ output...
    encoding block - OK
    ==================================================
    n=16777216
    pass 0 - exe transformation...
    exetransform(): no 0x4550 detected
    memset(head) - OK
    memset(path) - OK
    pass 1 - finding all matches...
    (optimizing 16384k block...)
    37%


    Compressed file size is 5,177,344 bytes.
    Elapsed Time: 00:33:53.051 (2033.051 Seconds)

  13. #43
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Here is a link to the compressed file.

    http://rapidshare.com/files/98525223/enwik8.balz.html

  14. #44
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Quote Originally Posted by LovePimple
    pass 1 - finding all matches...
    (optimizing 16384k block...)
    37%
    As you can see, BALZ falls over in the middle of match searching... No 'done' message displayed... I just have no idea what could be wrong...

  15. #45
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Will think more deeply...

  16. #46
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    OK, check out a different compile (VS2005 SP1):
    balz102vs2005.zip (42 KB)


  17. #47
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    As you can see, BALZ falls over in the middle of match searching... No 'done' message displayed...
    I didn't take note on the first test, but it did display the 'done' message after producing the 24.4 MB file on the second test.

    Quote Originally Posted by encode
    OK, check out a different compile (VS2005 SP1):
    balz102vs2005.zip (42 KB)
    This version seems to be working OK.

    Compression
    Compressed size: 30,634,726 bytes
    Elapsed Time: 02:46:44.181 (10004.181 Seconds)

    Decompression
    Elapsed Time: 00:00:37.520 (37.520 Seconds)
    Byte check shows decompressed file is identical to the original.

  18. #48
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    By the way, BALZ v1.03 is coming! It will be released within a week.
    What's new:
    + BALZ now has two modes - fast (default) and max. Fast mode is very fast and crazily efficient, plus it uses less memory.
    You need ~144 MB for compression in fast mode and ~336 MB for compression in max mode.
    Fast mode is 50X faster on average than max mode!
    + Improved compression in max mode - the new BALZ compresses slightly better while remaining fully compatible with BALZ v1.02
    New command line interface:
    balz ex test.dat test.balz - max mode
    balz e test.dat test.balz - fast mode
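
    These memory figures happen to line up with the array sizes printed by the v1.02 DEBUG build in post #42. The sketch below is only one reading that fits the numbers: it assumes a separate 16 MB block buffer and assumes that fast mode does without the path array, neither of which is stated in the thread, and v1.03 may allocate things differently.

```python
MB = 1 << 20

# Array sizes printed by the v1.02 DEBUG build (post #42).
HEAD = 67_108_864
PREV = 67_108_864
PATH = 201_326_592
# Assumption: one 16 MB block buffer is allocated besides the arrays.
BUFFER = 16_777_216

def to_mb(n_bytes):
    """Convert a byte count to whole megabytes (1 MB = 2**20 bytes)."""
    return n_bytes // MB

max_mode = HEAD + PREV + PATH + BUFFER
# Assumption: fast (greedy) mode needs no path array.
fast_mode = HEAD + PREV + BUFFER

print(to_mb(max_mode), to_mb(fast_mode))  # 336 144
# Array bytes per byte of a 16 MB block.
print(HEAD // BUFFER, PATH // BUFFER)     # 4 12
```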

  19. #49
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    what's the dictionary size?

  20. #50
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Same 512 KB!
    The new fast mode uses unoptimized (greedy) parsing, plus some tricks like discarding matches that are not good enough, for higher compression. I also played with Lazy Matching - well, this trick makes BALZ 2X-4X slower for a minor compression gain in most cases, because the match-discarding trick already compensates for the suboptimal greedy parsing in some cases. Maybe in future versions I will add Lazy Matching as an additional mode, but the current Fast mode is a must-have for BALZ!
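
    To make the greedy vs. lazy distinction concrete, here is a toy LZ77 parser in Python. It is not BALZ code: the brute-force match finder, the token format, MIN_MATCH = 3 and the one-step Deflate-style lookahead are all assumptions chosen for illustration, and the match-discarding trick is not modelled.

```python
MIN_MATCH = 3

def longest_match(data, pos, window):
    """Brute-force longest match for data[pos:] starting in the last `window` bytes."""
    best_len, best_dist = 0, 0
    for start in range(max(0, pos - window), pos):
        length = 0
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - start
    return best_len, best_dist

def parse(data, window=1 << 19, lazy=False):
    """Return tokens: ('lit', byte) or ('match', length, distance)."""
    tokens, pos = [], 0
    while pos < len(data):
        length, dist = longest_match(data, pos, window)
        if length < MIN_MATCH:
            tokens.append(("lit", data[pos]))
            pos += 1
            continue
        if lazy and pos + 1 < len(data):
            # Lazy step: if the next position offers a longer match,
            # emit one literal now and take that match instead.
            next_len, _ = longest_match(data, pos + 1, window)
            if next_len > length:
                tokens.append(("lit", data[pos]))
                pos += 1
                continue
        tokens.append(("match", length, dist))
        pos += length
    return tokens

def unparse(tokens):
    """Rebuild the original bytes from a token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, dist = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlaps
    return bytes(out)

sample = b"abcXbcdeYabcde"
greedy = parse(sample)
lazy = parse(sample, lazy=True)
print(len(greedy), len(lazy))  # 12 11
```

    On this sample the greedy parser takes the 3-byte match "abc" and is left with two stray literals, while the lazy parser spends one literal on "a" and then takes the 4-byte match "bcde". The extra match search at every match position is where the slowdown comes from.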

  21. #51
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    it's hard to imagine how 300mb of memory may be used for 512kb dict

  22. #52
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Quote Originally Posted by Bulat Ziganshin
    it's hard to imagine how 300mb of memory may be used for 512kb dict
    Well, actually the match finder can search for matches within the entire buffer (16 MB). If we find a match at a distance longer than 512 KB, we stop the search. I use this approach to avoid circular buffer stuff.
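
    A rough sketch of what such a match finder can look like, in Python. The names `head` and `prev` are borrowed from the debug output in post #42, but everything else (the hash function, table sizes, class layout) is an assumption for illustration and not BALZ's actual code. The point is that positions index straight into the flat block buffer, so there is no wraparound logic, at the price of arrays sized for the whole block.

```python
MIN_MATCH = 3
HASH_BITS = 16

def _hash(buf, pos):
    # Hash of the next MIN_MATCH bytes; the function itself is an arbitrary choice.
    h = buf[pos] | (buf[pos + 1] << 8) | (buf[pos + 2] << 16)
    return ((h * 2654435761) >> 8) & ((1 << HASH_BITS) - 1)

class ChainMatchFinder:
    def __init__(self, buf, window=1 << 19):
        self.buf = buf
        self.window = window
        self.head = [-1] * (1 << HASH_BITS)  # newest position for each hash
        self.prev = [-1] * len(buf)          # older position with the same hash

    def insert(self, pos):
        """Register position `pos`; call for every position in increasing order."""
        if pos + MIN_MATCH <= len(self.buf):
            h = _hash(self.buf, pos)
            self.prev[pos] = self.head[h]
            self.head[h] = pos

    def find(self, pos):
        """Return (length, distance) of the longest match in the window, or (0, 0)."""
        buf, n = self.buf, len(self.buf)
        if pos + MIN_MATCH > n:
            return 0, 0
        best_len, best_dist = 0, 0
        cand = self.head[_hash(buf, pos)]
        while cand >= 0:
            dist = pos - cand
            if dist > self.window:
                break  # chain runs newest to oldest, so the rest is farther still
            length = 0
            while pos + length < n and buf[cand + length] == buf[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, dist
            cand = self.prev[cand]
        return (best_len, best_dist) if best_len >= MIN_MATCH else (0, 0)
```

    Call `find(pos)` before `insert(pos)`, so that a position never matches itself.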

  23. #53
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    At the same time I do SS parsing through the entire buffer.

  24. #54
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    well, it seems like a highly asymmetric solution

  25. #55
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Yep!

  26. #56
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Having said that, I will do additional experiments with Lazy Matching, especially with a Deflate-like approach. Soon I'll post the results and you will be able to suggest which one to use as the default.

  27. #57
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    have you read Kadach's dissertation? he had a lot of interesting ideas, including improved lazy matching. that's from my notebook:

    http://magicssoft.ru/content/docs/12/phd.zip

    the lzh part contains about half a dozen ideas that were not known as of 1997,
    including four that are still of interest to me now:

    - bucket МЦ sorting for building the Huffman tree
    - "optimal" parsing algorithms
    - replacing spaces/null characters in the hashed string with the characters that follow it
    - giving REPDIST codes priority over ordinary distances

    (hope that now that i use Opera, russian chars will be rendered properly)

  28. #58
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Of course, I'm familiar with this paper. Very hard to read, though - too many formulas and pseudo-scientific language. I'm afraid I may not have understood the ideas listed there correctly. Anyway, in my opinion there is nothing really new.

    BTW (compression people should write this as BWT), looks like magicssoft.ru is dead!
    Correct link to this paper (Russian):
    http://compression.ru/download/artic...d_1997_pdf.rar

  29. #59
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    Just found that the new Greedy coder may outperform BALZ's "optimal" one:

    canterbury.tar
    BALZ v1.03, ex: 576,687 bytes
    BALZ v1.03, e: 533,254 bytes



    While testing the new mode, I'm thinking about throwing away this SS stuff and keeping the Greedy and Lazy parsers only... If you test the new BALZ you'll understand what I mean.


  30. #60
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    Quote Originally Posted by encode
    It will be released within a week.
    Funny how time is flowing sometimes...

    Quote Originally Posted by encode
    Soon, I'll post the results and you will able to suggest which one to use as default.
    Quick decision, indeed. Long live democracy!
    Just kidding!
    Well done Ilia, waiting for testing-results.

    Best regards!
