
Thread: Blizzard - Fast BWT file compressor!!!

  1. #1
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

Blizzard - Fast BWT file compressor!!!

    Blizzard - Fast BWT file compressor by Christian Martelock...

    Quote Originally Posted by Christian Martelock
    Blizzard is a fast BWT file compressor using ~6N memory (5N is easily possible, though). It uses an executable data filter and processes the transformed data with a simple context mixer.
    Blizzard has two modes of compression: a fast mode 'f' and a normal mode 'c'. The fast mode uses a simpler context mixer which provides a little bit less compression. Blizzard's main goal is to provide good (not best) BWT-compression at a reasonable speed in one single package.
    Christian's web site: http://christian.martelock.googlepages.com/
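For readers new to the scheme, the forward BWT that everything here builds on can be sketched as below. This is only a naive, illustrative version in C++ (a real compressor such as Blizzard would use a fast suffix-array construction to stay within the quoted ~5-6N memory), not Christian's actual code.

Code:
// Naive forward BWT, for illustration only -- NOT Blizzard's implementation.
// A real BWT compressor builds a suffix array in O(n log n) or O(n) time;
// this rotation sort is roughly O(n^2 log n) and only shows the idea.
#include <algorithm>
#include <string>
#include <vector>

// Returns the BWT of 'in' and stores the primary index (the row holding the
// original string) in 'primary'; the decoder needs that index to invert it.
std::string bwt_forward(const std::string& in, size_t& primary) {
    const size_t n = in.size();
    primary = 0;
    if (n == 0) return std::string();
    std::vector<size_t> rot(n);
    for (size_t i = 0; i < n; ++i) rot[i] = i;
    // Sort all cyclic rotations of the input lexicographically.
    std::sort(rot.begin(), rot.end(), [&](size_t a, size_t b) {
        for (size_t k = 0; k < n; ++k) {
            unsigned char ca = in[(a + k) % n], cb = in[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;
    });
    std::string out(n, '\0');
    for (size_t i = 0; i < n; ++i) {
        if (rot[i] == 0) primary = i;        // row of the original string
        out[i] = in[(rot[i] + n - 1) % n];   // last column of the sorted matrix
    }
    return out;
}
The transformed block is then coded directly with the context mixer, instead of the MTF/RLE/entropy stage most BWT compressors use.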
Attached Files

  2. #2
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
It's really fast and efficient. But it has poor compression on english.dic: Blizzard compresses english.dic to 1,170,607 bytes, while most compressors get it down to around 500-750 KB. As you know, english.dic is a special file in which no word ever occurs twice. This made me think about a new trick on common files introduced by Christian.

  3. #3
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
Just a quick test with enwik9, compared to CCMx (6) and RZM, with and without using BWTlib as a prefilter.


    Code:
    enwik9		1.000.000.000 bytes
    enwik9.blz	  215.520.550 bytes
    
    enwik9.ccm	  175.238.339 bytes
    enwik9.rzm	  210.126.103 bytes
    
    enwik9.bwt.ccm	  186.057.176 bytes
    enwik9.bwt.rzm	  201.141.372 bytes
Seems BWTlib helps RZM but hurts CCMx when compressing enwik9.


It would be nice to see whether a Blizzard optimized for compression ratio would be as good as RZM/CCMx in compression.

  4. #4
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
Just a quick update:

I tried using a bigger blocksize of 256000000 bytes:

    enwik9.blz2 175.414.896 bytes

That's damn close to CCMx (6).

This seems interesting, and I will be back with some decompression time benchmarks.

  5. #5
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
Tested. It may not be the best within the BMP sector, but on a PSD file it kicks ass massively regarding speed/ratio. Overall it's pretty nice; Christian isn't a genius for nothing.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Code:
                 [BCM]   [o1rc9f] [o1rc9g] [Bliz f] [Bliz c] [Bliz' c]
    calgary.tar  791,435 785456   784037   800280   790491   787745
    book1        212,674 211047   210908   215982   212130   212076
    world95.txt  474,291 470390   469640   481578   474891   473631
    For last column the byte order in files was reversed before compression.
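For clarity, "byte order reversed" just means the files were written back to front before compression. A trivial sketch of such a preprocessing step (illustrative only, not the exact tool used here):

Code:
// Reverse a whole file byte-for-byte (illustration of the preprocessing step).
#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 3) return 1;                              // usage: revbytes in out
    std::ifstream in(argv[1], std::ios::binary);
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    std::reverse(buf.begin(), buf.end());
    std::ofstream out(argv[2], std::ios::binary);
    out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    return 0;
}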
    Last edited by Shelwien; 3rd July 2008 at 06:33.

  7. #7
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Hi there!

Thank you for the good feedback and the tests!

I did not announce Blizzard here because it's a boring, common implementation. Well, the context mixer directly after the BWT transform differs from most other BWT implementations, but we already have BBB and BCM doing the same.

    Quote Originally Posted by osmanturan View Post
    It's really fast and efficient. But, it has a poor compression on english.dic. Blizzard compress english.dic to 1,170,607 bytes.
    Try using blocksize 10000 - this should be a bit better. But after all, Blizzard is just a BWT. So, such performance is expected (OT: CCM does not use an LZ layer).

    Btw., there is a tiny update on my site. I just removed two lines of code from the exe-filter.

  8. #8
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Christian View Post
    CCM does not use an LZ layer
Could you point out where the speed comes from?

  9. #9
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
Well, there isn't a particular place where CCM gains speed. E.g., on precompressed data, where no LZ layer can help, CCM is still faster than BIT, CMM or LPAQ. On the other hand, CMM and LPAQ are stronger than CCM - LPAQ especially on text data. I have often said that CCM is different from PAQ. Well, I think this is the reason for its speed. PAQ is really great, but you should not use it as a template in order to write a fast context mixer.

  10. #10
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
(Well... this is getting more off-topic. Forgive me for that.)

Could you explain your statistics storage technique? Is it totally hashed, or is there a hash table alongside a small context tree for frequently occurring contexts, or something else? I would be very happy if you pointed out some more details. Surely most other people would be happy, too.

  11. #11
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
@christian

Your new BWT compressor seems to be a very fast one for a BWT-class compressor!

Congratulations!

The command line "bliz c d:db.dmp c:db.bliz 100000000"

requests up to 589.496 KB of RAM.

    ---
    Block size is 97656 KiB (587463 KiB allocated).

    Compressing (normal profile)...

    All done: 633136 KiB -> 31742 KiB (5.01%)
    ---

on a Win2003 system with 4 GB RAM and two Xeon processors:

    compression:

    sourcefile db.dmp 648.331.264 bytes

    balz113 e 33.390.086 bytes 9 min
    7zip 35.151.362 bytes 19 min

    bcm 001 31.927.387 bytes 22 min
    bcm 002 31.022.446 bytes 17 min

    bliz 0.24 32.504.413 bytes 8 minutes

    ---

this means:

1. bliz is two times faster than other BWT compressors

2. bliz can beat 7zip in compression ratio in less than half the time!

very good work!

What about implementing compression of a whole directory, including subdirectories?
    Last edited by joerg; 3rd July 2008 at 13:24.

  12. #12
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by osmanturan View Post
    Could you explain your statistics storage techniques? It's totally hashed ...
    It is hashed and uses simple collision resolution. Of course, the order-1 submodel is not hashed.
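For anyone unfamiliar with the idea, below is a minimal sketch of what a hashed statistics table with simple collision resolution can look like; the slot layout and the overwrite-on-mismatch policy are illustrative assumptions, not CCM's actual data structure. The order-1 submodel, by contrast, fits in a small direct-indexed array, which is why it needs no hashing.

Code:
// Sketch of a hashed context-statistics table with simple collision
// resolution. The 8-bit check byte and the "overwrite on mismatch" policy
// are illustrative assumptions, not CCM's actual layout.
#include <cstdint>
#include <vector>

struct Slot {
    uint8_t check = 0;   // a few high bits of the hash, to detect collisions
    uint8_t state = 0;   // bit history / probability state for this context
};

class StatTable {
    std::vector<Slot> slots_;
    uint32_t mask_;
public:
    explicit StatTable(uint32_t size_log2)
        : slots_(size_t(1) << size_log2), mask_((1u << size_log2) - 1) {}

    // Look up a context hash; on a collision the old entry is simply
    // overwritten -- cheap and slightly lossy, i.e. the "simple" resolution.
    Slot& find(uint32_t ctx_hash) {
        Slot& s = slots_[ctx_hash & mask_];
        const uint8_t chk = uint8_t(ctx_hash >> 24);
        if (s.check != chk) { s.check = chk; s.state = 0; }  // evict on mismatch
        return s;
    }
};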

    Quote Originally Posted by osmanturan View Post
    I would be very happy if you point out some other details.
    In the flagship-thread Przemyslaw quoted a post I wrote on the old forum. It's still valid. I think it is a good starting point.

    @joerg:
Thank you! Could you please test bliz with blocksizes 16800000 and 33600000? That should be interesting, too. The smaller blocksize should be better on this file.
    Regarding the archiver functionality, I'm sorry. But it just takes too much time.
    Last edited by Christian; 3rd July 2008 at 13:40. Reason: typing error

  13. #13
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Quote Originally Posted by Christian View Post
    In the flagship-thread Przemyslaw quoted a post I wrote on the old forum. It's still valid. I think it is a good starting point.
    Christian, for quite a long time I was inactive in compression and recently I've found your CCM. Congratulations as you've managed to beat my favourite compression artists: Dmitry (PPMd, PPMonstr) and Matt+Alexander (lpaq).

    http://www.maximumcompression.com/data/summary_mf.php :
    LPAQ8 78704186 bytes in 1312 s
    CCM 1.30c 78598980 bytes in 277 s
    PPMonstr J 78086417 bytes in 1628 s

The results (speed) are impressive. I believe that you should release CCM as an open-source CM library (it can be commercial, but free for non-commercial use) to take the place of the widely used PPMd.

    Please join us at http://ctxmodel.net/rem.pl?-21

  14. #14
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    -------------------------------------------------------------------------------
Blizzard 0.24 (Jun 29 2008) - Copyright (c) 2008 by Christian Martelock
    -------------------------------------------------------------------------------
    bliz c d:db.dmp c:bl1.bliz 100000000

    Block size is 97656 KiB (587463 KiB allocated).

    All done: 633136 KiB -> 31742 KiB (5.01%)
    ---
    bliz c d:db.dmp c:bl2.bliz 16800000

    Block size is 16406 KiB (98694 KiB allocated).

    All done: 633136 KiB -> 30765 KiB (4.86%)
    ---
    bliz c d:db.dmp c:bl3.bliz 33600000

    Block size is 32812 KiB (197387 KiB allocated).

    All done: 633136 KiB -> 31103 KiB (4.91%)
    ---

    compression result:

    sourcefile db.dmp 648.331.264 bytes

    7zip 35.151.362 bytes 19 min

    bcm 001 31.927.387 bytes 22 min
    bcm 002 31.022.446 bytes 17 min

    bliz 0.24

    bliz blocksize 100000000 -- 32.504.413 bytes 8 minutes
    bliz blocksize 16800000 -- 31.503.372 bytes 7 minutes
    bliz blocksize 33600000 -- 31.850.304 bytes 7,5 minutes

    @christian

Thank you!

You are right:
with the lower blocksize of 16800000, compression is faster and better.

i understand:
"Regarding the archiver functionality,
I'm sorry. But it just takes too much time."

but I think that in the future, if we want to use such a new compressor for real purposes, we need support for directories including subdirectories.
...
thinking...
...
maybe it would be possible to create a compressor-independent archive format, like "*.ISO = CD image format", which contains the directory structure and in which all files within this structure are stored in a compressed format.

What do you think about this?

- maybe another programmer could implement such a compressor-independent archive format

- would you use such an independent archive format?

- it would be usable for your other wonderful compressors like CCM or Slug, too...

- or would you tend to contribute some compression code for such an implementation?

- maybe dual-license (commercial) or open source?

Thank you for your wonderful compressor programs!

Do you have any plans for the future in this direction?
    Last edited by joerg; 3rd July 2008 at 16:38.

  15. #15
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by joerg View Post
    -------------------------------------------------------------------------------
Blizzard 0.24 (Jun 29 2008) - Copyright (c) 2008 by Christian Martelock

maybe it would be possible to create a compressor-independent archive format, like "*.ISO = CD image format", which contains the directory structure and in which all files within this structure are stored in a compressed format
...
- maybe another programmer could implement such a compressor-independent archive format
...
- it would be usable for... other... compressors...
    FreeArc supports this already. You just need to configure it for all external compressors you need.
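As an illustration of the general idea (not FreeArc's actual mechanism), such a wrapper only needs to walk the directory tree and hand each file to whatever external compressor is configured. A rough C++17 sketch; the command template and output naming are assumptions:

Code:
// Sketch of a compressor-independent wrapper: walk a directory tree and run
// a configurable external compressor on every regular file (C++17).
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

int main(int argc, char** argv) {
    if (argc != 3) { std::cerr << "usage: packdir <dir> <compressor cmd>\n"; return 1; }
    const std::string compressor = argv[2];               // e.g. "bliz c"
    for (const auto& e : fs::recursive_directory_iterator(argv[1])) {
        if (!e.is_regular_file()) continue;
        const std::string src = e.path().string();
        const std::string cmd = compressor + " \"" + src + "\" \"" + src + ".cmp\"";
        std::system(cmd.c_str());                         // delegate to the external tool
    }
    return 0;
}
The directory structure itself would still have to be stored in a container (the *.ISO-like part of the idea); the sketch only shows the "plug in any compressor" half.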

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts

  17. #17
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Thanks a lot Przemyslaw!

    Quote Originally Posted by inikep View Post
    LPAQ8 78704186 bytes in 1312 s
    CCM 1.30c 78598980 bytes in 277 s
    PPMonstr J 78086417 bytes in 1628 s
    Still, MFC favors CCM over LPAQ because of its data filters. I don't know about PPMonstr, though.

    Quote Originally Posted by inikep View Post
    I believe that you should release CCM as an open-source CM library (it can be commercial, but free for non-commercial use) to take place of widely-used PPMd.
I'm thinking about it. But it's also a matter of time. And replacing PPMd with CCM is not a win-win situation. PPMd beats CCM on some files like "world95.txt".

    @Joerg:
    Currently, I don't have any time for a new project. But this might change in a few months. So, once everything has settled in again, I might release some of my work as OSS.

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    735
    Thanked 660 Times in 354 Posts
my own suggestions list:
1) implement stdin/stdout interfaces and Win32/Linux versions for all your compressors. this makes it very easy to use them inside full-scale archivers (a minimal sketch follows below)

2) remove the 2 GB limit in RZM. it's really disappointing, since RZM has a beautiful speed/compression ratio

3) if RZM allowed a larger dictionary, and the dictionary size could be set independently of the hash size, it would be possible to outperform WinRK

PPMonstr doesn't include filters; Durilca is PPMonstr+filters, while Durilca'light is PPMd+filters
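Regarding suggestion 1, the portability issue is mostly about switching the standard streams into binary mode on Windows. A minimal sketch of such a stdin-to-stdout skeleton (illustrative only, not any particular compressor's code):

Code:
// Minimal stdin -> stdout binary filter skeleton, as a compressor needs for
// pipe-based use inside a full-scale archiver.
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

int main() {
#ifdef _WIN32
    _setmode(_fileno(stdin),  _O_BINARY);   // otherwise CR/LF translation corrupts data
    _setmode(_fileno(stdout), _O_BINARY);
#endif
    char buf[1 << 16];
    size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, stdin)) > 0) {
        // ... compress or decompress 'buf' here ...
        std::fwrite(buf, 1, n, stdout);
    }
    return 0;
}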

  19. #19
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Quote Originally Posted by Christian View Post
    Still, MFC favors CCM over LPAQ because of its data filters.
Interesting, because MFC is over 400 files in 30+ formats, and CCM gets all the files mixed into a single TAR archive as input.

    Still, CCM is better on average than LPAQ on:
    http://www.squeezechart.com/main.html
    http://www.winturtle.netsons.org/MOC/MOC.htm

My conclusion is that LPAQ is better than CCM on TXT and EXE because of its filters, or that LPAQ is tuned for SFC.



    Quote Originally Posted by Christian View Post
    PPMd beats CCM on some files like "world95.txt".
True, and this is a place for improvement. You should detect textual files (which is quite simple) and use a different algorithm (models, weights, mixers), just like lpaq2+ does.
    Last edited by inikep; 3rd July 2008 at 21:00.

  20. #20
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by inikep View Post
Interesting, because MFC is over 400 files in 30+ formats, and CCM gets all the files mixed into a single TAR archive as input.
    CCM does data filtering based on content, so the TAR archive does not matter.

    Quote Originally Posted by inikep View Post
True, and this is a place for improvement. You should detect textual files (which is quite simple) and use a different algorithm (models, weights, mixers), just like lpaq2+ does.
    Yes, that's the way I'd do it, too. But at the time I wrote CCM, I was obsessed with using only one simple algorithm for everything.
And then I moved on to other things like Slug, RZM and now Bliz. But I'm sure that I'd be able to improve CCM a bit further without losing speed. Nonetheless, a rewrite of an improved CCM would take long. So, for the time being, I stick to lightweight projects like Slug and Bliz. They can be finished in several hours and are still fun (as long as I don't add archiver functionality).
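A detector along those lines does not have to be fancy. A rough sketch of the kind of heuristic meant here; the 90% threshold is a guess, not the value lpaq2 or any other compressor actually uses:

Code:
// Rough block-level text detector: count bytes that are printable ASCII or
// common whitespace. The 90% threshold is an illustrative guess.
#include <cstddef>
#include <cstdint>

bool looks_like_text(const uint8_t* block, size_t n) {
    if (n == 0) return false;
    size_t texty = 0;
    for (size_t i = 0; i < n; ++i) {
        const uint8_t c = block[i];
        if ((c >= 0x20 && c < 0x7F) || c == '\n' || c == '\r' || c == '\t')
            ++texty;
    }
    return texty * 10 >= n * 9;   // >= 90% "texty" bytes -> switch to text models
}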

  21. #21
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    0.24b is faster and better
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  22. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    735
    Thanked 660 Times in 354 Posts
both contain MM data. you may find an MM-cleaned comparison of ccmx 1.23 and lpaq1 at http://freearc.org/Maximal-Practical-Compression.aspx

    my own comparison on text data:
    Code:
     
    7-zip 4.58     6.847  0.239 12.058
    rzm 0.07h      7.366  0.122  3.588
    uharc -mx      7.404  0.239  0.262
    freearc -mx    7.699  0.514  0.990
    ccmx 1.30      8.098  0.196
    freearc -max   8.374  0.223  0.287
    lpaq8          8.705  0.058
    on mainly binary data
    Code:
     
    7-zip 4.58     5.133  0.292 10.259
    freearc -mx    5.252  0.340  1.891
    uharc -mx      5.322  0.168  0.191
    freearc -max   5.407  0.236  0.589
    rzm 0.07h      5.430  0.139  2.578
    ccmx 1.30      5.448  0.217
    lpaq8          5.684  0.059
first column is compression ratio, second and third are compression/decompression speed
    Last edited by Bulat Ziganshin; 3rd July 2008 at 22:11.

  23. #23
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Quote Originally Posted by Black_Fox View Post
    0.24b is faster and better
That's what I call a 'solid' improvement. Thank you for testing Blizzard, Black_Fox!

  24. #24
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    MFC test, sorted by efficiency - FreeArc occupies 3 top places!
    Good work Bulat!

  25. #25
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
both contain MM data. you may find an MM-cleaned comparison of ccmx 1.23 and lpaq1 at http://freearc.org/Maximal-Practical-Compression.aspx

    my own comparison on text data:
    Code:
     
    7-zip 4.58     6.847  0.239 12.058
    rzm 0.07h      7.366  0.122  3.588
    uharc -mx      7.404  0.239  0.262
    freearc -mx    7.699  0.514  0.990
    ccmx 1.30      8.098  0.196
    freearc -max   8.374  0.223  0.287
    lpaq8          8.705  0.058
    on mainly binary data
    Code:
     
    7-zip 4.58     5.133  0.292 10.259
    freearc -mx    5.252  0.340  1.891
    uharc -mx      5.322  0.168  0.191
    freearc -max   5.407  0.236  0.589
    rzm 0.07h      5.430  0.139  2.578
    ccmx 1.30      5.448  0.217
    lpaq8          5.684  0.059
first column is compression ratio, second and third are compression/decompression speed
    @Bulat : do you have an idea why FA -mx for binary data is 5x slower compared to 7zip ?
    For binary data, I would expect that both use LZMA....
    Are the FA preprocessors (exe, delta, rep) causing the difference ?
    Thanks in advance !

  26. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    735
    Thanked 660 Times in 354 Posts
    because "mainly binary data" includes, may be, 20% of texts. actually, this compression profile is rather close to that of squeeze chart or mfc

  27. #27
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
As anticipated, the results of bliz 0.24b on the file db.dmp are the same as with bliz 0.24.

  28. #28
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 151 Times in 18 Posts
    Thanks for the further test, Joerg!

    Quote Originally Posted by joerg View Post
As anticipated, the results of bliz 0.24b on the file db.dmp are the same as with bliz 0.24.
    Yep, the results should be the same. I only removed two or three lines of code from the executable filter - which is not relevant for a bitmap, of course.

  29. #29
    Programmer
    Join Date
    Jun 2008
    Location
    Japan
    Posts
    14
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Compression speed test on Maniscalco's Corpus:

    Code:
            Timing results (in secs, blocksize = default)
                   bzip2  bliz24b    bcm002a
    abac            0.15     0.04      45.81
    abba           11.21  2144.79   11303.42
    book1x20        2.59     0.54   24616.59
    fib_s14930352  15.07     0.07   20739.16
    fss9            2.84     0.03    give up
    fss10          12.01     0.09
    houston         3.31     0.06
    paper5x80       0.89     0.01
    test1           1.89     0.01
    test2           1.89     0.03
    test3           2.04     0.03
bliz 0.24b is very fast, except on 'abba'.

  30. #30
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by pat357 View Post
    @Bulat : do you have an idea why FA -mx for binary data is 5x slower compared to 7zip ?
    For binary data, I would expect that both use LZMA....
    Are the FA preprocessors (exe, delta, rep) causing the difference ?
    Thanks in advance !
The issue is caused not by the preprocessors themselves, but by lack of memory.
FreeArc compression algorithms are actually a chain of 2 to 4 algorithms: the first is applied to the data, the second is applied to the output of the first, and so on; the output of the last algorithm is written to the archive.
In -mx mode, some algorithms use too much memory and thus have to be applied sequentially: the first algorithm's output is written to a temporary file, which is then read by the second, and so on. That's too slow. So, the -mx mode is usually impractical.
EXE (BCJ) and Delta are definitely not the reason: they use too little memory. Probably it's the REP.
And text shouldn't be the reason, because text usually compresses faster than binaries.
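To picture the sequential (temp-file) variant of such a chain, here is a rough sketch; the stage names and the command syntax are placeholders, not FreeArc's real configuration:

Code:
// Rough sketch of a sequential algorithm chain: each stage reads the previous
// stage's output from a temporary file. Stage commands are placeholders.
#include <cstdlib>
#include <string>
#include <vector>

int run_chain(const std::string& input, const std::string& archive) {
    // e.g. delta -> rep -> lzma; real chains are chosen per data type
    const std::vector<std::string> stages = {"delta", "rep", "lzma"};
    std::string cur = input;
    for (size_t i = 0; i < stages.size(); ++i) {
        const bool last = (i + 1 == stages.size());
        const std::string next = last ? archive : archive + ".tmp" + std::to_string(i);
        // When stages need too much memory to run concurrently, they run one
        // after another through temp files -- which is what makes -mx slow.
        const std::string cmd = stages[i] + " \"" + cur + "\" \"" + next + "\"";
        if (std::system(cmd.c_str()) != 0) return 1;
        cur = next;
    }
    return 0;
}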


