Thread: BCM v0.01 - New BWT+CM-based compressor

  1. #1
    encode (The Founder)
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts

    BCM v0.01 - New BWT+CM-based compressor

    Well, I decided to release the very first version of my BWT+CM compressor, called BCM. No segmenter or filters are included. It's still a draft, and I'm releasing it for users of my forum only, so it's kind of non-public!

    Enjoy!

    BTW, during development I found some CM compression-related improvements, so I might continue work on my BALZ file compressor...

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thanks Ilia!

  3. #3
    encode (The Founder)
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts

    Any comments or impressions? How does it work for you? For example, how does this compressor compare to BALZ, in your opinion?


  4. #4
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Made some quick tests... BCM is 2-4 times faster than BBB, and its compression is sometimes worse, sometimes better. Nevertheless, on my relatively small files GRZip and UHBC mostly beat both BCM and BBB.

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some timings for ENWIK8 on my AMD Sempron 2400+ machine...

    Compressed size: 20.8 MB (21,824,948 bytes)

    Compression time: 451.24 seconds (00 days 00 hours 07 minutes 31.24 seconds)

    Decompression time: 159.13 seconds (00 days 00 hours 02 minutes 39.13 seconds)

  6. #6
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    MC SFC test:

    A10.jpg > 830,651
    AcroRd32.exe > 1,595,953
    english.dic > 1,180,338
    FlashMX.pdf > 3,761,095
    FP.LOG > 557,444
    MSO97.DLL > 1,927,126
    ohs.doc > 844,340
    rafale.bmp > 759,576
    vcfiu.hlp > 659,158
    world95.txt > 479,001

    Total = 12,594,682 bytes

  7. #7
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    I'm a bit confused as to what the latest on your sorting is, Ilia. Are you still using std::sort? If I may, I'd humbly suggest a couple of basic tips which you can explore. First take order-1 stats of the block, then build the bucket pointers based on these statistics. Then sort each of the two-byte buckets with a custom Sedgewick quicksort variant. The Sedgewick variant should work on dwords (like strcmp with dwords instead of bytes), directly comparing the input in the backward direction (if we presume we are working on a little-endian platform), and the comparison function should be inlined. These are fairly easy modifications, as you can find Sedgewick sorting source code online, and see the bzip2 source for the initial order-1 stats. If LZP is used before compression, with these settings you can match (or get close to) blizzard in compression speed, while requiring only 5n memory.
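
    To make the bucketing idea above concrete, here is a minimal sketch. It is not BCM's code and not the tuned routine described in the post: it counts two-byte prefixes, turns the counts into bucket start pointers, scatters the rotation indices, and then sorts each bucket. As a simplification, each bucket is sorted with plain std::sort and a byte-wise comparator rather than a Sedgewick-style quicksort comparing dwords, so it shows the structure only and can be slow on repetitive data. The names sort_rotations_bucketed and bwt_last_column are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the start positions of all cyclic rotations of `s` in sorted order.
// Hypothetical helper for illustration; not taken from BCM or bzip2.
std::vector<uint32_t> sort_rotations_bucketed(const std::string& s) {
    assert(s.size() < (1u << 31));  // keeps the index arithmetic below in range
    const uint32_t n = static_cast<uint32_t>(s.size());
    std::vector<uint32_t> order(n);
    if (n == 0) return order;

    // Byte at position i of the block, wrapping around (cyclic rotations).
    auto at = [&](uint32_t i) -> uint32_t {
        return static_cast<unsigned char>(s[i % n]);
    };

    // 1. Statistics: count how many rotations start with each byte pair.
    std::vector<uint32_t> start(65536 + 1, 0);
    for (uint32_t i = 0; i < n; ++i)
        ++start[((at(i) << 8) | at(i + 1)) + 1];

    // 2. Prefix sums turn the counts into bucket start pointers.
    for (uint32_t b = 0; b < 65536; ++b)
        start[b + 1] += start[b];

    // 3. Scatter every rotation index into its two-byte bucket.
    std::vector<uint32_t> next(start.begin(), start.end() - 1);
    for (uint32_t i = 0; i < n; ++i)
        order[next[(at(i) << 8) | at(i + 1)]++] = i;

    // 4. Sort inside each bucket. The first two bytes are already equal,
    //    so the comparison starts at offset 2. (Simplification: bytes and
    //    std::sort here, not dwords and a custom quicksort.)
    auto less = [&](uint32_t a, uint32_t b) {
        for (uint32_t k = 2; k < n; ++k) {
            const uint32_t ca = at(a + k), cb = at(b + k);
            if (ca != cb) return ca < cb;
        }
        return a < b;  // identical rotations: fall back to position
    };
    for (uint32_t b = 0; b < 65536; ++b)
        if (start[b + 1] - start[b] > 1)
            std::sort(order.begin() + start[b], order.begin() + start[b + 1], less);

    return order;
}

// Last column of the sorted rotation matrix, i.e. the BWT output
// (without the primary index a real coder would also store).
std::string bwt_last_column(const std::string& s) {
    const std::vector<uint32_t> order = sort_rotations_bucketed(s);
    const std::size_t n = s.size();
    std::string out(n, '\0');
    for (std::size_t i = 0; i < n; ++i)
        out[i] = s[(order[i] + n - 1) % n];
    return out;
}
```

    For example, bwt_last_column("banana") gives "nnbaaa". The point of the two-byte bucketing is that the expensive comparisons only ever run inside small groups that already share a prefix.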

  8. #8
    encode (The Founder)
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Well, like I said, the main goal of BCM is just to test BWT-output coding with my CM - no more, no less...

  9. #9
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    encode wrote:

    Well, now ("After finalizing BALZ")
    I have some time to release BCM with LZP preprocessing - to be in the game!

    pat357 wrote:

    "I guess it is possible to implement BWT for 4 cores..."
    "(Winrk might be even doing this already, not sure though..)"

    @encode:

    Do you think it would be possible to modify your algorithm to use 2 or more threads?

    The compression could then be done much faster...

  10. #10
    encode (The Founder)
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Originally posted by joerg:
    encode wrote:
    Well, now ("After finalizing BALZ")
    I have some time to release BCM with LZP preprocessing - to be in the game!

    Well, as you know, I spent that time on a new version of my PIM file archiver... I have absolutely no spare time these days...

    Originally posted by joerg:
    pat357 wrote:
    "I guess it is possible to implement BWT for 4 cores..."
    "(Winrk might be even doing this already, not sure though..)"

    I'm not sure that Malcolm is that advanced in BWT. He's about cool LZ and PPM-based stuff a la ROLZ and some PAQ snips like FPW and PWCM.

    Originally posted by joerg:
    @encode:
    Do you think it would be possible to modify your algorithm to use 2 or more threads?

    The compression could then be done much faster...

    Not sure about the CM part. The sorting part, which makes compression so slow, can be, I guess. Since I use stable_sort(), there is no way to do such a thing. However, with custom sorting routines we could do that. Hm, I think it is possible to compress each block separately and independently in different threads - for example, with 2 cores we compress two blocks simultaneously, with 4 cores four blocks, etc.
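
    The block-per-thread idea can be sketched roughly as below. This is only an illustration under assumptions, not BCM code: compress_block is a hypothetical stand-in (a toy run-length coder) for the real per-block BWT+CM stage, and compress_blocks_parallel, block_size and max_threads are names invented for the example. Blocks are independent, so each one is handed to std::async and the results are collected in the original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-block BWT+CM coder: a toy run-length
// encoder (count byte, then value byte), used only to keep this runnable.
std::string compress_block(const std::string& block) {
    std::string out;
    for (std::size_t i = 0; i < block.size();) {
        std::size_t run = 1;
        while (i + run < block.size() && block[i + run] == block[i] && run < 255)
            ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(block[i]);
        i += run;
    }
    return out;
}

// Splits `data` into fixed-size blocks and compresses them independently,
// running at most `max_threads` blocks at the same time.
std::vector<std::string> compress_blocks_parallel(const std::string& data,
                                                  std::size_t block_size,
                                                  std::size_t max_threads) {
    if (block_size == 0) block_size = 1;    // avoid an endless split loop
    if (max_threads == 0) max_threads = 1;  // always make progress

    std::vector<std::string> blocks;
    for (std::size_t pos = 0; pos < data.size(); pos += block_size)
        blocks.push_back(data.substr(pos, block_size));

    std::vector<std::string> packed(blocks.size());
    // Work through the blocks in waves of up to max_threads tasks.
    for (std::size_t first = 0; first < blocks.size(); first += max_threads) {
        const std::size_t last = std::min(first + max_threads, blocks.size());
        std::vector<std::future<std::string>> jobs;
        for (std::size_t i = first; i < last; ++i)
            jobs.push_back(std::async(std::launch::async,
                                      [&blocks, i] { return compress_block(blocks[i]); }));
        // get() waits for each task, so output order matches input order.
        for (std::size_t i = first; i < last; ++i)
            packed[i] = jobs[i - first].get();
    }
    return packed;
}
```

    Calling compress_blocks_parallel(data, block_size, 4) would keep up to four blocks in flight, matching the "4 cores = 4 blocks" idea. The trade-off is memory: every block in flight needs its own working buffers, and the CM model cannot carry statistics from one block to the next.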

  11. #11
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Originally posted by encode:
    I'm not sure that Malcolm is that advanced in BWT. He's about cool LZ and PPM-based stuff a la ROLZ and some PAQ snips like FPW and PWCM.

    His current Mcomp V2.0 seems to use 4 cores for BWT:
    http://www.msoftware.biz/blog/2008/0...cs#attachments

    Code:
    > mcomp_x32
    LibMComp Demo Compressor (v2.00).
    Copyright (c) 2008 M Software Ltd.
    mcomp [options] pofile(s)
    
    Options:
        -m[..]    Compression method:
                  b    - BZIP2.
                  c    - Experimental DMC codec.
                  d    - Optimised deflate (df - fast, dx - max)
                 d64  - Optimised deflate64 (d64f - fast, d64x - max)
                  lz   - Optimised LZ (lzf - fast, lzx - max)
                  f    - Optimised ROLZ (ff - fast, fx - max)
                  f3   - Optimised ROLZ3 (f3f - fast, f3x - max)
                  p    - PPMd var.J.
                  sl   - Bitstream (LSB first).
                  sm   - Bitstream (MSB first).
                  w    - Experimental BWT codec.
        -MNN[k,m] Model size (in kb (default) or Mb, default 64M).
        -oNN      Order (for Bitstream and PPMd).
        -np       Display no progress information.
    
    > mcomp_x32 -mw fp.log  fplog_mw.mcomp
    
    LibMComp Demo Compressor (v2.00).
    Copyright (c) 2008 M Software Ltd.
    Thread pool size = 4
    Using 1280Mb (4 threads)
    100%
    From 20617071 => To 559960
    Elapsed time: 5.355s      (3759.82kps)
    Last edited by pat357; 10th September 2008 at 18:37.
