
Thread: BCM v0.01 - New BWT+CM-based compressor

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts

    BCM v0.01 - New BWT+CM-based compressor

    Well, I decided to release the very first version of my BWT+CM compressor called BCM. No segmenter or filters are included. It's still a draft, and I'm releasing it to the users of my forum only, so it's kind of non-public!

    Enjoy!

    BTW, during development I've found some CM compression-related improvements, so it might be that I will continue working on my BALZ file compressor...
    Attached Files

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts


    Thanks Ilia!

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts


    Any comments or impressions? How does it work for you? For example, how does this compressor compare to BALZ, in your opinion?


  4. #4
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    Made some quick tests... BCM is 2-4 times faster than BBB, and its compression is sometimes worse, sometimes better. But nevertheless, on my relatively small files GRZip and UHBC mostly beat both BCM and BBB.

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some timings for ENWIK8 on my AMD Sempron 2400+ machine...


    Compressed size: 20.8 MB (21,824,948 bytes)


    Compression Time: 451.24 Seconds

    00 Days 00 Hours 07 Minutes 31.24 Seconds


    Decompression Time: 159.13 Seconds

    00 Days 00 Hours 02 Minutes 39.13 Seconds

  6. #6
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    MC SFC test:

    A10.jpg > 830,651
    AcroRd32.exe > 1,595,953
    english.dic > 1,180,338
    FlashMX.pdf > 3,761,095
    FP.LOG > 557,444
    MSO97.DLL > 1,927,126
    ohs.doc > 844,340
    rafale.bmp > 759,576
    vcfiu.hlp > 659,158
    world95.txt > 479,001

    Total = 12,594,682 bytes

  7. #7
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi Encode!

    You've done a good job with BCM, but it seems rather slow - it needs a lot of work on speed! Does it also use an MTF conversion?

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Quote Originally Posted by Nania Francesco View Post
    You've done a good job with BCM, but it seems rather slow - it needs a lot of work on speed! Does it also use an MTF conversion?
    No, it has no MTF or other things like DC/IF/WFC...
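
    For context, the MTF (move-to-front) conversion being asked about is the classic post-BWT stage that BCM skips. A minimal sketch of it, for illustration only (not anyone's production code):

    #include <cstddef>

    // Classic move-to-front: each symbol is replaced by its position in a
    // recency-ordered list, turning BWT's runs of repeats into small values.
    void mtf_encode(unsigned char *data, size_t n) {
        unsigned char order[256];
        for (int i = 0; i < 256; i++) order[i] = (unsigned char)i;
        for (size_t i = 0; i < n; i++) {
            unsigned char c = data[i];
            int j = 0;
            while (order[j] != c) j++;          // current rank of the symbol
            data[i] = (unsigned char)j;         // emit the rank
            for (; j > 0; j--) order[j] = order[j - 1];
            order[0] = c;                       // move the symbol to front
        }
    }

    A CM back-end like BCM's models the BWT output directly instead of ranking it first.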

  9. #9
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    compression 21 kB/s, decompression 1495 kB/s, total size 13.791.746 bytes

    pht.psd took over 20 minutes to sort! (I almost thought about guru meditation, as Francesco would say )
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  10. #10
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Tested on an AMD Athlon 64 X2 Dual 4200+ (2.2 GHz), 1 GB RAM, WinXP SP3. Here are the results:

    SFC
    A10.jpg -> 830,651 (1.105 seconds)
    AcroRd32.exe -> 1,595,953 (6.545 seconds)
    english.dic -> 1,180,338 (4.382 seconds)
    FlashMX.pdf -> 3,761,095 (6.484 seconds)
    FP.log -> 557,444 (42.401 seconds)
    MSO97.dll -> 1,927,126 (5.069 seconds)
    ohs.doc -> 844,340 (33.651 seconds)
    rafale.bmp -> 759,576 (5.124 seconds)
    vcfiu.hlp -> 659,158 (5.933 seconds)
    world95.txt -> 479,001 (4.054 seconds)

    ENWIK8
    21,824,948 (180.592 seconds)

    Note: all timings were measured with AcuTimer.

    Do you think ohs.doc is a worst-case example for the BWT transform? I have never tested a BWT compressor on SFC.

    Could you explain your CM back-end implementation? Is it a kind of approximation, a low-order CM, or something else?

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Quote Originally Posted by osmanturan View Post
    Do you think ohs.doc is a worst-case example for the BWT transform?
    Seems to be. But the real worst case is for sure "pht.psd".
    Actually, most (or ALL) BWT compressors have such worst cases. Even UHBC is really unstable on some kinds of data...
    Quote Originally Posted by osmanturan View Post
    Could you explain your CM back-end implementation? Is it a kind of approximation, a low-order CM, or something else?
    It is a low-order CM. Some things are simplified to get better speed. We dynamically mix together various models with various contexts, including models with various update speeds. BCM v0.01 has five models and four dynamic mixers...
    To get the basic idea of what we can mix, check the BBB source...

    Well, I'm thinking of moving to more advanced things like a unary coder...
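
    To make that concrete, here is a minimal, hypothetical sketch of this kind of bitwise CM building block - two probability counters with different update speeds blended by one adaptive mixer. All constants and update rules here are illustrative; they are not BCM's actual code:

    #include <cstdint>

    // A bit-probability counter; 'rate' sets the adaptation speed.
    // Probabilities are 12-bit fixed point (0..4095).
    struct Counter {
        uint32_t p = 2048;  // P(bit=1)
        int rate = 4;       // smaller = faster adaptation
        void update(int bit) {
            if (bit) p += (4096 - p) >> rate;
            else     p -= p >> rate;
        }
    };

    // A division-less mixer: linear blend of two predictions, with the
    // weight nudged toward whichever model predicted the bit better.
    struct Mixer {
        uint32_t w = 2048;  // weight of the "slow" model, 12-bit fixed point
        uint32_t mix(uint32_t pFast, uint32_t pSlow) const {
            return (pFast * (4096 - w) + pSlow * w) >> 12;
        }
        void update(uint32_t pFast, uint32_t pSlow, int bit) {
            uint32_t eFast = bit ? 4096 - pFast : pFast;  // per-model error
            uint32_t eSlow = bit ? 4096 - pSlow : pSlow;
            if      (eFast < eSlow && w > 32)   w -= 32;
            else if (eSlow < eFast && w < 4064) w += 32;
        }
    };

    A real back-end would run several such counters under different contexts (order-0, order-1, ...), mix them in stages, and feed the final probability to the arithmetic coder, updating everything after each coded bit.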

  12. #12
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    bcm v0.01

    it requests different amounts of RAM for different functions

    on a system with 4 GByte RAM:

    sorting 460.752 MBytes long time run
    encoding 329.548 MBytes short time run
    begin sorting 460.552 MBytes
    sorting 460.752 MBytes long time run

    on a system with only 512 MBytes RAM,
    bcm requests from 80 up to 460 MBytes

    compression:

    sourcefile db.dmp 648.331.264 bytes

    balz113 e 33.390.086 bytes 9 min
    bcm 31.927.387 bytes 22 min
    7zip 35.151.362 bytes 19 min

    that means:

    1. bcm compresses better than balz113,
    but definitely needs more memory and more time

    2. bcm can beat 7zip in compression ratio

    Good work!

  13. #13
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    BCM v0.01 on enwik9:

    timer bcm e enwik9 enwik9_bcm001.bcm
    --> 188.432.286 bytes in 1330.501 s

    System: C2 Quad E6600 @ 2.2 GHz / 4 GB RAM

    @encode:
    Keep up the good work!
    Do you have any plans to implement WFC/DWFC/MTF/fragmentation in BCM?

  14. #14
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by osmanturan View Post
    Do you think ohs.doc is a worst-case example for BWT transform? I have never tested a BWT compressor on SFC.
    An interesting link comparing different sort algorithms in the "worst case":
    http://www.michael-maniscalco.com/msufsort.htm

    (look below at "The Gauntlet (Universal Robustness Corpus)")

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Quote Originally Posted by pat357 View Post
    BCM v0.01 on enwik9:
    Keep up the good work!
    Do you have any plans to implement WFC/DWFC/MTF/fragmentation in BCM?
    There's no reason to do WFC or other such things, and especially not a cheap MTF, with my CM - further CM improvements are still possible!
    Fragmentation is already implemented, but in some cases it does the wrong job - I need more advanced detection, which is why I didn't include it in the first release...

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts


    OK, the development of the next BCM is in progress...

    To Shelwien:
    The catch was in if (abs(p2-p1)>LIMIT)... The new version still uses my simple and division-less mixers, but with the abs() stuff.
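
    A hypothetical reconstruction of what such a gate could look like - the mixer weight adapts only when the two predictions disagree by more than LIMIT (all constants are made up; the real mixer isn't shown here):

    #include <cstdlib>  // abs

    const int LIMIT = 256;  // minimum disagreement, 12-bit probability units

    // Returns the blended prediction; in a real coder the blend happens
    // before the bit is known and the weight update happens after.
    int mix_and_update(int p1, int p2, int &w, int bit) {
        int p = (p1 * (4096 - w) + p2 * w) >> 12;
        if (abs(p2 - p1) > LIMIT) {
            int e1 = bit ? 4096 - p1 : p1;  // error of each model
            int e2 = bit ? 4096 - p2 : p2;
            if (e2 < e1) { if (w < 4064) w += 32; }
            else         { if (w > 32)   w -= 32; }
        }
        // (Shelwien's reply below mentions adding a small unconditional
        //  increment/decrement in an "else" branch of this very check.)
        return p;
    }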

    Some testing results, again with no segmentation:

    calgary.tar -> 792,883 bytes

    world95.txt -> 474,647 bytes

    book1 -> 213,132 bytes

    Well, I'm onto something with such results...

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    I think it's still too much...

    1. Do you use forward comparisons for BWT?

    2. What about inheritance for higher orders?
    (like what I had to do in o2rc to get some effect from order2)

    3. What about rangecoder precision? If you're still using
    Matt's rc or something like that, then you might be losing
    like 2-3k to redundancy.

    ---------

    Also, I tested more things with these coders recently:

    1. A simple weight increment/decrement (depending on p1<p2)
    was added in an "else" branch of that limit check.
    In the end it allowed winning like 10 bytes, and only after very
    long tuning too.

    2. Mask lengths for mixer limits were expanded to the whole 15 bits -
    gained like 200 bytes in o1rc and even less in o2rc with that.

    3. The test with remixing the o0 & o1 estimations after mixer
    updates and using the new estimations to update the w01
    mixer... failed, even though it helped on some other data.

    ---------

    Next plans are:

    1. Switching to a 64-bit bytewise rangecoder, then to a 64-bit
    bitwise arithmetic coder. Also maybe raising the counter
    and mixer precision too.

    2. Generating the wr-tables (update tables for Node2i)
    instead of using the precalculated ones. Well, I could
    just use the other update function, but it contains a division,
    and the slower coders are harder to tune. (A generic sketch of
    this division-vs-table trade-off appears at the end of this post.)

    3. Still haven't tried using quantized weights in the w01 context.

    4. Have to think about replacing these fast counters with
    "patch counters" working with strings of (bit^(p>0.5)) bits.

    5. More detailed mixer/counter initialization; order1 inheritance;
    raising the update speed on a sequence of mispredictions; etc etc

    ---------

    Actually it might be useless to mess with a bitwise coder, given
    its inherent redundancy, but I'd still like to try out at least
    some of these things.
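
    For readers unfamiliar with wr-tables: the division mentioned in plan 2 typically comes from a count-based probability update, and a small precomputed table removes it from the coding loop. A generic illustration, not the actual Node2i code:

    #include <cstdint>

    const int SCALE    = 1 << 12;  // 12-bit fixed-point probabilities
    const int MAXCOUNT = 255;

    uint16_t wr[MAXCOUNT + 1];     // wr[n] ~ SCALE / (n + 2)

    // One division per table slot at startup, none while coding.
    void init_wr() {
        for (int n = 0; n <= MAXCOUNT; n++)
            wr[n] = (uint16_t)(SCALE / (n + 2));
    }

    // Count-based update: p moves toward the observed bit by ~1/(n+2),
    // done as a multiply and a shift through the table.
    void update(int &p, int &n, int bit) {
        int target = bit ? SCALE : 0;
        p += ((target - p) * wr[n]) >> 12;
        if (n < MAXCOUNT) n++;
    }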

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Shelwien - please check PM!

  19. #19
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    BCM results in MOC Test

    BCM in the MOC test:
    414.721.456 bytes
    COMP. TIME = 4.989,256 sec. (over the time limit)
    DEC. TIME = 395,678 sec.

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    In the next version I will possibly reduce the block size and/or add a file segmenter, which may and should lead to much faster compression/decompression...

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    I move on...

    calgary.tar -> 791,622 bytes

    book1 -> 212,844 bytes

    world95.txt -> 474,906 bytes


  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    calgary.tar -> 791,435 bytes

    book1 -> 212,674 bytes

    world95.txt -> 474,291 bytes


  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Testing results with various block sizes:

    sfc.7z:

    1M -> 12,859,246 bytes
    2M -> 12,770,368 bytes
    3M -> 12,821,424 bytes
    4M -> 12,716,255 bytes
    5M -> 12,804,151 bytes
    6M -> 12,899,078 bytes
    7M -> 12,826,038 bytes
    8M -> 12,935,605 bytes
    16M -> 13,018,028 bytes
    32M -> 13,167,873 bytes
    64M -> 13,183,156 bytes

    So anyway, I think I should keep, say, an 8 MB or 16 MB block and do decent file segmentation. Note that I have a few additional ideas about good automatic block sizing.
    Well, if you have some experience or ideas about segmentation - please share!
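
    For reference, the baseline being improved upon is plain fixed-size blocking with no content detection - roughly the following, where compress_block is just a placeholder for the real BWT+CM work:

    #include <cstdio>
    #include <vector>

    const size_t BLOCK_SIZE = 8 << 20;  // 8 MB, per the table above

    // Placeholder for the real per-block work (BWT + CM coding);
    // here it just writes a length-prefixed raw block.
    void compress_block(const unsigned char *buf, size_t n, FILE *out) {
        unsigned len = (unsigned)n;
        fwrite(&len, sizeof(len), 1, out);
        fwrite(buf, 1, n, out);
    }

    void compress_file(FILE *in, FILE *out) {
        std::vector<unsigned char> buf(BLOCK_SIZE);
        size_t n;
        while ((n = fread(buf.data(), 1, BLOCK_SIZE, in)) > 0)
            compress_block(buf.data(), n, out);  // each block is independent
    }

    A segmenter would instead cut blocks at detected content boundaries (text/binary transitions and the like), so that dissimilar data never shares one BWT block.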

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    BCM v0.02 is here!
    Attached Files

  25. #25
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    bcm v0.02

    on a system with 4 GByte RAM:

    sorting... request 58.640 kilobytes
    encoding... request 42.236 kilobytes

    compression:

    sourcefile db.dmp 648.331.264 bytes

    balz113 e 33.390.086 bytes 9 min
    7zip 35.151.362 bytes 19 min
    bcm 001 31.927.387 bytes 22 min
    bcm 002 31.022.446 bytes 17 min

    that means:

    1. bcm 002 compresses a little bit better than bcm 001
    and needs only a small amount of memory - wow!

    2. bcm beats 7zip in compression ratio

    Very good work!

    What about implementing compression of a whole directory, including subdirs?

  26. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    The question was which is better:

    const DWORD xmid=x1+(( ((unsigned __int64)(x2-x1))*DWORD(predictor.P()<<16))>>32);

    or

    const DWORD xmid=x1+(( ((unsigned __int64)(x2-x1))*DWORD(predictor.P()<<15))>>31);

    And here's what some compilers generate...

    http://shelwien.googlepages.com/mul64test.rar

    Of course, IntelC wins.
    Though even for it I had to rewrite the expression
    so that there wouldn't be any ambiguity.
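
    For anyone following along: xmid is the split point of a binary arithmetic coder's interval, placed proportionally to the predicted bit probability. Schematically, in a generic coder step (not Shelwien's actual code):

    #include <cstdint>
    typedef uint32_t DWORD;

    struct Encoder {
        DWORD x1 = 0, x2 = 0xFFFFFFFF;  // current coding interval

        // p = P(bit=1) as 16-bit fixed point (0..65535)
        void encode_bit(int bit, DWORD p) {
            // the expression under discussion: a 32x32->64 multiply
            const DWORD xmid =
                x1 + (DWORD)((((unsigned long long)(x2 - x1)) * p) >> 16);
            if (bit) x2 = xmid;      // '1' takes the lower part
            else     x1 = xmid + 1;  // '0' takes the upper part
            // ...renormalization (shifting out settled bytes) omitted
        }
    };

    The two quoted variants compute the same value; the question is purely about code generation - whether the compiler recognizes the 32x32->64 multiplication instead of calling a full 64-bit multiply routine.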

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Quote Originally Posted by joerg View Post
    What about implementing compression of a whole directory, including subdirs?
    These days I will decide what kind of project I will work on.
    If it is BCM, I have some plans:
    + Make a Linux version
    + Do further CM improvements
    + Add a segmenter/filters
    + Add multiple files/folders compression

  28. #28
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts


    Quote Originally Posted by encode View Post
    BCM v0.02 is here!
    Thanks Ilia!

  29. #29
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Thanks! Is 0.02 eligible for a public table entry, or is it still non-public?

    compression 21 kB/s, decompression 983 kB/s, total size 13.315.201 bytes
    (0.01 = 21 kB/s, 1495 kB/s, 13.791.746)
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts


    Quote Originally Posted by Black_Fox View Post
    Thanks! Is 0.02 eligible for a public table entry, or is it still non-public?
    Well, this version is still for forum readers only...
    But feel free to add results for both versions!



