View Poll Results: Should I release BCM v2.00 (not compatible with v1.xx)?

Voters: 29
  • Yes: 28 (96.55%)
  • No: 1 (3.45%)

Thread: BCM - The ultimate BWT-based file compressor

  1. #31
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    When I wrote an inverse BWT in ZPAQL, I stuffed an extra byte into the BWT for the EOF. Then you can eliminate the (i >= p) check everywhere.

  2. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    encode (27th February 2015)

  3. #32
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts

  4. The Following 12 Users Say Thank You to encode For This Useful Post:

    AnthonyThrashD (5th March 2015),Bulat Ziganshin (2nd March 2015),Fu Siyuan (2nd March 2015),ivan2k2 (3rd March 2015),Mat Chartier (2nd March 2015),Matt Mahoney (4th March 2015),m^2 (5th March 2015),ne0n (3rd March 2015),nemequ (3rd March 2015),quadro (24th May 2015),spark (3rd March 2015),xezz (3rd March 2015)

  5. #33
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  6. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    encode (4th March 2015)

  7. #34
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Code:
    C:\Test>bcm -b100 -f enwik8
    enwik8: 100000000 -> 20792796 in 12.766s
    
    C:\Test>bcm -d -f enwik8.bcm e8
    enwik8.bcm: 20792796 -> 100000000 in 11.922s
    
    C:\Test>bcm -b1000 -f enwik9
    enwik9: 1000000000 -> 164251284 in 147.229s
    
    C:\Test>bcm -d -f enwik9.bcm e9
    enwik9.bcm: 164251284 -> 1000000000 in 142.072s
    BTW, the 64-bit build could be slightly faster - I have to remove some 32-bit-related optimizations - I will add a couple of #ifdef _WIN32 guards in the next versions of BCM.

  8. #35
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Thanks. Updated http://mattmahoney.net/dc/text.html#1639

    My web page was broken yesterday because uploads to my website were not working. It seems to be fixed now.

  9. #36
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    150
    Thanks
    30
    Thanked 59 Times in 35 Posts
    It is a very small improvement for encoding the block size and pidx.
    Last edited by xezz; 8th March 2015 at 14:05.

  10. #37
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Thanks, but it's not an improvement - it may actually hurt compression in some cases with tiny blocks - I have tested this approach many times...
    In addition, you are wrong here - p(0.5) for Encoder::Encode is 1<<17, not 32768. Don't be misled by the Counter's scale - the final probability scale is 1<<18, since the coder receives p+ssep+ssep+ssep.
    Also, be careful about changing the output format - all public versions of BCM must stay compatible now!
    The thing I would really love to improve is the inverse BWT - many compilers output really inefficient code for it, so it ends up even slower than the forward BWT. Only with PGO does it become faster!
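
    For reference, here is a minimal sketch of that arithmetic - my own illustration, assuming (as the numbers above imply) that the counter and each of the three SSE stages output probabilities on a 1<<16 scale and are simply summed before coding:
    Code:
    // Illustrative only: a 16-bit counter probability plus three 16-bit SSE
    // probabilities summed together lands on a 1<<18 scale, so p(0.5) is 1<<17.
    #include <cstdint>
    #include <cstdio>

    int main() {
        const uint32_t half16 = 1u << 15;        // 0.5 on a 1<<16 scale
        uint32_t p    = half16;                  // counter output
        uint32_t sse1 = half16, sse2 = half16, sse3 = half16;  // SSE outputs

        uint32_t mixed = p + sse1 + sse2 + sse3; // 4 * (1<<15) = 1<<17
        printf("mixed = %u, 1<<17 = %u\n", mixed, 1u << 17);
        return 0;
    }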

  11. #38
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    150
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Quote Originally Posted by encode View Post
    In addition, you are wrong here - p(0.5) for Encoder::Encode is 1<<17, not 32768.
    I'm sorry, I have corrected it.

  12. #39
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    The second, and very important, purpose of such slightly redundant size/pidx coding is error detection - since BCM has no CRC or other checksum...
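
    To illustrate the idea (purely hypothetical field names and limits here, not BCM's actual header format): when the header carries both the block size and the BWT primary index, a decoder can reject obviously corrupted headers with cheap consistency checks even without a CRC:
    Code:
    #include <cstdint>
    #include <cstdio>

    // Hypothetical header sanity check - illustrative only.
    static bool header_looks_sane(uint32_t block_size, uint32_t pidx,
                                  uint32_t max_block_size) {
        if (block_size == 0 || block_size > max_block_size)
            return false;       // stored size outside the legal range
        if (pidx > block_size)
            return false;       // the primary index can never exceed the block size
        return true;
    }

    int main() {
        const uint32_t max_block = 2047u << 20;  // e.g. a 2047 MB block limit
        printf("%d\n", header_looks_sane(20u << 20, 12345u, max_block));       // plausible: 1
        printf("%d\n", header_looks_sane(20u << 20, 0xFFFFFFFFu, max_block));  // corrupted pidx: 0
        return 0;
    }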

  13. #40
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    bcm-w64 -b1000 source archive

    seems to be (under Win Server 2008 R2) faster and to compress better than

    bcm-w32 source archive

    Can this be true?

    I will run further tests

    best regards

  14. #41
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    In many cases, the larger the block size, the better. And yes, the 64-bit version should be faster. However, with the current inverse BWT, the 32-bit version is faster at decompression.

  15. #42
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    Thank you for your answer.

    PGO for further x64 optimization...
    Does it mean that we need a special Intel x64 version and another special AMD x64 version?

    best regards

  16. #43
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I didn't look at the code, but in zpaq I have 2 versions of the inverse BWT, one for blocks up to 16 MB and one for blocks up to 4 GB. For the smaller blocks, I pack the linked list pointers and the character it points to into a single 32 bit variable. This speeds up the algorithm because there is only one cache miss per character instead of two. (Building the list like this only requires sequential memory access). For larger blocks you could have a 5 byte structure to do this, but since I was writing it in ZPAQL it wouldn't be easy to do.

    I think I recall BSC has an inverse BWT that outputs 2 bytes at a time to reduce cache misses even further.
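
    For readers who want to see the shape of that trick, here is a rough, self-contained sketch - my own illustration using a plain rotation-style BWT with a separately stored primary row index, not zpaq's or BCM's actual code:
    Code:
    // Inverse BWT with the link and the byte it points at packed into one
    // 32-bit word, so the decode loop does one array load (one likely cache
    // miss) per output byte. Links take 24 bits, so blocks must be < 16 MB.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    void unbwt_packed(const uint8_t* bwt, uint8_t* out, uint32_t n, uint32_t idx) {
        std::vector<uint32_t> packed(n);

        // start[c] = number of symbols smaller than c (exclusive prefix sums).
        uint32_t start[256] = {0};
        for (uint32_t i = 0; i < n; ++i) ++start[bwt[i]];
        for (uint32_t c = 0, sum = 0; c < 256; ++c) {
            uint32_t t = start[c]; start[c] = sum; sum += t;
        }

        // packed[LF(i)] = (i << 8) | bwt[i]: 24-bit link plus the output byte.
        for (uint32_t i = 0; i < n; ++i)
            packed[start[bwt[i]]++] = (i << 8) | bwt[i];

        // Forward decode: a single load yields both the byte and the next link.
        uint32_t p = idx;
        for (uint32_t i = 0; i < n; ++i) {
            uint32_t v = packed[p];
            out[i] = static_cast<uint8_t>(v);
            p = v >> 8;
        }
    }

    int main() {
        const uint8_t bwt[] = {'n','n','b','a','a','a'};  // BWT of "banana", primary row 3
        uint8_t out[7] = {0};
        unbwt_packed(bwt, out, 6, 3);
        printf("%s\n", reinterpret_cast<const char*>(out));  // prints "banana"
        return 0;
    }
    For blocks above 16 MB the same idea needs wider entries (e.g. the 5-byte structure mentioned above).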

  17. #44
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    ICL's PGO (Profile Guided Optimization) didn't help with the 64-bit compile. I can't say anything about modern AMD CPUs - I don't have one.

    I know about this trick - I will test it more carefully. I didn't include it right away because the default block size is 20 MB (to keep the memory usage only slightly over 100 MB).
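
    For context, the arithmetic behind that figure, assuming the usual BWT memory layout of the text block plus a 32-bit suffix array (about 5n overall, which also matches the ~489 MB quadro reports below for a 100 MB block):
    Code:
    block buffer : 20 MB * 1 byte/symbol  =  20 MB
    suffix array : 20 MB * 4 bytes/symbol =  80 MB
    ----------------------------------------------
    total (~5n)  :                          100 MB  (plus small model tables)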

  18. #45
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    running under win server 2008 R2 (virtual machine AMD processors)

    bcm-w64 -b1000 e:backup.dat c:bac-bcm64
    e:backup.dat: 54801567744 -> 5349330866 in 35905.593s

    bcm-w32 e:backup.dat c:bac-bcm32
    e:backup.dat: 54801567744 -> 6670331585 in 18204.265s

    For now: bcm-w64 has good potential!

    bcm-w64 compresses nearly 20% better than bcm-w32,

    but in my test
    bcm-w64 takes nearly twice as much time as bcm-w32.

    If bcm-w64 could run as quickly as bcm-w32,
    then maybe it could beat 7-Zip in efficiency.

    I will run further tests - best regards

  19. #46
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    As a note, you're comparing BWT with a 20 MB block against BWT with a 1000 MB block...
    You may use -b2047 or even -b2097151k as well!

  20. #47
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    A memo for myself
    Code:
    BCM - A BWT-based file compressor, v1.01
    
    Usage: bcm [options] infile [outfile]
    
    Options:
      -b#[k] Set block size to # MB or KB (default is 20 MB)
      -d     Decompress
      -f     Force overwrite of output file

  21. #48
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    It's good that the source code is so small - it builds really fast, and the results are really nice overall.

    In the few tests I did, it seems not to use much more than 5n space for a given block size, which is also a good point.

    (Tested on a netbook: Aspire One Intel(R) Atom(TM) CPU N2600 @ 1.60GHz, 2 cores Ubuntu 64-bit )
    ./bcm -b100 -f enwik8 enwik8.bcm
    enwik8: 100000000 -> 20792796 in 118.390s
    C.Time: 116.62 s user, 1.78 s system, 1:59.84 total; Used 98% CPU, memory: 489404 kB

    ./bcm -d -f enwik8.bcm enwik8.dec
    enwik8.bcm: 20792796 -> 100000000 in 94.042s
    D.Time: 90.92 s user, 3.13 s system, 1:35.38 total; Used 98% CPU, memory: 489184 kB
    Last edited by quadro; 24th May 2015 at 18:55.

  22. #49
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    BCM v1.01 @ GitHub:
    https://github.com/encode84/bcm


  23. The Following 3 Users Say Thank You to encode For This Useful Post:

    comp1 (9th April 2016),jibz (10th April 2016),m^2 (9th April 2016)

  24. #50
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    124
    Thanks
    40
    Thanked 35 Times in 25 Posts
    Ilya,

    BCM is a fantastic combination of speed, code simplicity and compression efficiency.
    Thanks for sharing the code.

    I adapted it to my project:

    https://github.com/flanglet/kanzi/blob/master/java/src/kanzi/entropy/CMPredictor.java
    https://github.com/flanglet/kanzi/bl...CMPredictor.go

    java -cp kanzi.jar kanzi.app.BlockCompressor -input=enwik8 -output=none -block=100m -transform=bwt -entropy=cm -verbose=1
    Encoding ...
    Encoding: 26546 ms
    Input size: 100000000
    Output size: 20792267
    Ratio: 0.20792268
    Throughput (KB/s): 3678

    go run BlockCompressor.go -input=enwik8 -output=none -block=100m -transform=bwt -entropy=cm -verbose=1
    Encoding ...
    Encoding: 47726 ms
    Input size: 100000000
    Output size: 20792267
    Ratio: 0.207923
    Throughput (KB/s): 2046
    Last edited by hexagone; 9th April 2016 at 23:15. Reason: Formatting

  25. The Following User Says Thank You to hexagone For This Useful Post:

    encode (10th April 2016)

  26. #51
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    I see some code from BCM here... And I see no project contributors list...
    Well, just a quick note on SLOW_RATE/FAST_RATE:
    SLOW_RATE is 6 here, FAST_RATE is 2.

  27. #52
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    124
    Thanks
    40
    Thanked 35 Times in 25 Posts
    Quote Originally Posted by encode View Post
    And I see no project contributors list...
    Well, just a quick note on
    What do you mean? Your name is at the top of the file.

    Quote Originally Posted by encode View Post
    Well, just a quick note on SLOW_RATE/FAST_RATE:
    SLOW_RATE is 6 here, fast is 2.
    Yeah, I renamed the constants at some point and messed up the names. Will change.

  28. The Following User Says Thank You to hexagone For This Useful Post:

    encode (10th April 2016)

  29. #53
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Uh, I was looking for a detailed README with a description of the algorithms... You have compression algorithms from A to Z here!

  30. #54
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    124
    Thanks
    40
    Thanked 35 Times in 25 Posts
    I am not claiming it is super detailed, but you can start with the wiki: https://github.com/flanglet/kanzi/wiki

  31. The Following User Says Thank You to hexagone For This Useful Post:

    encode (10th April 2016)

  32. #55
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Updated to BCM v1.02. Sort of a final version that features long-awaited optimizations!

    https://github.com/encode84/bcm

  33. The Following 4 Users Say Thank You to encode For This Useful Post:

    Alexander Rhatushnyak (13th April 2016),comp1 (13th April 2016),Cyan (13th April 2016),Stephan Busch (13th April 2016)

  34. #56
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    A makefile or a toCompile.bat would be helpful.

    Also, why not (k<<12)-(k==16) instead of (k-(k==16))<<12 in CM()? Seems more logical, doesn't it?
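
    For readers following along, the two expressions only differ in the very last table entry; assuming k runs over 0..16 and the values are meant to be 16-bit probabilities (my reading of the snippet, not a quote from the source), the suggested form caps the top entry at 65535 instead of repeating the previous step:
    Code:
    // Compare the two initializers mentioned above (illustrative).
    #include <cstdio>

    int main() {
        for (int k = 0; k <= 16; ++k) {
            int current   = (k - (k == 16)) << 12;  // existing form: k=16 repeats 61440
            int suggested = (k << 12) - (k == 16);  // suggested form: k=16 gives 65535
            printf("k=%2d  current=%5d  suggested=%5d\n", k, current, suggested);
        }
        return 0;
    }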

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  35. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    encode (13th April 2016)

  36. #57
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    You are right! It's just the SSE initialization I've been using for years (if not decades)!
    Collecting ideas for BCM v2.00...

  37. #58
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts
    Should I release a slightly changed version of BCM, breaking compatibility (BCM is not that widespread, I presume)?
    Maybe with some extra features you suggest? (CRC32 checking? Higher compression at slower speed? ...)

  38. #59
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    858
    Thanks
    450
    Thanked 255 Times in 103 Posts
    I believe you have no issue with the "installed base".

    But the current BCM source has a great property: simplicity.
    This is a very worthy characteristic, worth more than small gains and extra features.

    Now, of course, adding features/performance while preserving simplicity would be a great way forward.

  39. #60
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    468
    Thanks
    203
    Thanked 81 Times in 61 Posts
    Can't you just add the old decompressor code to the new release? Or do something like SQX, which can write two different versions of the standard - 1.1 and 2.0, I think.
    Last edited by Gonzalo; 13th April 2016 at 22:16.
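
    A minimal sketch of what such version dispatch could look like - purely illustrative, with a made-up header layout and placeholder decoder names, not BCM's actual format:
    Code:
    // Read a (hypothetical) 4-byte header "BCM" + version byte and route to the
    // matching decoder, so archives written by older versions stay readable.
    #include <cstdint>
    #include <cstdio>

    // Placeholder decoders - stubs so the sketch compiles; real code would call
    // the v1.xx and v2.00 decompression routines here.
    static bool decode_v1(FILE* in, FILE* out) { (void)in; (void)out; return true; }
    static bool decode_v2(FILE* in, FILE* out) { (void)in; (void)out; return true; }

    static bool decompress(FILE* in, FILE* out) {
        uint8_t h[4];
        if (fread(h, 1, 4, in) != 4) return false;
        if (h[0] != 'B' || h[1] != 'C' || h[2] != 'M') return false;
        switch (h[3]) {                           // fourth byte = format version
            case 1:  return decode_v1(in, out);   // old v1.xx bitstream
            case 2:  return decode_v2(in, out);   // new v2.00 bitstream
            default: return false;                // unknown version or damaged header
        }
    }

    int main() { (void)decompress; return 0; }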


