Page 1 of 2 12 LastLast
Results 1 to 30 of 45

Thread: BCM 0.11 - A high performance BWT compressor

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

    Exclamation BCM 0.11 - A high performance BWT compressor

    Hi all! I proudly present a new version of my BWT file compressor called BCM! Each new version I introduce something new, making small or big steps to the perfect compression. Well, here is yet another step! Since it's a complete rewrite, I improved many things making BCM far more efficient and interesting. OK, stop talking, enjoy!

    Attached Files Attached Files

  2. #2
    Member evg's Avatar
    Join Date
    May 2009
    Location
    Austria
    Posts
    23
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Thumbs up quick test

    Hello

    Source handbrake-cli ELF64 executable 12754056 bytes, system AMD Athlon X2 5600+, 4GB, Linux x64.

    Compression of small blocksizes is slightly slower, probably due to more overhead. No precise timings beause tested under linux/wine, the trend should be obvious though. Well done encode.

    bcm09 (avg. 19s)
    No code has to be inserted here.

    bcm10 (avg. 7s)
    No code has to be inserted here.

    bcm11 (avg. 7s)
    No code has to be inserted here.

    best regards
    evg

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  4. #4
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    my freearc dir tarred, original size 31736320.

    No code has to be inserted here.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  5. #5
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post

    Silesia Corpus

    Code:
        file       size   |    bcm         bcm     |    bsc         bsc    
      mozilla |  51220480 |  15924454 | 21.800 sec |  15948277 | 11.919 sec
      webster |  41458703 |   6405101 | 19.800 sec |   6432894 |  8.658 sec
          nci |  33553445 |   1235736 | 13.200 sec |   1204389 |  4.290 sec
        samba |  21606400 |   4020530 |  8.200 sec |   3925261 |  3.619 sec
      dickens |  10192446 |   2237183 |  4.500 sec |   2254916 |  2.231 sec
         osdb |  10085684 |   2260102 |  4.400 sec |   2203400 |  2.262 sec
           mr |   9970564 |   2121384 |  4.000 sec |   2195307 |  1.997 sec
        x-ray |   8474240 |   3663817 |  3.300 sec |   3747896 |  2.543 sec
          sao |   7251944 |   4674671 |  3.300 sec |   4669768 |  2.745 sec
      reymont |   6627202 |    980815 |  2.600 sec |    978277 |  1.155 sec
      ooffice |   6152192 |   2534688 |  2.300 sec |   2555019 |  1.544 sec
          xml |   5345280 |    393902 |  1.700 sec |    379577 |  0.561 sec
        total   211938580 |  46452383   84.600 sec |  46494981   43.524 sec
    * bsc was tested with disabled preprocessing and multi-core systems support.
    Enjoy coding, enjoy life!

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Is that bwt or ST in bsc?
    Afaik BWT should be the same (divsufsort), so do you think its all due to your qlfc implementation?

  7. #7
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Shelwien View Post
    Is that bwt or ST in bsc?
    Afaik BWT should be the same (divsufsort), so do you think its all due to your qlfc implementation?
    This is BWT only based on divsufsort implementation. Preprocessing and multi-core systems support is turned off.
    Enjoy coding, enjoy life!

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    I meant that the same applies to bcm as well, so there can't be much difference due to BWT.

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

    Lightbulb

    Yet another comparison. BCM 0.11 vs BSC 2.2.0 32-bit. Tested under Core 2 Quad, 4 GB RAM, W7.

    No code has to be inserted here.
    BCM has a higher compression compared to BSC (even with all preprocessing techniques enabled), especially it has a higher binary and bitmap compression. Anyway, BSC sometimes shows a nice results on a high compressible data.
    Not sure how you can really disable the multi-threading with OpenMP libraries linked to bsc.exe. If you compile libdivsufsort with OpenMP enabled, OpenMP will be there in any case. Anyway, only bsc with no OpenMP libraries inside might be considered as a single threaded program! I also did some experiments with OpenMP - BCM nicely boosts the compression, anyway, the executable becomes really fat (~400 KB) and decompression got slightly slower, so I dropped the idea...

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    BTW, BCM is not based on libdivsufsort. A similar things of my own used to build a suffix array. And after BCM produces own BWT output with unchanged symbols. Since BSC is based on libdivsufsort, divbwt() rearranges input bytes - to minimize the alphabet (kind of a cheating - sorting becomes faster even without multi-threaded stuff), so BWT output becomes really different...

  11. #11
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    If it keeps output bit-to-bit identical and general compression ratio stays (most files still get compressed well), then cheat all you can
    Last edited by Black_Fox; 23rd June 2010 at 23:43.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  12. #12
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    BCM has a higher compression compared to BSC (even with all preprocessing techniques enabled), especially it has a higher binary and bitmap compression. Anyway, BSC sometimes shows a nice results on a high compressible data.
    Not sure how you can really disable the multi-threading with OpenMP libraries linked to bsc.exe. If you compile libdivsufsort with OpenMP enabled, OpenMP will be there in any case. Anyway, only bsc with no OpenMP libraries inside might be considered as a single threaded program! I also did some experiments with OpenMP - BCM nicely boosts the compression, anyway, the executable becomes really fat (~400 KB) and decompression got slightly slower, so I dropped the idea...
    Use parameters "-b128 -p -T" to disable OpenMP, preprocessing and enable 128MB block. BWT decompression can be easy paralleled, so using OpenMP you can get significant speed boost.
    Enjoy coding, enjoy life!

  13. #13
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    BTW, BCM is not based on libdivsufsort. A similar things of my own used to build a suffix array. And after BCM produces own BWT output with unchanged symbols. Since BSC is based on libdivsufsort, divbwt() rearranges input bytes - to minimize the alphabet (kind of a cheating - sorting becomes faster even without multi-threaded stuff), so BWT output becomes really different...
    There is no cheating in libdivsuffsort. divbwt just adds terminal symbol to the end of input string. It is simplify sort, but BWT output is slightly different.
    Enjoy coding, enjoy life!

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Code:
    /* Inverse Burrows-Wheeler transform. */
    saint_t
    inverse_bw_transform(const sauchar_t *T, sauchar_t *U, saidx_t *A,
                         saidx_t n, saidx_t idx) {
      saidx_t C[ALPHABET_SIZE];
      sauchar_t D[ALPHABET_SIZE];
      saidx_t *B;
      saidx_t i, p;
      saint_t c, d;
    
      /* Check arguments. */
      if((T == NULL) || (U == NULL) || (n < 0) || (idx < 0) ||
         (n < idx) || ((0 < n) && (idx == 0))) {
        return -1;
      }
      if(n <= 1) { return 0; }
    
      if((B = A) == NULL) {
        /* Allocate n*sizeof(saidx_t) bytes of memory. */
        if((B = (saidx_t *)malloc((size_t)n * sizeof(saidx_t))) == NULL) { return -2; }
      }
    
      /* Inverse BW transform. */
      for(c = 0; c < ALPHABET_SIZE; ++c) { C[c] = 0; }
      for(i = 0; i < n; ++i) { ++C[T[i]]; }
      for(c = 0, d = 0, i = 0; c < ALPHABET_SIZE; ++c) {
        p = C[c];
        if(0 < p) {
          C[c] = i;
          D[d++] = (sauchar_t)c;
          i += p;
        }
      }
      for(i = 0; i < idx; ++i) { B[C[T[i]]++] = i; }
      for( ; i < n; ++i)       { B[C[T[i]]++] = i + 1; }
      for(c = 0; c < d; ++c) { C[c] = C[D[c]]; }
      for(i = 0, p = idx; i < n; ++i) {
        U[i] = D[binarysearch_lower(C, d, p)];
        p = B[p - 1];
      }
    
      if(A == NULL) {
        /* Deallocate memory. */
        free(B);
      }
    
      return 0;
    }
    So, it's not about adding EOF symbol, it's kinda alphabet reordering. It not affects compression in this case, but BWT output is completely different - symbols just remapped - and, talking about fair BWT or SA construction, it's a cheating! Since we transform string with remapped symbols! Some time ago I talked with the author of Archon and Dark - kvark (Dmitry Malyshev) and he pointed me that such things are cheating indeed... It's about times with fair SA constructors battle... I hope such great BWT author knows how EOF with suffix array based BWT organized.

  15. #15
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    So, it's not about adding EOF symbol, it's kinda alphabet reordering. It not affects compression in this case, but BWT output is completely different - symbols just remapped - and, talking about fair BWT or SA construction, it's a cheating! Since we transform string with remapped symbols! Some time ago I talked with the author of Archon and Dark - kvark (Dmitry Malyshev) and he pointed me that such things are cheating indeed... It's about times with fair SA constructors battle... I hope such great BWT author knows how EOF with suffix array based BWT organized.
    I can only say that you do not understand this source code. This is not an alphabet reordering. This is an alphabet compression by removing unused symbols. This helps to do binary-search for less symbols instead of full alphabet. Since order of symbols is not changed this code produces identical output.
    Enjoy coding, enjoy life!

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I can only say that you do not understand my English. Please read carefully. As I wrote, divbwt()'s tricks & flicks doesn't affect compression. But I wouldn't call it slightly diffrent output - synbols are remapped - so, as example, context masks will work uncorrectly.

  17. #17
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    I can only say that you do not understand my English. Please read carefully. As I wrote, divbwt()'s tricks & flicks doesn't affect compression. But I wouldn't call it slightly diffrent output - synbols are remapped - so, as example, context masks will work uncorrectly.
    No comments. Just write sample app and check.
    Enjoy coding, enjoy life!

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Already did that... OK, we should stop that flaming - this thread about BCM - not about BSC or libdivsufsort!

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Anyway, the speed difference is significant (looks like 2x), so I think it should make sense to time bsc's BWT and qlfc separately,
    and compare to BWT/postcoder in bcm. I still think that most of the gap is due to the average number of rc calls per symbol being
    too different... (likely fixed 8 in bcm vs runlength and ranking in bsc).

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Yep, BCM encodes each byte of input = 8 RC calls per input byte... Anyway, Shelwien, read my previous post. If serious, we should compare BCM with BBB, M03, etc.

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Why? You certainly have something to learn from in bsc postcoder - and you don't have to copy it for that, or use qlfc -
    just do something about that fixed 8 bits per symbol (and putc output in rc if you still have that).
    I'd not say that if you aimed for compression, but its not the case.

  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    OK, can you or Ilya provide detailed description of QLFC? Is it something like DC? I have a small paper on QLFC, but it's impossible to understand something from it. Does this scheme adds some redundancy?

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    QLFC sounds like a DC/MTF hybrid - its MTF basically, but it encodes a "reverse" mtf rank
    ie number of unique symbols until _next_ occurrence of current symbol.
    Thus, its main benefit over MTF is that it allows to use the current symbol as context
    in the rank model (you can't with MTF), which apparently gives a nice compression boost.

    But as I said, qlfc and postcoder speed are two different things here.
    Unlike you, he has runlength coding there, and rank unary coding,
    so there're _much_ fewer rc calls per symbol on average.

    Also note that he still has an option of further improving the postcoder speed
    (by 10-20%) using parallel rc or probability sorting ideas.

  24. #24
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    OK, can you or Ilya provide detailed description of QLFC? Is it something like DC? I have a small paper on QLFC, but it's impossible to understand something from it. Does this scheme adds some redundancy?
    In general the BWT output is encoded by pairs: qlfc rank of current symbol and run length of current symbol. It is important that current symbol is known and I use this symbol as context. rank and run length is unary coded by PAQ mixer based on three independent contexts. Check in function "bsc_qlfc_decode_fast" main "for (int i = 0; i < n ; )" loop.
    Last edited by Gribok; 24th June 2010 at 21:15.
    Enjoy coding, enjoy life!

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

    Lightbulb

    Yet another comparison. Same machine - Core 2 Quad, 4 GB RAM, W7.

    Test file: butterfly4.bmp (18,048,054 bytes)

    No code has to be inserted here.

    Test file: bookstar (35,594,240 bytes)

    No code has to be inserted here.

    Test file: ut3.exe (28,064,848 bytes)

    No code has to be inserted here.


  26. #26
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode View Post
    Yet another comparison. Same machine - Core 2 Quad, 4 GB RAM, W7.
    bsc is too fast in decompression. Are you sure that you disable OpenMP. Use ""-b128 -p -T"" for both compression and decompression.
    Enjoy coding, enjoy life!

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I tested your program as regular user - changing block size only. And it's mostly compression ratio comparison. Like you see, CM based compressors more stable at compression, QLFC in some cases really screw up! Will add new comparison soon - will test all of the compressors on game resourses...

  28. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

    Exclamation

    LTCB comparison (Intel(R) Core(TM)2 Quad Q9300 @ 2.50GHz 2.50GHz, 4 GB DDR2, W7)

    Test file: ENWIK9 (1,000,000,000 bytes)

    No code has to be inserted here.

    A 32-bit version of bsc failed with "Not enough memory!" (bsc e enwik9 enwik9.bsc -p -T -b477). Ilya, probably you should compile the 32-bit version with /LARGEADDRESSAWARE... Attempted to download a 64-bit version - and at your site faced with "Unable to connect to database. Check database connection information and make sure the database server is running."

    Test file: ENWIK8 (100,000,000 bytes)

    No code has to be inserted here.


  29. #29
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Thanks a lot!

Page 1 of 2 12 LastLast

Similar Threads

  1. BCM v0.09 - The ultimate BWT-based file compressor!
    By encode in forum Data Compression
    Replies: 22
    Last Post: 6th March 2016, 10:26
  2. BCM v0.08 - The ultimate BWT-based file compressor!
    By encode in forum Data Compression
    Replies: 78
    Last Post: 12th August 2009, 11:14
  3. BCM v0.01 - New BWT+CM-based compressor
    By encode in forum Data Compression
    Replies: 81
    Last Post: 9th February 2009, 16:47
  4. Blizzard - Fast BWT file compressor!!!
    By LovePimple in forum Data Compression
    Replies: 40
    Last Post: 6th July 2008, 15:48
  5. dcs-bwt - Experimental Burrows-Wheeler Compressor
    By Arkanosis in forum Data Compression
    Replies: 2
    Last Post: 25th June 2008, 04:53

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •