Page 2 of 2 FirstFirst 12
Results 31 to 45 of 45

Thread: BCM 0.11 - A high performance BWT compressor

  1. #31
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts

    Lightbulb

    BCM 0.11 on OSHO.TXT (206,908,949 bytes)

    No code has to be inserted here.

  2. #32
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    979
    Thanks
    96
    Thanked 395 Times in 275 Posts
    I did some quick tests at metacompressor.com:

    file16 binary exe binary executable 7,882,752 bytes:
    comp: decomp:
    761,368 bytes 9.837 sec. 9.762 sec. nanozip 0.08a 64bit nz -cc -m1750m
    1,068,904 bytes 0.256 sec. 0.222 sec. bsc 2.2.0 64bit bsc e
    1,097,550 bytes 1.626 sec. 1.448 sec. bcm 0.11 bcm
    1,110,210 bytes 1.232 sec. 1.197 sec. bcm 0.10 bcm
    1,146,156 bytes 0.247 sec. 0.164 sec. csc 3.2a5 csc32a5 -m2
    1,148,585 bytes 0.351 sec. 0.165 sec. csc 3.2a5 csc32a5 -m3
    1,263,193 bytes 0.216 sec. 0.167 sec. csc 3.2a5 csc32a5 -m1
    1,617,113 bytes 0.046 sec. 0.029 sec. Etincelle RC2 etincelle -c

    file29 image bmp image 2592x1944 pixels 15,116,598 bytes:
    5,179,743 bytes 3.042 sec. 2.924 sec. bcm 0.11 bcm
    5,214,043 bytes 2.575 sec. 2.602 sec. bcm 0.10 bcm
    5,291,336 bytes 13.755 sec. 11.320 sec. nanozip 0.08a 64bit nz a -cc -m1750m
    5,464,152 bytes 0.929 sec. 0.703 sec. bsc 2.2.0 64bit bsc e
    7,374,670 bytes 2.024 sec. 0.560 sec. csc 3.2a5 csc32a5 -m3
    7,430,910 bytes 1.360 sec. 0.575 sec. csc 3.2a5 csc32a5 -m2
    7,459,324 bytes 1.078 sec. 0.578 sec. csc 3.2a5 csc32a5 -m1
    8,886,413 bytes 0.193 sec. 0.122 sec. Etincelle RC2 etincelle -c

    news text text 377,109 bytes:
    92,603 bytes 1.046 sec. 1.017 sec. nanozip 0.08a 64bit nz a -cc -m1750m
    110,644 bytes 0.037 sec. 0.020 sec. bsc 2.2.0 64bit bsc e
    111,313 bytes 0.078 sec. 0.062 sec. bcm 0.11 bcm
    111,567 bytes 0.065 sec. 0.050 sec. bcm 0.10 bcm
    127,072 bytes 0.032 sec. 0.014 sec. csc 3.2a5 csc32a5 -m3
    130,211 bytes 0.027 sec. 0.014 sec. csc 3.2a5 csc32a5 -m2
    134,753 bytes 0.022 sec. 0.015 sec. csc 3.2a5 csc32a5 -m1
    143,934 bytes 0.008 sec. 0.005 sec. Etincelle RC2 etincelle -c

    pic binary binary 513,216 bytes:
    40,543 bytes 0.990 sec. 1.171 sec. nanozip 0.08a 64bit nz a -cc -m1750m
    44,949 bytes 0.082 sec. 0.060 sec. bcm 0.10 bcm
    45,064 bytes 0.086 sec. 0.074 sec. bcm 0.11 bcm
    47,081 bytes 0.050 sec. 0.017 sec. bsc 2.2.0 64bit bsc e
    48,751 bytes 0.017 sec. 0.009 sec. csc 3.2a5 csc32a5 -m2
    49,127 bytes 0.018 sec. 0.009 sec. csc 3.2a5 csc32a5 -m3
    51,201 bytes 0.013 sec. 0.009 sec. csc 3.2a5 csc32a5 -m1
    65,329 bytes 0.007 sec. 0.004 sec. Etincelle RC2 etincelle -c

    trans text text 93,695 bytes:
    13,693 bytes 5.329 sec. 4.304 sec. nanozip 0.08a 64bit nz a -cc -m1750m
    16,907 bytes 0.022 sec. 0.010 sec. bsc 2.2.0 64bit bsc e
    17,604 bytes 0.019 sec. 0.013 sec. bcm 0.10 bcm
    17,668 bytes 0.023 sec. 0.017 sec. bcm 0.11 bcm
    19,626 bytes 0.009 sec. 0.003 sec. csc 3.2a5 csc32a5 -m3
    20,543 bytes 0.008 sec. 0.003 sec. csc 3.2a5 csc32a5 -m2
    21,384 bytes 0.007 sec. 0.003 sec. csc 3.2a5 csc32a5 -m1
    22,474 bytes 0.005 sec. 0.003 sec. Etincelle RC2 etincelle -c

  3. #33
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    A small BCM's Page & Mirror/Archive:
    http://encode.narod.ru

  4. #34
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Ilia, what about make small page with all your nice programs in one place (balz, quad, lzpm, ppmx, pim, pimple, tc...)?

  5. #35
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Thanks Ilia!

    Mirror: http://lovepimple.110mb.com/

  6. #36
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    Quote Originally Posted by squxe View Post
    Ilia, what about make small page with all your nice programs in one place (balz, quad, lzpm, ppmx, pim, pimple, tc...)?
    DONE!

  7. #37
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    Quote Originally Posted by encode View Post
    DONE!
    and (as everytime) well done

    Best regards!

  8. #38
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    Working on BCM v0.12... I dislike the fact that BCM is out of the board currently. I've found a few things that can be easily improved in current version - like a faster arithmetic encoder. The thing with CM is - we can add new models/modeling techniques to improve compression at the cost of processing speed and visa versa. Examples are BCM 0.09 - a high compression BWT+strong CM and BCM 0.10 - BWT+light CM. BCM can be notable faster than v0.10 and v0.11. Thinking about something new - a new CM structure - i.e. still not DC or QLFC. CM has a higher compression potential than DC or QLFC, at the cost of speed. My new fast CM is really fast and its speed is really close to the best DC and QLFC implementations. Still can't figure out about the best CM configuration and model set. What is good balance between good compression and fast compression/decompression speed? As example, how fast the ENWIK8 should be decompressed?

  9. #39
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    What is good balance between good compression and fast compression/decompression speed? As example, how fast the ENWIK8 should be decompressed?
    It all depends on what applications you want to target. If you don't know, see where you can push the edge of what's currently possible the farthest and shoot there - unless this makes the codec unuseably slow, there will likely be some application for it.

  10. #40
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Of course there are things you can do to improve both compression and speed, like alphabet reordering, high order LZP preprocessing, and coding run lengths in your context model after BWT or use single bits to code repeats. Don't know if you already do this stuff. (BBB doesn't). Large memory models like BBB would help on enwik9. You could speed up forward BBB/BWT by replacing quicksort() + merge with libDivSufSort + merge or something.

  11. #41
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    > I've found a few things that can be easily improved in current
    > version - like a faster arithmetic encoder.

    Sure, there're several tricks which you could use... like
    16-bit or 32-bit rc i/o, renorm skipping, branch removal,
    keeping rc state in local vars etc.
    But these don't really improve speed that much, and its
    always harder to optimize decoding speed - its normally slower
    than encoding, and less tricks apply.

    > The thing with CM is - we can add new models/modeling techniques to
    > improve compression at the cost of processing speed and visa versa.

    Yes, but thats true for any BWT postcoders. Anyway, BWT results
    are inherently limited, and for most users a difference between
    210k and 220k on book1 doesn't really matter.

    Well, I mean, DC or MTF or QLFC may hide some dependencies
    (thus limiting the compression potential), but they still
    have the same tradeoff options - its not like there's one
    way to implement a QLFC postcoder.

    > CM has a higher compression potential than DC or QLFC, at the cost
    > of speed.

    Yes, and I have doubts about speed too.
    DC/MTF/QLFC all have to do that complex symbol ranking,
    so it might be possible to design a simpler and faster direct
    coder with similar compression.

    > Still can't figure out about the best CM configuration and model
    > set.

    Mix2(o0,o1) with logistic mixing seems to show results near bsc's.
    Then there're obvious improvements (counters->fsm and mix2->SSE2),
    so overall I guess that should be about right.

    > What is good balance between good compression and fast
    > compression/decompression speed?

    It only makes sense if you'd manage to make it faster than bsc,
    and that's not so easy even if you'd use a plain o0 coder,
    so I think it may be a bit early to look further atm?

    > As example, how fast the ENWIK8 should be decompressed?

    1. bsc is multithreaded so you'd need either to make a hacked
    version for comparison (eg add your file i/o to its qlfc), or
    consider making your coder MT as well.
    2. getc/putc i/o is really problematic for a simple coder,
    but bsc's pure blockwise approach isn't perfect either...
    its hard to buffer enough blocks to smooth it out.
    Actually, a direct CM may have benefits here, like how it
    can encode the BWT symbols without having the whole BWT output in memory,
    and write down the code right away.
    2a. Note that bsc doesn't use much winapi, so things like VirtualAlloc
    with MEM_LARGE_PAGES flag, windows threading API, i/o API (eg. overlapped or mapped i/o)
    might allow to beat it.
    3. You really need RLE and efficient symbol decomposition in postcoder.
    4. In benchmarks, preprocessing (like LIPT and table reordering) would be more helpful
    than anything you can do to the model.

  12. #42
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    2 Shelwien:
    Strange thing about SSE2 on BWT data. In all of my tests O0+O1 (static mixing)+SSE always better than O0+O1+SSE2. Both SSEs have no linear/bilinear interpolation. Both have been optimized with my brute-force optimizer. Probably, I'm doing something wrong. SSE2 for sure properly initialazed - sse2(p0, p1) = (p0+p1)/2. Mixing original probability with SSE output usually helps (p+3ssep), what should we do with SSE2? Is interpolation a must have for SSE2?

  13. #43
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    Well, I actually talked about SSE2 with logistic inputs there... linear mixing didn't peform that well in mix_test experiments,
    so SSE2 simulating linear mixing probably won't either. And yes, interpolation is very helpful there.
    I didn't really try SSE2(o0,o1) on bwt data though, maybe I should.

    As to your optimization btw, what kind of parameters are you talking about?
    Something like shift counts? I mean, its better to have more precision at this point.

  14. #44
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    I have 17-bit counters. Parameters are counters' update rate, sse's quantization parameters, etc

  15. #45
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,010
    Thanks
    399
    Thanked 398 Times in 152 Posts
    Just for reference, CM part from BCM 0.11 compresses ENWIK8.BWT in 30 sec. i.e. new BCM 0.12 is about two times faster (~17-18 sec) and its compression is on par with BCM 0.11.

Page 2 of 2 FirstFirst 12

Similar Threads

  1. BCM v0.09 - The ultimate BWT-based file compressor!
    By encode in forum Data Compression
    Replies: 22
    Last Post: 6th March 2016, 09:26
  2. BCM v0.08 - The ultimate BWT-based file compressor!
    By encode in forum Data Compression
    Replies: 78
    Last Post: 12th August 2009, 10:14
  3. BCM v0.01 - New BWT+CM-based compressor
    By encode in forum Data Compression
    Replies: 81
    Last Post: 9th February 2009, 15:47
  4. Blizzard - Fast BWT file compressor!!!
    By LovePimple in forum Data Compression
    Replies: 40
    Last Post: 6th July 2008, 14:48
  5. dcs-bwt - Experimental Burrows-Wheeler Compressor
    By Arkanosis in forum Data Compression
    Replies: 2
    Last Post: 25th June 2008, 03:53

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •