Results 1 to 11 of 11

Thread: BWTmix

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts

    BWTmix

    http://ctxmodel.net/files/mix_test/BWTmix_v0.rar

    BWTmix v0 = BWT+o1rc+unBWT from mix_test_v8

    Usage:

    BWTmix c book1 book1.ari -- compress
    BWTmix d book1.ari book1.unp -- decompress

    BWTmix c7 book1 book1.ari -- compress book1 with block size of 7*100000=700k

    Code:
    comp.size enc.time   dec.time
     20621695  177.453s  102.437s // enwik8
     20744613   46.000s   35.468s // enwik8 + bcm008
    167978527 2104.094s 1019.938s // enwik9
    Times are measured with Q9450 @ 3.52Ghz = 440x8, ramdrive

    Well, I guess my qsort is slow comparing to a proper BWT sort
    And as to CM, its not really that hopeless, and could be probably
    optimized to the same speed as bcm008 - like with vectorized mixing.

    Also, this is open source, and bcm is not

  2. #2
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts

    Thumbs up

    Thank you! I think it's a good tutorial for me to enter the BW world!

  3. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    afair, it's the first bwt+cm open-source algorithm. so, my thanks too

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    Considering tutorials, I'd suggest using http://ctxmodel.net/files/mix_test/mix_test_v8.rar
    as it contains all the same stuff as separate utilities,
    which should be more readable and easier to tweak.

    I only integrated it because someone wanted to benchmark it.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    Also, isn't BBB kinda open-source BWT+CM too?

    Btw, seems that BWTmix really got tested: http://compressionratings.com/bwt.html#book1

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Congrats! Good work! And seems to be the BCM v0.09 release will be delayed again. Well, can you describe what's going on inside, the mode structure?

    SSE? How many SSE stages? Model set? order-0+order-1+order-? Dynamic mixing?


  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    > SSE? How many SSE stages?

    No SSE at all.

    > Model set? order-0+order-1+order-?

    5x order0, 1x order1,
    and then contexts with masks DF00 and 1FFF.
    Based on my usual linear counters, no
    delayed updates or anything.

    > Dynamic mixing?

    Well, yeah, it was made for evaluation of paq mixer
    originally. So the model looks like
    mix2( mix2(mix2(p1,p2),mix2(p3,p4)), mix2(mix2(p5,p6),mix2(p7,p) )
    And that's all.

  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Thanks Shelwien!

    Here's a GCC 4.4 build. Its slightly faster than the original on my AMD Sempron 2400+ machine.

    Edit: Uploaded a second slightly improved build.
    Attached Files Attached Files

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    http://ctxmodel.net/files/mix_test/BWTmix_v1.rar
    BWTmix v1
    = BWT+o1rc+unBWT from mix_test_v9
    Code:
    comp.size enc.time   dec.time
     20608793  143.671s   67.344s // enwik8
     20744613   46.000s   35.468s // enwik8 + bcm008
    167852106 1794.406s  690.906s // enwik9
    - compression improvement (0.65% at enwik8, 0.075% at enwik9)
    - 15% faster compression
    - 32% faster decompression

    ---------------
    http://ctxmodel.net/files/mix_test/mix_test_v9.rar

    book1 enwik6 enwik8
    208199 250633 20615390 // 09g2a
    208205 250789 20608788 // 09h2

    book1r enwik6r enwik8r
    207130 248634 20555960 // 09g2a
    207176 248656 20548503 // 09h2

    v9
    - mixer contexts added
    - separate rates for different bits tested (09g2a)
    - separate contexts for different bits tested
    - idx2inc updates
    - improved support of masked components
    - support for sliding mask components
    - added comments with mask values to generated code
    - index components import fix
    - speed optimizations
    - coding functions rewritten using templates
    - negative rounding fix removed from mixer

    ToDo:
    - cache-friendly element allocation (via sliding mask lookup tables)
    - delayed counters with runlengths
    - conscious bit history modelling
    - logistic SSE2 with interpolation
    - unary coder optimization

  10. #10
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Nice results. Who cares about compression time as long as it isn't exorbitant high . You come close to bcm while compression ratio is much better (at least on enwik8 dunno other data).

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,362
    Thanks
    212
    Thanked 1,016 Times in 540 Posts
    As to compression time, of course I can use some of available BWT libraries too.
    But I think that actually I should use a new rewrite of http://encode.su/forum/showthread.php?t=379
    there, so won't bother for now.

    And as to decompression speed, there's a long list of just already tested
    applicable speed optimizations, not even taking into account stuff like
    vectorization of mixing and counter updates.
    So I think that beating bcm008 in decoding speed too should be rather easy -
    though most of the same techniques would benefit bcm as well.

    However, I don't really want to apply many of these potential techniques
    (especially vectorization), as they would nearly block all the further
    development.
    But still, there's a lot of stuff which would be "transparent" to the model.
    For example, it seems that BWTmix would be a good platform for a test
    of my multiplication-free rangecoding idea.

Similar Threads

  1. USING BWTS in BWTmix
    By biject.bwts in forum Data Compression
    Replies: 8
    Last Post: 9th March 2010, 03:00

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •