Results 1 to 25 of 25

Thread: BWMonstr

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts

    Cool BWMonstr

    Some interesting BWT-based file compressor by Sami:
    http://nanozip.net/bwmonstr.html



    Seems to be the strongest BWT available, but it's slow as hell...

    I hate it, because I need to do something with my BCM to be on the top. Anyway, such direct comparison will activate developers for further BWT research and improvement. I'm already very mad and activated!!!

    BWT comparison:
    http://compressionratings.com/bwt.html


  2. #2
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Code:
    Enwik5m
    Prog            Ver     Arg	Size         CT      DT	     CM   DM
    BWMonstr        0.00            1 175 778    117.50  117.77  9    9
    Do we really need 4x speed (~42 KiB/sec) of paq8o8 compression from a BWT compressor? It's memory usage is good though.
    BIT Archiver homepage: www.osmanturan.com

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Yep, it's too slow for practical usage. But, as Sami showed a higher BWT compression is possible...

  4. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    Yep, it's too slow for practical usage. But, as Sami showed a higher BWT compression is possible...
    So? Does it matter? It's useless anyway.

  5. #5
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Yep, it's too slow for practical usage. But, as Sami showed a higher BWT compression is possible...
    How come a BWT can be so symmetric?..

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    it looks like bbb sorting + paq-class cm compressor. most time spent on second stage, of course

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by nanoflooder View Post
    How come a BWT can be so symmetric?..
    Yet another question, is this a BWT?

    Due to such a special sorting routine, I suppose, it is possible that BWT can be symmetric...

  8. #8
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    I wouldn't call it useless - maybe you mean too slow for daily usage ?
    That's true, just like PAQ, ZPAQ, Slim, Durilca, WinRK,... and most other top compressors are also useless for daily usage.

    It's a good proof of concept from what BWT is capable of.

    I wonder if this BWMonstr has any CM-like stuff in its code ... I guess if any at all, it can't be much with this low memory usage.
    (I could be wrong about this ,though)

    I guess encode has got a hell of a challenge here : make BCM compress with a better ratio then this BWMonstr.
    Nono , I wish he could make it faster *without* loosing it's current strength !

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    afair, BCM use just 700k for CM part. so, by combining with bbb sort, it can easily beat bwtmonstr

    i think bwmonstr just automatically allocs for CM datastructures the amount of memory equal to block size

  10. #10
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    HI!

    Good compressor but at moment slow !

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    IMO, Sami's BWT test is unfair. I told Sami that BCM v0.05 has no filters. No X86, none of any text trick, no dictionary, nothing of them - pure BWT. Anyway, BCM described as
    "Unknown if the programs use preprocessing."
    At the same time BWMonstr is closed-source as well, but we didn't know is it uses preprocessing or even is it BWT-based.
    If it is BWT-based encoder indeed, then it has a stronger BWT-output coding than BCM. So, if BCM will have BBB's sorting it will not beat it at large ENWIK, but BBB might be easily beaten...
    Second thing, why Sami done such test on such stone-age machine... Just wondering... It's a museum PC!
    Of course his program is numero uno here!

    Anyway, I already have an idea for faster BCM's decompression. Just did some decoder optimizations...

  12. #12
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by pat357 View Post
    I wouldn't call it useless - maybe you mean too slow for daily usage ?
    That's true, just like PAQ, ZPAQ, Slim, Durilca, WinRK,... and most other top compressors are also useless for daily usage.
    I did mentioned about it's BWT side. Because, BWT coders have a nature that tend to compress text-like stationary data easily. I understood that a bit-wise CM could be slow. Because, every complex upgrading also gain significant amount ratio. But, I can't understand for a BWT coder. Because, adding complexity to a BWT compressor "always" bounded by Burrows-Wheeler transform. So, adding paq8o8 back-end to the BWT coder has no sense. It's same as PAQ9A's story. LZP+(strong CM) is not powerful as pure strong CM. And in here, I mean BWT+(strong CM) is not powerful as pure strong CM too.
    BIT Archiver homepage: www.osmanturan.com

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,375
    Thanks
    214
    Thanked 1,023 Times in 544 Posts
    I think there's filtering (its expected actually), just compare its book1 result to http://ctxmodel.net/rem.pl?-13
    Also, I actually tried to compress book1.rbwt with paq8p1 -7 and got 209102,
    so there's certainly some space for improvement for bcm's postcoder, but not
    that much.

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I think we should test BWMonstr with XORed book1, BWT-transformed book1, etc. to find out what's going on.

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,375
    Thanks
    214
    Thanked 1,023 Times in 544 Posts
    I managed to capture bwmonstr's BWT output
    http://shelwien.googlepages.com/book1_bwm.rar
    And, surprisingly, it doesn't seem filtered - its actually very similar to my usual book1.rbwt.
    So I guess, there's really some smart postcoder, maybe involving partial context
    reconstruction.

  16. #16
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by osmanturan View Post
    So, adding paq8o8 back-end to the BWT coder has no sense. It's same as PAQ9A's story. LZP+(strong CM) is not powerful as pure strong CM. And in here, I mean BWT+(strong CM) is not powerful as pure strong CM too.
    I don't agree 100% with this :
    I guess BWT + CM1 can be more powerful than CM2 only, IF the CM1 can take advantage of the BWT stream.
    I agree that it is probably useless to use a PAQ -alike CM to process the BWT-output, but I'm fairly convinced that BWT + a "dedicated" CM for BWT (not PAQ) can give a better ratio than "pure" CM.

    Would the CM part only from BCM give the same ratio as BWT + CM (from BCM) ?

  17. #17
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Shelwien View Post
    I managed to capture bwmonstr's BWT output
    http://shelwien.googlepages.com/book1_bwm.rar
    And, surprisingly, it doesn't seem filtered - its actually very similar to my usual book1.rbwt.
    So I guess, there's really some smart postcoder, maybe involving partial context
    reconstruction.
    Thanks.
    Why would anybody first use BWT (and changing the original context) and after that trying to "partially" restore the context ?

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,375
    Thanks
    214
    Thanked 1,023 Times in 544 Posts
    > Why would anybody first use BWT (and changing the original context)
    > and after that trying to "partially" restore the context ?

    Actually there're some fairly sensible reasons:
    1. BWT has relatively low memory requirements - 5N at most, while eg.
    ash normally shows something like ~30N.
    So its possible for BWT with like 200M+ block sizes to win over PPM/CM
    on large stationary files.
    2. You can advertise your program as "the best BWT compressor" =
    no need to compete with paq8

  19. #19
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    bwmonstr enwik8 -> 20,401,888, 1439 sec, 1350 sec, using 123 MB. That beats BBB and it will probably win on enwik9 too, but it will take 8 hours to test with 1250 MB. I guess Sami doesn't have enough memory to do the test himself.

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Sami added the transformation test.
    1. book1
    2. book1 (std::reverse)
    3. book1 (x[i]=255-x[i])
    4. book1 (x[i]=0x55^x[i])
    5. book1 (DC v0.98b -n -s -d -e)
    Reverse, is OK. NOT and XOR is OK as well. BCM shows different compression on such transformations due to its bitwise nature.

    The fifth test is strange.
    First of all, why Sami uses an obsolete DC version, while the newer DC 0.99.307 is available?
    As my tests shown, an obsolete DC have very strage filters, that might not work with some compressors. Shelwien may confirm that.
    With new version of DC, BCM compresses filtered book1 much better, that confirms that BCM have no filters.

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    bwmonstr enwik9 -> 161,249,951 in about 4 hours 20 minutes. My computer auto-updated and rebooted during decompression while running overnight so I need to repeat the test. But this would make it the top ranked BWT compressor beating BBB.

  22. #22
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile

    Matt would it not be the best compressor on your page of tests
    for actually size. It would not just beat bbb would it not beat
    all of them. I wonder how much it would compress if the guy
    used BWTS instead of BWT?

  23. #23
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Talking

    Whoops sorry looked at wrong column I need more coffee in
    the morning

  24. #24
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    788
    Thanks
    64
    Thanked 274 Times in 192 Posts
    Quote Originally Posted by Matt Mahoney View Post
    bwmonstr enwik8 -> 20,401,888, 1439 sec, 1350 sec, using 123 MB. That beats BBB and it will probably win on enwik9 too, but it will take 8 hours to test with 1250 MB. I guess Sami doesn't have enough memory to do the test himself.
    I had tested at http://www.metacompressor.com/uploads.aspx :

    enwik9:

    Comp:
    161,249,951 bytes
    7461 sec.
    02:04:10 CPU time
    1,256,833,024 bytes memory

    Decomp:
    6411 sec.
    01:46:49 CPU time
    1,256,787,968 bytes memory

    Compare:
    Ok

  25. #25
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Looks like your machine is about twice as fast.
    http://cs.fit.edu/~mmahoney/compression/text.html

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •