
Thread: BCM v0.01 - New BWT+CM-based compressor

  1. #31
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    o1rc9f uploaded and BCM 0.02 results are in the table.

  2. #32
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Thank you!

  3. #33
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Tested an LZP preprocessor with BCM. It not only eliminates the worst case, it also substantially improves processing speed and compression ratio!
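
    For reference, here is a minimal sketch of how such an LZP filter could look (my assumption of the general technique, not encode's actual code): a hash of the previous four bytes predicts a repeat position, and a match of at least MINMATCH bytes against that prediction is replaced by an escape byte plus a length code, so long repeats collapse before the sort ever sees them. A matching decoder simply mirrors the table updates.

    Code:
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Minimal LZP encoder sketch. ESC, the hash and the length code are
    // illustrative choices, not BCM's actual format.
    std::vector<uint8_t> lzp_encode(const std::vector<uint8_t>& in,
                                    size_t min_match = 64)
    {
        const uint8_t ESC = 0xFF;                 // assumed escape marker
        std::vector<uint32_t> table(1 << 16, 0);  // context hash -> position
        std::vector<uint8_t> out;
        size_t i = 0;
        while (i < in.size()) {
            size_t match = 0;
            if (i >= 4) {
                uint32_t ctx;                     // previous 4 bytes = context
                std::memcpy(&ctx, &in[i - 4], 4);
                uint32_t h = (ctx * 2654435761u) >> 16;
                size_t p = table[h];
                table[h] = (uint32_t)i;
                if (p)                            // p == 0 means "empty slot"
                    while (i + match < in.size() && in[p + match] == in[i + match])
                        ++match;
            }
            if (match >= min_match) {             // long match: ESC + length code
                out.push_back(ESC);
                size_t len = match;
                while (len >= 255) { out.push_back(255); len -= 255; }
                out.push_back((uint8_t)len);
                i += match;
            } else {                              // literal; ESC escaped as ESC,0
                if (in[i] == ESC) { out.push_back(ESC); out.push_back(0); }
                else              out.push_back(in[i]);
                ++i;
            }
        }
        return out;
    }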

  4. #34
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Testing BCM v0.02 versus BCM v0.01

    Enwik8 :
    BCM v0.02 : 23.761.415 bytes / 101 sec /decomp 62,4 sec
    BCM v0.01 : 21.824.948 bytes / 126 sec /decomp 49,1 sec

    Scribble.wav (9.249.480 bytes) :
    BCM v0.02 : 11.654 bytes / 637 sec /decomp 4,63 sec
    BCM v0.01 : 12.473 bytes / 1253 sec /decomp 3,49 sec

    Scribble.wav seems to be a typical "worst case" file for BCM's sorting algorithm

    for comparison :
    DARK v0.51 p-b10mf : 9.638 bytes / 3,5 sec !!!
    DARK v0.51 p-b10mfsr : 9.587 bytes / 2,3 sec !!!
    (10m = block size / f = disable fragmentation / s = disable solid / r = disable reverse)
    BBB v1 (cfm10) : 13.014 bytes / 1170 sec

    For the next test I used TAR to bundle the same data set that Shelwien uses:
    Date/time Size Filename
    ----------------------------------------
    2008-06-17 12:13:24 35.594.240 ./shelwdata/bookstar.txt
    2008-06-17 11:45:02 52.459.520 ./shelwdata/cyg_bin.tar
    2008-06-17 11:46:22 95.436.800 ./shelwdata/cyg_lib.tar
    2006-05-19 15:51:12 100.000.000 ./shelwdata/enwik8.txt
    2007-02-21 10:23:02 31.851.425 ./shelwdata/finn_lst.txt
    2006-09-19 12:14:38 64.368.429 ./shelwdata/gits2op_mkv.mkv
    ----------------------------------------
    6 files, total 379.710.414 bytes

    Shelwdata.tar (379.710.414 bytes)
    BCM v0.02 : 141.751.386 bytes / 503,5 sec /decomp 259,52 sec
    BCM v0.01 : 138.700.039 bytes / 596,6 sec /decomp 198,39 sec

    Based on these results, it seems that v0.02 compresses worse on some files while being faster. Decompression speed seems to have decreased in 0.02.

    @ encode
    I wonder why the sorting takes so long... What algorithm does BCM use for sorting?
    I've tested the BWT v1.10 that was posted here: it outputs the BWT in a fraction of the time BCM takes. Is the BWT output the same for all different algorithms?
    Maybe you could build a version that uses BWT 1.10 for sorting and then feeds the result to your CM? It would be over 10x faster on the "worst case" files!!
    What do you think about this idea?
    Last edited by pat357; 2nd July 2008 at 01:31. Reason: Decompression times included for BCM v0.02 & BCM v0.01

  5. #35
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by pat357 View Post
    Is the BWT output the same for all different algorithms?
    Correct me if I'm wrong, but BWT is BWT - the output is the same; the difference is in the actual sorting algorithm.
    Quote Originally Posted by pat357 View Post
    What do you think about this idea ?
    Look at my post above. Simple LZP filter solves all problems...
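
    To illustrate the point above: a deliberately naive rotation sort like the sketch below (an illustration, not BCM's code) completely defines the BWT output - swapping std::sort for MSufSort, DivSufSort, or any other correct suffix sort yields byte-identical output, only faster.

    Code:
    #include <algorithm>
    #include <numeric>
    #include <string>
    #include <vector>

    // Naive reference BWT: sort all cyclic rotations, output the last column.
    // (A real coder also records the index of the original rotation.)
    std::string bwt(const std::string& s)
    {
        size_t n = s.size();
        std::vector<size_t> idx(n);
        std::iota(idx.begin(), idx.end(), 0);
        // std::sort here; a faster suffix sorter would produce exactly the
        // same ordering, hence exactly the same output
        std::sort(idx.begin(), idx.end(), [&](size_t a, size_t b) {
            for (size_t k = 0; k < n; ++k) {
                unsigned char ca = s[(a + k) % n], cb = s[(b + k) % n];
                if (ca != cb) return ca < cb;
            }
            return false;
        });
        std::string out(n, 0);
        for (size_t i = 0; i < n; ++i)
            out[i] = s[(idx[i] + n - 1) % n];  // byte preceding each rotation
        return out;
    }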

  6. #36
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    BTW, can anyone test BCM v0.01 and v0.02 for the decompression speed?

    With BCM v0.01 I used VS2005 and with v0.02 I used VS2008!

  7. #37
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    141
    Thanks
    26
    Thanked 94 Times in 31 Posts
    Quote Originally Posted by encode View Post
    Tested an LZP preprocessor with BCM. It not only eliminates the worst case, it also substantially improves processing speed and compression ratio!
    The problem with LZP pre-processing is that you destroy any chance of using context to assist with compression. There are several good suffix sorts available to provide a fast BWT, including MSufSort and DivSufSort.

  8. #38
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    141
    Thanks
    26
    Thanked 94 Times in 31 Posts
    Quote Originally Posted by pat357 View Post
    Testing BCM v0.02 versus BCM v0.01

    Enwik8 :
    BCM v0.02 : 23.761.415 bytes / 101s
    BCM v0.01 : 21.824.948 bytes / 126s

    Scribble.wav (9.249.480 bytes) :
    BCM v0.02 : 11.654 bytes / 637s
    BCM v0.01 : 12.473 bytes / 1253s

    Scribble.wav seems to be a typical "worst case" file for BCM's sorting algorithm

    for comparison :
    DARK v0.51 p-b10mf : 9.638 bytes / 3,5s !!!
    DARK v0.51 p-b10mfsr : 9.587 bytes / 2,3s !!!
    (10m = block size / f = disable fragmentation / s = disable solid / r = disable reverse)
    BBB v1 (cfm10) : 13.014 bytes / 1170s
    M03: 8,721 bytes

  9. #39
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by michael maniscalco View Post
    The problem with LZP pre-processing is that you destroy any chance of using context to assist with compression. There are several good suffix sorts available to provide fast BWT including MSufSort and DivSufSort.
    On text files a too-short LZP MINMATCH may hurt compression, but starting at MINMATCH=64 the compression loss is negligible.
    Furthermore, as I said, LZP may improve compression, for example:
    calgary.tar:
    BCM -> 791,434 bytes, 6 sec
    BCM+LZP -> 786,640 bytes, 2 sec

    A 3x speed boost paired with higher compression...

  10. #40
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Short matches replaced by LZP introduce distortion, since the coded match lengths insert artificial symbols, which destroy the context. And BWT relies heavily on context, since it is context sorting. LZP might only improve compression on very homogeneous data, where LZP almost always replaces the same sequences with the same symbols (match lengths), like: aaaa aaaa -> (by LZP) B B.
    Why don't you try a static string substitution? That is more likely to improve compression.

    I wonder how your BCM will do when you remove PIC from calgary.tar... I bet there's no compression gain at all, for the reason I gave above.

    And you really shouldn't test against such a tar of inhomogeneous data.
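
    For what it's worth, a toy sketch of such a static substitution (the dictionary entries and code bytes are made-up examples): frequent strings map to byte values that never occur in the input, so identical sequences always become identical symbols and the surrounding contexts survive for the BWT.

    Code:
    #include <string>
    #include <utility>

    // Toy static substitution: fixed dictionary, fixed code bytes.
    // Assumed: 0xF0/0xF1 never occur in the input (a real filter must
    // verify this, or escape them, to stay reversible).
    std::string substitute(const std::string& in)
    {
        const std::pair<std::string, char> dict[] = {
            { " the ", '\xF0' },
            { " and ", '\xF1' },
        };
        std::string out;
        for (size_t i = 0; i < in.size(); ) {
            bool hit = false;
            for (const auto& [word, code] : dict) {
                if (in.compare(i, word.size(), word) == 0) {
                    out += code;          // same string -> same symbol, always
                    i += word.size();
                    hit = true;
                    break;
                }
            }
            if (!hit) out += in[i++];
        }
        return out;
    }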

  11. #41
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by michael maniscalco View Post
    M03: 8,721 bytes
    WOW ! that's impressive !
    Can we test this version ? I tried the version available from your website :

    scribble.wav
    9.249.480 --> 15.742 bytes ....

  12. #42
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    141
    Thanks
    26
    Thanked 94 Times in 31 Posts
    Quote Originally Posted by pat357 View Post
    WOW ! that's impressive !
    Can we test this version ? I tried the version available from your website :

    scribble.wav
    9.249.480 --> 15.742 bytes ....

    It's not my implementation. Just my algorithm.
    There was a 32MB blocksize variation at:

    http://mij4x.datacompression.jp/junk...2005-02-15.zip

    You have to right-click in the app GUI to get the context menu, which offers different block sizes. I think he released his source code a short while back.

    - Michael

  13. #43
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by encode View Post
    BTW, can anyone test BCM v0.01 and v0.02 for the decompression speed?

    With BCM v0.01 I used VS2005 and with v0.02 I use VS2008!
    I've included the decompression times in my previous post :

    http://encode.su/forum/showpost.php?p=1453&postcount=34

    You might have a look there.
    It seems that v0.02 is slower at decompressing than 0.01. Could this be due to the smaller block size?

  14. #44
    Member Fallon's Avatar
    Join Date
    May 2008
    Location
    Europe - The Netherlands
    Posts
    164
    Thanks
    15
    Thanked 11 Times in 6 Posts

    Quote Originally Posted by michael maniscalco View Post
    I think he released his source code a short while back.
    Yep, it's there.
    http://mij4x.datacompression.jp/?date=20040720
    http://mij4x.datacompression.jp/junk/m03-2005.01.23.zip

  15. #45
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    o1rc9g uploaded

    Code:
                 [BCM]     [o1rc9f]  [o1rc9g]  [BCM+LZP]
    calgary.tar  791,435   785,456   784,037   786,640
    book1        212,674   211,047   210,908
    world95.txt  474,291   470,390   469,640
    o1rc9f uses 3 mixers, o1rc9g uses 4 (still order-1 max contexts).
    Also, o1rc can be tuned for use with an LZP preprocessor (which one - flzp?) if anybody is interested, though you can test it yourself as-is.
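
    For readers wondering what the mixer count refers to, here is a generic sketch of logistic mixing as used in CM coders (an illustration of the concept only, not Shelwien's actual o1rc code): model predictions are combined in the stretched domain, with weights adapted toward lower coding cost.

    Code:
    #include <cmath>

    double stretch(double p) { return std::log(p / (1.0 - p)); }
    double squash (double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // One logistic mixer over up to 4 model predictions (probabilities
    // of bit==1). Weights follow the gradient of the coding cost.
    struct Mixer {
        double w[4] = { 0.25, 0.25, 0.25, 0.25 };
        double x[4] = { 0, 0, 0, 0 };
        int    n    = 0;

        double mix(const double* p, int count) {
            n = count;
            double s = 0;
            for (int i = 0; i < n; ++i) { x[i] = stretch(p[i]); s += w[i] * x[i]; }
            return squash(s);
        }
        void update(double p_mixed, int bit, double rate = 0.02) {
            double err = bit - p_mixed;   // gradient step on the log loss
            for (int i = 0; i < n; ++i) w[i] += rate * err * x[i];
        }
    };

    In a coder like o1rc, several such mixers would be selected by context and updated after every bit; the exact wiring is not shown here.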

  16. #46
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Can you please test the attached compile and compare it to the original BCM v0.01?
    Like I said, this one was compiled with VS2008; the original BCM v0.01 was compiled with VS2005...

  17. #47
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    On my machine the VS2008 version is drastically slower:

    ENWIK8 decompression:

    VS2005 - 44 sec
    VS2008 - 107 sec

    Will try to recompile with new profile...

  18. #48
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Properly recompiled BCM v0.02a:
    Attached Files

  19. #49
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    BCM v0.02a

    has the same compression result for me as version 0.02,

    but seems to be a little bit slower than version 0.02.

    on a system with 4 GB RAM:

    BCM 0.02a memory usage:

    sorting.. 58.644 kbytes
    encoding. 42.240 kbytes
    42.244 kbytes
    sorting.. 58.644 kbytes
    ...
    ...

    compression:

    source file db.dmp 648.331.264 bytes

    balz113 e 33.390.086 bytes 9 min
    7zip 35.151.362 bytes 19 min
    bcm001 31.927.387 bytes 22 min
    bcm002 31.022.446 bytes 17 min
    bcm002a 31.022.446 bytes 19 min

    What's new in this version?
    Is this version more stable in the "worst case"?

  20. #50
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by joerg View Post
    What's new in this version?
    Is this version more stable in the "worst case"?
    Read my posts above - it's just a recompiled v0.02...

  21. #51
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by encode View Post
    Properly recompiled BCM v0.02a:
    New timing test :

    Enwik8 :
    BCM v0.02 : 23.761.415 bytes / 101 sec /decomp 62,4 sec
    BCM v0.02a : / 102,2 sec /decomp 49,4 sec

    Shelwdata.tar (379.710.414 bytes)
    BCM v0.02 : 141.751.386 bytes / 503,5 sec /decomp 259,52 sec
    BCM v0.02a : / 534,0 sec /decomp 195,9 sec

    Tested on a C2Q E6600 (quad-core @ 2.2 GHz) / 4 GB RAM / Vista Premium (32-bit)

    It seems compression is a bit slower, while decompression is FASTER!!
    Did you expect this?
    Last edited by pat357; 3rd July 2008 at 04:27.

  22. #52
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Some of us are looking forward to the next version of BCM with LZP preprocessing !
    Can you tell us when you're going to release this version ?

    - I think the sorting algorithm is the first thing to look at, because it is the reason BCM is very slow on some files. Maybe a completely different sorting algorithm (like MSufSort, DivSufSort, ...). Memory could go from 7N down to 6N or 5N.
    - Second, imho, would be fragmentation, reversed bytes, ...
    - The 3rd step would be further optimizing the CM.
    - The 4th step could be introducing filters based on content (not only extension): exe, wav, images, ...

    BTW, about segmentation/fragmentation: someone posted the "hidden" switch for Durilca v0.2 here, but I forgot it. What was it again?
    durilca e -farchive file.tar -s or -f or -t.. .???.
    The seg_file works only for smaller files.
    Last edited by pat357; 5th July 2008 at 01:45.

  23. #53
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by pat357 View Post
    Some of us are looking forward to the next version of BCM with LZP preprocessing !
    Can you tell us when you're going to release this version ?
    Well, currently I've fully switched to the BALZ project. I need to write a README, design a new homepage, plus do some cosmetic source changes.
    After that, I might release a new BCM. Anyway, BCM is mostly just a BWT testbed for me. For now, you may test BCM with Dmitry Shkarin's LZP:
    http://compression.ru/ds/lzp.rar


  24. #54
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts

  25. #55
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    thanks !!

  26. #56
    Programmer
    Join Date
    Jul 2008
    Location
    Finland
    Posts
    102
    Thanks
    0
    Thanked 1 Time in 1 Post
    I'm a bit confused as to what the latest is on your sorting, Ilia - are you still using std::sort? If I may humbly suggest a couple of basic tips you could explore: first take order-1 stats of the block, then build the suffix pointers from those statistics, and then sort each two-byte bucket with a custom Sedgewick quicksort variant. The Sedgewick variant should work on dwords (like strcmp with dwords instead of bytes), directly comparing the input in the backward direction (presuming we are working on a little-endian platform), and the comparison function should be inlined. These are fairly easy modifications: you can find Sedgewick's sorting source code online, and see the bzip2 source for the initial order-1 stats. If LZP is used before compression, with these settings you can match (or get close to) Blizzard in compression speed, while requiring only 5N memory.
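
    A sketch of that two-stage sort (simplified in a few places: forward dword comparison via byteswap instead of the backward variant, std::sort in place of a hand-tuned Sedgewick quicksort, and a unique low sentinel byte assumed at the end of the block):

    Code:
    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    static uint32_t load_be(const uint8_t* p)   // read 4 bytes, byteswapped so an
    {                                           // integer compare matches byte order
        uint32_t v;
        std::memcpy(&v, p, 4);
        return (v >> 24) | ((v >> 8) & 0xFF00u) | ((v << 8) & 0xFF0000u) | (v << 24);
    }

    // buf must hold n data bytes followed by at least 4 sentinel bytes that
    // sort lower than any data byte (so comparisons terminate).
    std::vector<uint32_t> bwt_sort(const std::vector<uint8_t>& buf, size_t n)
    {
        std::vector<uint32_t> count(65536, 0), start(65537, 0), sa(n);
        for (size_t i = 0; i < n; ++i)          // pass 1: 2-byte context histogram
            ++count[(buf[i] << 8) | buf[i + 1]];
        for (int c = 0; c < 65536; ++c)
            start[c + 1] = start[c] + count[c];
        std::vector<uint32_t> pos(start.begin(), start.end() - 1);
        for (size_t i = 0; i < n; ++i)          // scatter suffixes into buckets
            sa[pos[(buf[i] << 8) | buf[i + 1]]++] = (uint32_t)i;
        const uint8_t* d = buf.data();
        for (int c = 0; c < 65536; ++c)         // pass 2: refine each bucket
            std::sort(sa.begin() + start[c], sa.begin() + start[c + 1],
                      [&](uint32_t a, uint32_t b) {
                          size_t k = 2;         // first 2 bytes equal by bucket
                          for (;;) {
                              if (a + k >= n || b + k >= n)
                                  return a > b; // shorter suffix sorts lower
                              uint32_t va = load_be(d + a + k);
                              uint32_t vb = load_be(d + b + k);
                              if (va != vb) return va < vb;
                              k += 4;           // 4 bytes per step, not 1
                          }
                      });
        return sa;                              // BWT output byte: d[sa[i] - 1]
    }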

  27. #57
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Well, like I said, the main goal of BCM is just to test BWT-output coding with my CM - no more, no less...

  28. #58
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    encode wrote:

    Well, now ("After finalizing BALZ")
    I have some time to release BCM with LZP preprocessing - to be in the game!

    pat357 wrote:

    "I guess it is possible to implement BWT for 4 cores..."
    "(Winrk might be even doing this already, not sure though..)"

    @encode:

    Do you think it would be possible to modify your algorithm to use 2 or more threads?

    The compression could then be done much faster...

  29. #59
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    406
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by joerg View Post
    encode wrote:
    Well, now ("After finalizing BALZ")
    I have some time to release BCM with LZP preprocessing - to be in the game!
    Well, as you know, I spent that time on a new version of my PIM file archiver... I simply have no spare time at all these days...

    Quote Originally Posted by joerg View Post
    pat357 wrote:
    "I guess it is possible to implement BWT for 4 cores..."
    "(Winrk might be even doing this already, not sure though..)"
    I'm not sure that Malcolm is that advanced in BWT. He's more about cool LZ- and PPM-based stuff a la ROLZ, plus some PAQ snippets like FPW and PWCM.

    Quote Originally Posted by joerg View Post
    @encode:
    Do you think it would be possible to modify your algorithm to use 2 or more threads?

    The compression could then be done much faster...
    Not sure about the CM part. The sorting part, which makes compression so slow, probably can be, I guess. Since I use stable_sort() there is no way to do such a thing, but with custom sorting routines we could. Hm, I think it is possible to compress each block separately and independently on different threads - for example, 2 cores = we compress two blocks simultaneously, 4 cores = 4 blocks, etc.
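
    For illustration, a sketch of that block-parallel scheme (compress_block() is a made-up placeholder standing in for the real BWT+CM codec):

    Code:
    #include <algorithm>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Placeholder for the real per-block BWT+CM codec; here it just copies.
    static std::vector<uint8_t> compress_block(std::vector<uint8_t> block)
    {
        return block;
    }

    // Compress fixed-size blocks independently, one stripe of blocks per
    // thread; outputs stay in block order for a deterministic archive.
    std::vector<std::vector<uint8_t>>
    compress_parallel(const std::vector<uint8_t>& data, size_t block_size,
                      unsigned nthreads = std::thread::hardware_concurrency())
    {
        if (nthreads == 0) nthreads = 1;
        size_t nblocks = (data.size() + block_size - 1) / block_size;
        std::vector<std::vector<uint8_t>> out(nblocks);
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < nthreads; ++t)
            pool.emplace_back([&, t] {
                for (size_t b = t; b < nblocks; b += nthreads) {
                    size_t lo = b * block_size;
                    size_t hi = std::min(data.size(), lo + block_size);
                    out[b] = compress_block(
                        std::vector<uint8_t>(data.begin() + lo,
                                             data.begin() + hi));
                }
            });
        for (auto& th : pool) th.join();
        return out;   // caller writes each block with a length header, in order
    }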

  30. #60
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by encode View Post
    I'm not sure that Malcolm is that advanced in BWT. He's more about cool LZ- and PPM-based stuff a la ROLZ, plus some PAQ snippets like FPW and PWCM.
    His current Mcomp V2.0 seems to use 4 cores for BWT :
    http://www.msoftware.biz/blog/2008/0...cs#attachments

    Code:
    > mcomp_x32
    LibMComp Demo Compressor (v2.00).
    Copyright (c) 2008 M Software Ltd.
    mcomp [options] pofile(s)
    
    Options:
        -m[..]    Compression method:
                  b    - BZIP2.
                  c    - Experimental DMC codec.
                  d    - Optimised deflate (df - fast, dx - max)
                 d64  - Optimised deflate64 (d64f - fast, d64x - max)
                  lz   - Optimised LZ (lzf - fast, lzx - max)
                  f    - Optimised ROLZ (ff - fast, fx - max)
                  f3   - Optimised ROLZ3 (f3f - fast, f3x - max)
                  p    - PPMd var.J.
                  sl   - Bitstream (LSB first).
                  sm   - Bitstream (MSB first).
                  w    - Experimental BWT codec.
        -MNN[k,m] Model size (in kb (default) or Mb, default 64M).
        -oNN      Order (for Bitstream and PPMd).
        -np       Display no progress information.
    
    > mcomp_x32 -mw fp.log  fplog_mw.mcomp
    
    LibMComp Demo Compressor (v2.00).
    Copyright (c) 2008 M Software Ltd.
    Thread pool size = 4
    Using 1280Mb (4 threads)
    100%
    From 20617071 => To 559960
    Elapsed time: 5.355s      (3759.82kps)
    Last edited by pat357; 10th September 2008 at 18:37.
