
Thread: BCM 0.12 is here!

  1. #31
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Some time ago I did some FreeArc tests where I replaced LZP with REP in text mode, and it worked about as well.

  2. #32
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,532
    Thanks
    755
    Thanked 674 Times in 365 Posts
    But the sort order for huffman codes is different from order
    of original symbol codes,
    Maybe I'm wrong, but you don't need any special algorithm to make it sorted. Just an example:
    A - 1 (counter)
    B - 2
    C - 1

    the canonical Huffman algorithm assigns the following codes:
    A - 10 (2 bits)
    B - 0 (1 bit)
    C - 11 (2 bits)

    an order-preserving Huffman algorithm assigns the following codes:
    A - 00 (2 bits)
    B - 01 (1 bit)
    C - 11 (2 bits)

    only two problems:
    first, as I already said - either large decoding tables, or smaller tables + search
    second - we need to return 1 bit back after decoding B. Or, said another way, after decoding B we should subtract 0100000..00 from the bit buffer (01 << 30 for a 32-bit buffer) and then pop just 1 bit

    But I was wrong - it isn't a Huffman code, and not even a prefix code at all. And this probably means that it cannot be sorted?
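    For illustration only, here is a minimal C++ sketch of the decode step described above, for this 3-symbol example alone (A = [0, 1/4), B = [1/4, 3/4), C = [3/4, 1) in code space). The names are made up and the bit-buffer refill is omitted, so this is not code from FreeArc or BCM:

    #include <cstdint>

    // 'buf' holds the next compressed bits in its high end (32-bit buffer);
    // refilling the buffer from the stream is omitted here.
    char decode_one(uint32_t& buf) {
        uint32_t top2 = buf >> 30;    // peek the top 2 bits
        if (top2 == 0) {              // 00 -> A, consume 2 bits
            buf <<= 2;
            return 'A';
        }
        if (top2 == 3) {              // 11 -> C, consume 2 bits
            buf <<= 2;
            return 'C';
        }
        // 01 or 10 -> B: subtract 01 << 30 (a quarter of the code space),
        // then consume only 1 bit - the "return 1 bit back" trick.
        buf -= 1u << 30;
        buf <<= 1;
        return 'B';
    }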

  3. #33
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    > If using higher than order-0 prefix codes, then we cannot sort
    > without decoding, I think (ie when using Huffman codes or something
    > like that).

    As a workaround, we just need to store context bytes with the pointers.
    I.e. only if the contexts match do we have to compare the actual data (see the sketch below).

    Sure, if we'd use a full pointer table, there won't be any memory savings
    from the compression.
    But compressed data are still better for indexing/radix sorting/comparison.
    And it's possible to use partial tables and multiple scans
    (which is necessary for MT anyway).

    > I have lots of ideas on how to improve my QRotSort algorithm, so
    > I'll postpone my master thesis to the end of holidays, so I'll have
    > time to implement those ideas.

    Well, as I mentioned, I'm not a BWT expert at all.
    I'm just trying to apply tricks from areas which I know.

    > May or may not. I'm pretty sure that LZP stage is order of magnitude
    > faster than BWT + QLFC stage, so one thread could do LZPing and
    > other threads/ cores could do further stages.

    It's not so easy, because the memory interface is not as parallel as the CPU,
    so if LZP did a lot of random memory accesses in parallel with BWT,
    it wouldn't be any good for overall speed.
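    Here is a rough C++ sketch of the context-bytes workaround above - hypothetical names, not from BCM or any program in this thread: each pointer carries a few bytes of its original context, so most comparisons never touch the compressed block.

    #include <algorithm>
    #include <cstdint>
    #include <cstring>

    struct Ptr {
        uint32_t pos;   // position in the compressed block
        uint32_t ctx;   // e.g. 4 preceding original bytes, packed big-endian
    };

    // Compare stored contexts first; only on a tie fall back to the data itself.
    bool less_than(const Ptr& a, const Ptr& b, const uint8_t* block, size_t n) {
        if (a.ctx != b.ctx)
            return a.ctx < b.ctx;                 // cheap case, no data access
        size_t la = n - a.pos, lb = n - b.pos;
        int c = std::memcmp(block + a.pos, block + b.pos, std::min(la, lb));
        return c ? c < 0 : la < lb;
    }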

  4. #34
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    Yes, generally Huffman codes are out-of-order.
    And that's the only difference from "optimal ordered" actually,
    because when we can't reorder the symbols, we can compute the optimal codelength for any splitting of range X..Y
    into two branches with dynamic programming.

    There's also a simple way to build ordered bitcodes using a bit-aligned arithmetic coder, but these
    are less efficient than "optimal ordered".
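    A small C++ sketch of that dynamic program (the standard range-splitting DP for an optimal alphabetic code, not Shelwien's actual code): for symbols kept in their original order, every range X..Y is split into two branches at the cheapest point, and cost[0][n-1] is the total code length in bits of the "optimal ordered" code.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    uint64_t optimal_ordered_bits(const std::vector<uint64_t>& freq) {
        const int n = (int)freq.size();
        std::vector<uint64_t> pre(n + 1, 0);      // prefix sums of frequencies
        for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + freq[i];

        std::vector<std::vector<uint64_t>> cost(n, std::vector<uint64_t>(n, 0));
        for (int len = 2; len <= n; len++)        // ranges of growing length
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                uint64_t best = UINT64_MAX;
                for (int k = i; k < j; k++)       // split i..j into i..k | k+1..j
                    best = std::min(best, cost[i][k] + cost[k + 1][j]);
                // every symbol in i..j gets one more code bit from this branch
                cost[i][j] = best + (pre[j + 1] - pre[i]);
            }
        return n ? cost[0][n - 1] : 0;
    }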

  5. #35
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,532
    Thanks
    755
    Thanked 674 Times in 365 Posts
    we can compute the optimal codelength for any splitting of range X..Y
    into two branches with dynamic programming.
    Isn't that the original Shannon-Fano algorithm? AFAIK it does exactly that - builds the code tree top-down, splitting all symbols into two almost equal subparts.

  6. #36
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,532
    Thanks
    755
    Thanked 674 Times in 365 Posts
    I meant that LZP is relatively slow and symmetric - it
    has to do hash lookup on each data byte basically.
    I've improved Shkarin's algo a bit, making decompression 2x faster - it doesn't need to look up memory when there is no match anyway.

  7. #37
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    > Isn't that the original Shannon-Fano algorithm?
    > AFAIK it does exactly that - builds the code tree top-down, splitting all symbols into two almost equal subparts

    SF splits the symbols near half of the total freq, yeah.
    But it's far from optimal - the SF code for book1 is 490730, and "optimal ordered" is 461084.

    > I've improved Shkarin's algo a bit, making decompression 2x faster -
    > it doesn't need to look up memory when there is no match anyway

    Cool I guess, but normally you either have to do lookups at each symbol
    (and read a len on a match), or encode match flags, which adds redundancy.
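    To make the "lookups at each symbol vs. match flags" point concrete, here is a generic LZP encoder loop in C++ - a sketch of the general scheme, not Bloom's or Shkarin's actual code; the flag/length/literal coding itself is left out:

    #include <cstdint>
    #include <vector>

    void lzp_encode(const uint8_t* in, size_t n) {
        std::vector<uint32_t> table(1 << 16, 0);   // context hash -> last position
        for (size_t i = 3; i < n; ) {
            // hash the order-3 context and fetch the predicted match position
            uint32_t h = ((in[i-1] | (in[i-2] << 8) | (in[i-3] << 16)) * 2654435761u) >> 16;
            uint32_t p = table[h];
            table[h] = (uint32_t)i;                // one lookup/update per position
            size_t len = 0;
            while (p && i + len < n && in[p + len] == in[i + len]) len++;
            if (len) {
                // emit match flag = 1 and the match length, then skip the match
                i += len;
            } else {
                // emit match flag = 0 and the literal in[i]
                i += 1;
            }
        }
    }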

  8. #38
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,532
    Thanks
    755
    Thanked 674 Times in 365 Posts
    Cool I guess, but normally you either have to do lookups at each symbol
    (and read a len on a match), or encode match flags, which adds redundancy.
    And the original LZP does both.

  9. #39
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    Well, LZP is not Shkarin's idea (afaik it's Bloom's - http://www.cbloom.com/src/index_lz.html).
    And as to Shkarin's LZP (http://www.ctxmodel.net/files/LZP-DS_v0.rar), I think that for BWT preprocessing
    it's better to keep the metainfo (match lens etc.) separate from the block data.

  10. #40
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    869
    Thanks
    470
    Thanked 261 Times in 108 Posts
    Indeed, LZP was first published by Charles Bloom, quite some time ago.
    Btw, he recently wrote an article about all the LZP versions he coded:
    http://cbloomrants.blogspot.com/2011...-variants.html

  11. #41
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
    -wrong info, can't remember-

    15 years is a long time
    Last edited by Sebastian; 5th June 2011 at 16:44.

  12. #42
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts

    Cool

    Finally, found an idea that works - a special RLE model. As a note, all BCMs since v0.09 (year 2009) have the same CM structure; the only difference is in the number of components and some CM tuning. BCM 0.13 is the next step. I have two basic configurations:

    One with BCM 0.11 complexity (about the same speed as BCM 0.11 or faster)
    And a more complex one - a la BCM 0.08 (faster than BCM 0.08)

    Both versions now beat BBB and BSC at ENWIK9 compression. The only question is how fast the new BCM should be. The BCM 0.11-like one is fast enough, but the more complex version achieves slightly higher compression at the cost of extra compression time; anyway, it's still faster than BCM 0.08!

    Will post exact numbers later.

  13. #43
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    309
    Thanks
    68
    Thanked 173 Times in 64 Posts
    IMHO, BCM should have about 5 compression levels (from the fastest to the best compression)

  14. #44
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,479
    Thanks
    26
    Thanked 122 Times in 96 Posts
    IMHO, BCM should have about 5 compression levels (from the fastest to the best compression)
    +1
    Additionally, those levels should not only differ by the constants used but also by the algorithms.

  15. #45
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,716
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    So is that "RLE" just a runlength length context, or a real RLE?
    ie do you still process 8*N bits with rangecoder?

  16. #46
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    Quote Originally Posted by Shelwien View Post
    So is that "RLE" just a runlength length context, or a real RLE?
    ie do you still process 8*N bits with rangecoder?
    It's kind of a state machine. And yes, I process 8 bits per input byte - i.e. pure CM. My goal was to achieve such a result with an improved CM. I've done lots of experiments with straight RLE schemes. RLE has some benefits and drawbacks... I'll just continue experimenting...

    5 different compression modes means 5 different encoders and decoders - kind of 5 different programs. And CM is not an LZ77 algorithm - different constants will not affect the speed. For different performance, people already have BZIP2 and BSC. BCM must introduce something new - high BWT compression in a reasonable time - i.e. better compression compared to BSC while still being fast enough to be practical...
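    For readers unfamiliar with "pure CM": here is a minimal C++ sketch of what processing 8 bits per input byte through a binary coder looks like. The counter update rule and the range coder stub are made up for illustration; this is not BCM's or BWTANK's actual model.

    #include <cmath>
    #include <cstdint>

    struct Counter {                    // simple adaptive bit probability (12-bit)
        uint16_t p = 2048;              // probability of a 1, out of 4096
        void update(int bit) { p += (((bit << 12) - p) >> 5); }
    };

    struct RangeCoder {                 // stand-in: only estimates output size,
        double bits = 0;                // the real range-coding arithmetic is omitted
        void encode(int bit, uint16_t p12) {
            double p1 = p12 / 4096.0;
            bits += -std::log2(bit ? p1 : 1.0 - p1);
        }
    };

    // ctx points to 256 counters: one per partial-byte state (a binary tree)
    void encode_byte(uint8_t c, Counter* ctx, RangeCoder& rc) {
        int node = 1;                   // bits of this byte seen so far
        for (int k = 7; k >= 0; k--) {
            int bit = (c >> k) & 1;
            rc.encode(bit, ctx[node].p);    // 8 coding steps per input byte
            ctx[node].update(bit);
            node = node * 2 + bit;          // next counter depends on previous bits
        }
    }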

  17. #47
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    309
    Thanks
    68
    Thanked 173 Times in 64 Posts
    Quote Originally Posted by encode View Post
    5 different compression modes means 5 different encoders and decoders - kind of 5 different programs.
    It's not true. You can use the "switch" command to encode bits using from 1 (fastest) to 5 (slowest) contexts (it works well for BCM 0.07).

  18. #48
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    For many months I've just been fighting for a more efficient CM model, making it as simple as possible.

    Current results on book1:

    1-counter -> 216,334 bytes
    2-counters -> 212,765 bytes

    No SSE, just static mixing and new counters.
    The speed is many times faster than the original BCM, and probably faster than BSC. I just need more compression! The massive research goes on...
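    A tiny C++ sketch of what "static mixing" of two counters could mean - my reading of the idea, with an arbitrary example weight, not encode's actual formula: two predictions of the same bit from different contexts are combined with fixed weights instead of an adaptive mixer or SSE stage.

    #include <cstdint>

    // p1, p2: 12-bit probabilities of a 1 bit from two different contexts;
    // returns the mixed 12-bit probability using a fixed 3:1 weighting.
    inline uint16_t mix_static(uint16_t p1, uint16_t p2) {
        return (uint16_t)((3u * p1 + p2) >> 2);
    }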

  19. #49
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts

    Cool

    I noticed that nearly two years have passed since the last version of BCM was released.

    Well, all that time I've kept searching...

    Now I have more CPU power - even for brute-force tuning on ENWIK8. I collected brute-force optimized results for nearly all model combinations on many files, finding weak and strong sides of my CM, keeping the things that work and getting rid of redundant and useless model parts... Currently my PC is working 24/7...

    During this time I have found a few interesting things and improved what I've got. The tuned FSM-based model is a nice one! Anyway, I'm planning to release a new BCM/BWT-based compressor this summer! My program MUST beat BBB and BSC on ENWIK9! As a note, my current FSM has the complexity of one counter, and the ENWIK9 result is 165,xxx,xxx bytes - pretty close to the original BCM, which has the complexity of 4 to 12 counters! This means it is 4 to 12 times faster...


  20. #50
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,479
    Thanks
    26
    Thanked 122 Times in 96 Posts
    Beware of overtuning

  21. #51
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Isn't the whole point to be tops on the benchmarks?

  22. #52
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    Got something interesting and I guess I'll release a new program soon.

    Despite being CM (it processes 8 bits per input byte), the new program is about two times faster than BSC (in its fastest mode).

    Compression is nice, but generally lower than BSC's - it's the only thing holding me back from releasing this new beast.

    Current results:
    * These results are not from tuning for each file; the tuning was performed on a number of files, and these are the results of one exact build

    ENWIK8 -> 20,952,954 bytes
    ENWIK9 -> 165,519,859 bytes

    book1 -> 212,536 bytes
    calgary.tar -> 795,632 bytes

    world95.txt -> 473,778 bytes
    fp.log -> 557,883 bytes

    dickens -> 2,257,430 bytes
    bookstar -> 9,241,794 bytes

    An unoptimized Visual Studio build compresses ENWIK8 within 6 seconds; BSC in fast mode compresses ENWIK8 in 12 seconds (with or without multi-threading, since for single-block compression multi-threading doesn't help here)...

    And this was expected, since the new program is so simple - no SSE, no adaptive mixing - just a "secret formula" FSM!

  23. #53
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Dictionary preprocessing?

  24. #54
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    869
    Thanks
    470
    Thanked 261 Times in 108 Posts
    I guess it might be an FSM with a small number of states for cache efficiency.
    This would explain the speed.

    The "secret part" is probably the way state-change is tuned,
    probably optimised through heavy GA using a few sample files.

    At least, that's what Shelwien would advise ...

  25. #55
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Dictionary preprocessing?
    No dictionary preprocessing, no alphabet reordering or other tricks/preprocessing - just plain modeling.

  26. #56
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    And if we return to that "butterfly4.bmp" file, where BCM was superior:

    BSC 3.0.0 -m0 -> 5,032,595
    BSC 3.0.0 -m3 -> 5,108,972
    BSC 3.0.0 -m6 -> 5,030,382
    BCM 0.12 -> 4,766,017
    FSM -> 4,783,604

    i.e. on par with BCM while being many times faster...

  27. #57
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    Also, this new thing does better on highly redundant data, compared to BCM.

    The tuned result for fp.log is about 540,xxx bytes; the current build gives:

    FSM -> 557,883 bytes
    BCM 0.12 -> 558,328 bytes

    I think that's why it's so damn close to BCM on ENWIK9.

  28. #58
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,479
    Thanks
    26
    Thanked 122 Times in 96 Posts
    Did you include block-sorting time in your comparisons? I.e. if we count the time taken to transform the verbatim original data into its final compressed form, what is the speed difference between the last released BCM and the WIP version of BCM?

  29. #59
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    Current comparison chart (ENWIK8):

    BCM 0.08 -> 29 sec
    BCM 0.09 -> 42 sec
    BCM 0.12 -> 20 sec (Fastest BCM yet)
    BSC 3.0.0 -> 12 sec
    BWTANK -> 6 sec

    i.e. it's not a WIP BCM - it's a new program called BWTANK. (This name was classified, until now.) Also, it's an unoptimized VC compile; the other executables are highly optimized Intel C++ compiles. Anyway, the final release might be slightly slower, but with higher compression. It would be nice to beat BSC on ENWIK9 while being faster by some margin... or to have about the same compression and be faster by a bigger margin.

  30. #60
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,000
    Thanks
    387
    Thanked 365 Times in 145 Posts
    With BCM, the sorting (BWT stage) time was nothing compared to the actual CM encoding. Here, sorting takes much longer compared to the actual encoding. This makes things interesting, since the decompression became insanely fast (close to LZMA or BZIP2?)... Anyway, no surprises, since it was not a challenge to achieve this, it was a BATTLE! And hence BWTANK! A battle tank, a battle machine! Anyway, this battle continues! Today I'll upgrade my PC with a new motherboard to push CPU frequencies above 5 GHz!


