Some time ago I did some FreeArc tests where I replaced LZP with REP in text mode, and it worked about as well.
> But the sort order for huffman codes is different from the order
> of original symbol codes.
Maybe I'm wrong, but you don't need any special algorithm to make it sorted. Just an example:
A - 1 (counter)
B - 2
C - 1
The canonical Huffman algorithm assigns the following codes:
A - 10 (2 bits)
B - 0 (1 bit)
C - 11 (2 bits)
An order-preserving Huffman algorithm assigns the following codes:
A - 00 (2 bits)
B - 01 (1 bit)
C - 11 (2 bits)
There are only two problems:
first, as I already said - either large decoding tables, or smaller tables + search;
second - we need to return 1 bit back after decoding B. Or, put another way, after decoding B we should subtract 0100000..00 from the bit buffer (01 << 30 for a 32-bit buffer) and then pop just 1 bit.
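To make the bit-buffer handling concrete, here is a minimal sketch of that decoding step (my own illustration, assuming a 32-bit MSB-aligned bit buffer whose refill is omitted, and interpreting B's code as covering the middle range 01..10):

#include <cstdint>

uint32_t buf;   // upcoming bits, MSB-aligned; refilling after each symbol is omitted

char decode_symbol()
{
    uint32_t top2 = buf >> 30;                   // peek at the two top bits
    if (top2 == 0) { buf <<= 2; return 'A'; }    // 00 -> A, consume 2 bits
    if (top2 == 3) { buf <<= 2; return 'C'; }    // 11 -> C, consume 2 bits
    // 01 or 10 -> B: subtract 01<<30 from the buffer, then pop just 1 bit,
    // i.e. part of the second bit is handed back, as described above
    buf -= uint32_t(1) << 30;
    buf <<= 1;
    return 'B';
}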
But I was wrong - it isn't a Huffman code and not even a prefix code at all. And this probably means that it cannot be sorted?
> If using higher than order-0 prefix codes, then we cannot sort
> without decoding, I think (ie when using Huffman codes or something
> like that).
As a workaround, we just need to store context bytes with the pointers.
Ie only if contexts match, we have to compare the actual data.
Sure, if we'd use a full pointer table, there won't be any memory savings
from the compression.
But compressed data are still better for indexing/radix sorting/comparison.
And it's possible to use partial tables and multiple scans
(which is necessary for MT anyway).
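A tiny sketch of that workaround (my own illustration, not real BWT sorter code): cache a few context bytes next to each pointer, so the block data is touched only when the cached contexts tie.

#include <cstdint>
#include <cstring>
#include <algorithm>
#include <vector>

struct SuffixRef {
    uint32_t pos;   // pointer into the block
    uint32_t ctx;   // first 4 bytes at pos, packed big-endian so integer compare = byte compare
};

std::vector<SuffixRef> make_refs(const uint8_t* block, uint32_t size) {
    std::vector<SuffixRef> refs(size);
    for (uint32_t i = 0; i < size; i++) {
        uint32_t c = 0;
        for (uint32_t k = 0; k < 4; k++) {
            c <<= 8;
            if (i + k < size) c |= block[i + k];   // pad the tail with zeros
        }
        refs[i] = { i, c };
    }
    return refs;
}

// cheap cached-context check first, the actual data only on a tie
bool less_ref(const SuffixRef& a, const SuffixRef& b,
              const uint8_t* block, uint32_t size) {
    if (a.ctx != b.ctx) return a.ctx < b.ctx;
    uint32_t la = size - a.pos, lb = size - b.pos;
    int c = std::memcmp(block + a.pos, block + b.pos, std::min(la, lb));
    return c ? c < 0 : la < lb;
}

Sorting is then just std::sort with less_ref as the comparator; with partial tables you'd build refs only for a slice of positions per pass.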
> I have lots of ideas on how to improve my QRotSort algorithm, so
> I'll postpone my master thesis to the end of holidays, so I'll have
> time to implement those ideas.
Well, as I mentioned, I'm not a BWT expert at all.
I'm just trying to apply tricks from areas which I know.
> May or may not. I'm pretty sure that LZP stage is order of magnitude
> faster than BWT + QLFC stage, so one thread could do LZPing and
> other threads/cores could do further stages.
It's not so easy, because the memory interface is not as parallel as the CPU,
so if LZP did a lot of random memory accesses in parallel with BWT,
that wouldn't be any good for overall speed.
Yes, generally Huffman codes are out-of-order.
And that's actually the only difference from "optimal ordered",
because when we can't reorder the symbols, we can compute the optimal code length for any splitting of the range X..Y
into two branches with dynamic programming.
There's also a simple way to build ordered bitcodes using a bit-aligned arithmetic coder, but these
are less efficient than "optimal ordered".
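For reference, a small sketch of that DP (my own illustration, not anyone's actual code): keep the symbols in their original order, let cost[x][y] be the cheapest total output size in bits for the range x..y, and note that every split adds one bit to each symbol in the range.

#include <cstdint>
#include <vector>
#include <algorithm>

uint64_t optimal_ordered_bits(const std::vector<uint64_t>& freq) {
    size_t n = freq.size();
    if (n < 2) return 0;
    std::vector<uint64_t> prefix(n + 1, 0);               // prefix sums of frequencies
    for (size_t i = 0; i < n; i++) prefix[i + 1] = prefix[i] + freq[i];

    std::vector<std::vector<uint64_t>> cost(n, std::vector<uint64_t>(n, 0));
    for (size_t len = 2; len <= n; len++) {
        for (size_t x = 0; x + len <= n; x++) {
            size_t y = x + len - 1;
            uint64_t best = UINT64_MAX;
            for (size_t k = x; k < y; k++)                // split into x..k | k+1..y
                best = std::min(best, cost[x][k] + cost[k + 1][y]);
            cost[x][y] = best + (prefix[y + 1] - prefix[x]);   // +1 bit per symbol in x..y
        }
    }
    return cost[0][n - 1];                                // total compressed size in bits
}

This cubic version is just the straightforward way to show it; Garsia-Wachs / Hu-Tucker solve the same ordered-code problem in O(n log n).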
> we can compute the optimal code length for any splitting of the range X..Y
> into two branches with dynamic programming.
Isn't that the original Shannon-Fano algorithm? AFAIK it does exactly that - builds the code tree top-down, splitting all symbols into two almost equal subparts.
> I meant that LZP is relatively slow and symmetric - it
> has to do hash lookup on each data byte basically.
I've improved Shkarin's algo a bit, making decompression 2x faster - it doesn't need to look up memory when there is no match anyway.
> Isn't that the original Shannon-Fano algorithm?
> AFAIK it does exactly that - builds the code tree top-down, splitting all symbols into two almost equal subparts
SF splits the symbols near half of total freq, yeah.
But it's far from optimal - the SF code for book1 is 490730 and "optimal ordered" is 461084.
> I've improved Shkarin's algo a bit, making decompression 2x faster -
> it doesn't need to look up memory when there is no match anyway
Cool I guess, but normally you either have to do lookups at each symbol
(and read a len on a match), or encode match flags, which adds redundancy.
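As an illustration of the "lookup at each symbol" variant, here's a minimal generic LZP sketch (my own code, not Bloom's or Shkarin's; the hash size and minimum match length are arbitrary):

#include <cstdint>
#include <cstdio>
#include <vector>

const int HASH_BITS = 18;

inline uint32_t hash4(uint32_t ctx) {
    return (ctx * 2654435761u) >> (32 - HASH_BITS);
}

void lzp_scan(const uint8_t* buf, size_t size) {
    std::vector<uint32_t> table(size_t(1) << HASH_BITS, 0);  // 0 = no position stored yet
    uint32_t ctx = 0;                                        // last 4 bytes seen
    for (size_t i = 0; i < size; ) {
        uint32_t h = hash4(ctx);                 // the per-byte hash lookup
        size_t p = table[h];                     // position predicted by this context
        table[h] = (uint32_t)i;
        size_t len = 0;
        if (p)
            while (i + len < size && buf[p + len] == buf[i + len]) len++;
        if (len >= 4) {
            std::printf("match at %zu, len %zu\n", i, len);  // a real coder emits flag + len here
        } else {
            len = 1;            // literal; if p existed, a real coder still pays for a "no match"
        }                       // flag here - the redundancy mentioned above
        for (size_t k = 0; k < len; k++)
            ctx = (ctx << 8) | buf[i + k];       // slide the order-4 context
        i += len;
    }
}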
> Cool I guess, but normally you either have to do lookups at each symbol
> (and read a len on a match), or encode match flags, which adds redundancy.
And original LZP does both.
Well, LZP is not Shkarin's idea (afaik it's Bloom's - http://www.cbloom.com/src/index_lz.html).
And as to Shkarin's LZP (http://www.ctxmodel.net/files/LZP-DS_v0.rar), I think that for BWT preprocessing
it's better to keep the metainfo (match lens etc) separately from the block data.
Indeed, LZP was first published by Charles Bloom, quite some time ago.
Btw, he recently wrote an article about all the LZP versions he coded:
http://cbloomrants.blogspot.com/2011...-variants.html
-wrong info, can't remember-
15 years is a long time!
Finally, I found an idea that works - a special RLE model. As a note, all BCMs since v0.09 (year 2009) have the same CM structure; the only difference is in the number of components and some CM tuning. BCM 0.13 is the next step. I have two basic configurations:
One with BCM 0.11 complexity (about the same speed as BCM 0.11 or faster)
And a more complex one - a la BCM 0.08 (faster than BCM 0.08)
Both versions now beat BBB and BSC at ENWIK9 compression. The only question is how fast the new BCM should be. The BCM 0.11-like version is fast enough, but the more complex version achieves slightly higher compression at the cost of extra compression time; anyway, it's still faster than BCM 0.08!
Will post exact numbers later.
IMHO, BCM should have about 5 compression levels (from the fastest to the best compression)
> IMHO, BCM should have about 5 compression levels (from the fastest to the best compression)
+1
Additionally, those levels should not only differ in the constants used, but also in the algorithms.
So is that "RLE" just a runlength length context, or a real RLE?
ie do you still process 8*N bits with rangecoder?
It's kind of a state machine. And yes, I process 8 bits per input byte - i.e. pure CM. My goal was to achieve such a result with an improved CM. I've done lots of experiments with straight RLE schemes. RLE has some benefits and drawbacks... I'll just continue experimenting...
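Just to make that distinction concrete, a generic sketch (not BCM's actual model; the counter update, table sizes and the rc.encode_bit() interface are made up): the byte still goes through a binary range coder 8 bits at a time, but the current run length becomes part of the context instead of doing literal RLE.

#include <cstdint>

struct BitModel {
    uint16_t p = 1 << 11;                       // P(bit==1) on a 12-bit scale
    void update(int bit) {
        if (bit) p += (4096 - p) >> 5;          // simple adaptive counter
        else     p -= p >> 5;
    }
};

struct RunContextCM {
    BitModel models[16][256];                   // (quantized run length, partial byte)
    uint32_t run = 0;
    uint8_t  prev = 0;

    template <class RC>                         // RC::encode_bit(bit, p) is hypothetical
    void encode_byte(RC& rc, uint8_t c) {       // still 8 coded bits per input byte
        uint32_t runq = run < 15 ? run : 15;    // quantized run length context
        uint32_t ctx = 1;                       // partial-byte context
        for (int i = 7; i >= 0; i--) {
            int bit = (c >> i) & 1;
            BitModel& m = models[runq][ctx & 0xFF];
            rc.encode_bit(bit, m.p);
            m.update(bit);
            ctx = (ctx << 1) | bit;
        }
        run  = (c == prev) ? run + 1 : 0;       // maintain the run length
        prev = c;
    }
};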
5 different compression modes means 5 different encoders and decoders - kind of like 5 different programs. And CM is not an LZ77 algorithm - different constants will not affect the speed. For different performance points, people already have BZIP2 and BSC. BCM must introduce something new - high BWT compression in reasonable time - i.e. better compression compared to BSC while still being fast enough to be practical...
For many months I've just been fighting for a more efficient CM model, making it as simple as possible.
Current results on book1:
1-counter -> 216,334 bytes
2-counters -> 212,765 bytes
No SSE, just static mixing and new counters.
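For what "static mixing" of two counters can look like, a generic sketch (made-up fixed weights and adaptation rates, not the actual BCM counters):

#include <cstdint>

struct TwoCounterMix {
    uint32_t p_fast = 2048, p_slow = 2048;      // 12-bit probabilities of bit==1

    uint32_t p_mix() const {                    // fixed (non-adaptive) 3:1 weights
        return (3 * p_fast + p_slow) / 4;
    }
    void update(int bit) {                      // one fast and one slow counter
        if (bit) { p_fast += (4096 - p_fast) >> 4; p_slow += (4096 - p_slow) >> 7; }
        else     { p_fast -= p_fast >> 4;          p_slow -= p_slow >> 7; }
    }
};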
The speed is many times faster than the original BCM and probably faster than BSC. I just need more compression! The massive research goes on...
I noticed that nearly two years have passed since the last version of BCM was released.
Well, all that time I've kept searching...
Now I have more CPU power - enough even for brute-force tuning on ENWIK8. I've collected brute-force optimized results for nearly all model combinations on many files, finding the weak and strong sides of my CM, keeping the things that work and getting rid of redundant and useless model parts... Currently my PC is working 24/7...
During this time I have found a few interesting things and improved what I've got. The tuned FSM-based model is a nice one! Anyway, I'm planning to release a new BCM/BWT-based compressor this summer! My program MUST beat BBB and BSC on ENWIK9! As a note, my current FSM has the complexity of one counter and the ENWIK9 result is 165,xxx,xxx bytes - pretty close to the original BCM, which has the complexity of 4 to 12 counters! This means it is 4 to 12 times faster...
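That kind of brute-force tuning could look roughly like this (purely illustrative: Params, the size evaluator and the mutation step are hypothetical placeholders, and it's a simple hill-climber rather than a truly exhaustive search):

#include <cstdint>
#include <cstdlib>
#include <functional>
#include <vector>

using Params   = std::vector<int>;                        // hypothetical model knobs
using Evaluate = std::function<uint64_t(const Params&)>;  // compressed size on the sample files

Params mutate(Params p) {                                 // tweak one knob at random
    if (!p.empty()) p[std::rand() % p.size()] += (std::rand() % 3) - 1;
    return p;
}

Params tune(Params best, const Evaluate& size_of, int iterations) {
    uint64_t best_size = size_of(best);
    for (int i = 0; i < iterations; i++) {
        Params cand = mutate(best);
        uint64_t s  = size_of(cand);
        if (s < best_size) { best = cand; best_size = s; }   // keep only improvements
    }
    return best;
}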
Beware of overtuning!
Isn't the whole point to be tops on the benchmarks?
Got something interesting and I guess I'll release a new program soon.
Despite being CM (it processes 8 bits per input byte), the new program is about two times faster than BSC (in its fastest mode).
Compression is nice, but generally lower than BSC's - that's the only thing holding me back from releasing this new beast.
Current results:
* These results are not from per-file tuning; the tuning was performed on a number of files, and these are the results of one exact build
ENWIK8 -> 20,952,954 bytes
ENWIK9 -> 165,519,859 bytes
book1 -> 212,536 bytes
calgary.tar -> 795,632 bytes
world95.txt -> 473,778 bytes
fp.log -> 557,883 bytes
dickens -> 2,257,430 bytes
bookstar -> 9,241,794 bytes
An unoptimized Visual Studio build compresses ENWIK8 within 6 seconds; BSC in fast mode compresses ENWIK8 in 12 seconds (with or without multi-threading, since multi-threading doesn't help for single-block compression)...
And this was expected, since the new program is so simple - no SSE, no adaptive mixing - just the "secret formula" FSM!
Dictionary preprocessing?
I guess it might be an FSM with a small number of states for cache efficiency.
This would explain the speed.
The "secret part" is probably the way state-change is tuned,
probably optimised through heavy GA using a few sample files.
At least, that's what Shelwien would advise ...
And if we return to that "butterfly4.bmp" file, there BCM was superior:
BSC 3.0.0 -m0 -> 5,032,595
BSC 3.0.0 -m3 -> 5,108,972
BSC 3.0.0 -m6 -> 5,030,382
BCM 0.12 -> 4,766,017
FSM -> 4,783,604
i.e. on par with BCM while being many times faster...
Also, this new thing does better on highly redundant data than BCM does.
Tuned result for fp.log is about 540,xxx bytes, the current build result is:
FSM -> 557,883 bytes
BCM 0.12 -> 558,328 bytes
I think that's why it's so damn close (in my opinion) to BCM on ENWIK9.
Did you include block-sorting time in your comparisons? I.e. if we count the time taken to transform the verbatim original data into the final compressed form, what is the speed difference between the last released BCM and the WIP version of BCM?
Current comparison chart (ENWIK8):
BCM 0.08 -> 29 sec
BCM 0.09 -> 42 sec
BCM 0.12 -> 20 sec (Fastest BCM yet)
BSC 3.0.0 -> 12 sec
BWTANK -> 6 sec
i.e. it's not a WIP BCM - it's a new program called BWTANK. (This name was classified until now.) Also, it's an unoptimized VC compile, while the other executables are highly optimized Intel C++ compiles. Anyway, the final release might be slightly slower, but with higher compression. It would be nice to beat BSC on ENWIK9 while being faster by some margin... or to have about the same compression and be faster by a bigger margin.
With BCM, the sorting (BWT stage) time was nothing compared to the actual CM encoding. Here, sorting takes much longer than the actual encoding. This makes things interesting, since decompression became insanely fast (close to LZMA or BZIP2?)... Anyway, no surprises, since it was not a challenge to achieve this, it was a BATTLE! And hence BWTANK - a battle tank, a battle machine! Anyway, this battle continues! Today I'll upgrade my PC with a new motherboard to push CPU frequencies above 5 GHz!