I am releasing a small demo. The program does not involve a BWT transform yet (hopefully in some future version). There is also no RLE or context mixing yet, and the compressor does not use any filters.
It may fail on some files due to lack of thorough testing.
QLFC is computed via MTF, the result is transformed into a (currently static) single-level tree structure, and that is compressed with a bitwise entropy coder. The EC (arithmetic coding) is probably the slowest part; I should replace it with some faster range coder in the future.
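To illustrate just the MTF step mentioned above (the qlfc ranks and the tree handling built on top of it are not shown), a minimal move-to-front transform could look like the sketch below; this is a generic illustration, not qad's actual code.
Code:
#include <cstdint>
#include <numeric>
#include <vector>

// Minimal move-to-front transform: each input byte is replaced by its current
// rank in a recency list, and then moved to the front of that list.
std::vector<uint8_t> mtf(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> table(256);
    std::iota(table.begin(), table.end(), 0);   // recency list starts as 0..255
    std::vector<uint8_t> out;
    out.reserve(in.size());
    for (uint8_t c : in) {
        uint8_t rank = 0;
        while (table[rank] != c) ++rank;        // current rank of the symbol
        out.push_back(rank);
        table.erase(table.begin() + rank);      // move the symbol to the front
        table.insert(table.begin(), c);
    }
    return out;
}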
BTW, for some strange reason the Windows cross-compiled version seems almost twice as fast on Linux under Wine as the native Linux binary (g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 vs. i686-w64-mingw32-g++ (GCC) 4.7.3 20121208, with the same code optimizations).
The unBWT format is unknown.
Where is the first index? How many bytes is it?
Is it the sentinel version or the non-sentinel version?
That's why you should use BWTS; see Yaya's version for the fastest. It's a purely bijective transform: a permutation that is reversible and unique, with no index or sentinel needed.
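For reference, the first step of BWTS is the Lyndon factorization of the input; a sketch of Duval's algorithm for that step is shown below (the sorting of the factors' rotations, which completes the transform, is omitted).
Code:
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Duval's algorithm: factor the input into non-increasing Lyndon words,
// returned as (start, length) pairs. This is only the first half of BWTS.
std::vector<std::pair<size_t, size_t>> lyndonFactors(const std::vector<uint8_t>& s) {
    std::vector<std::pair<size_t, size_t>> factors;
    size_t n = s.size(), i = 0;
    while (i < n) {
        size_t j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j]) k = i; else ++k;
            ++j;
        }
        while (i <= k) {                 // emit factors of length j - k
            factors.push_back({i, j - k});
            i += j - k;
        }
    }
    return factors;
}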
Added a simple RLE. There is a slight boost in compression, though I think the speed-up is more noticeable. The memory requirements have increased, though, so it may end up slower on some small files. I am thinking about a more advanced RLE, but I will have to sort the memory out first.
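qad's own RLE isn't described in detail here, but as a rough illustration of the general idea, a textbook run-length pass that collapses runs of MTF rank 0 might look like this (a sketch only, not the actual scheme):
Code:
#include <cstdint>
#include <vector>

// Replace each run of rank 0 by a 0 marker followed by the run length;
// non-zero ranks are passed through as literals. Decodable because a literal
// rank is never 0, so a 0 always announces a run length right after it.
std::vector<uint32_t> rle0(const std::vector<uint8_t>& ranks) {
    std::vector<uint32_t> out;
    for (size_t i = 0; i < ranks.size(); ) {
        if (ranks[i] == 0) {
            size_t run = 0;
            while (i + run < ranks.size() && ranks[i + run] == 0) ++run;
            out.push_back(0);                   // zero-run marker
            out.push_back(uint32_t(run));       // run length
            i += run;
        } else {
            out.push_back(ranks[i++]);          // literal rank
        }
    }
    return out;
}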
Originally Posted by xezz
The unBWT format is unknown.
Where is the first index? How many bytes is it?
Is it the sentinel version or the non-sentinel version?
There is no BWT transform involved as of yet, but hopefully there will be in the future. You can use BWTS or whatever suits you. Note that it's still experimental and I haven't had much time to test it more deeply, so if decompression fails, please let me know.
The first is the first cut of a paper rejected by those who may not have understood the basic idea.
The second is the paper actually submitted at a string conference in Prague, and it is more polished.
@xezz this is probably due to a header parsing issue, or a footer in this case, when the input alphabet is too small. I will try to fix it.
@biject.bwts thanks for the note. At this point my footer is not compressed and it contains some redundancy; anyway, it could be useful later on to squeeze out some more bytes.
Well, this should hopefully fix the footer parsing issue. As the footer structure has changed and grown a bit larger (especially noticeable if the file is small), backward compatibility is not retained.
Also, it seems a better RLE is possible, but it will require either more memory or more time. If I manage to resolve the memory allocation issue, it could run at about the same speed; anyway, I'd like to add a new compression mode.
The qad and dec files are from qad. The bwt and unbwt files are from the libdivsufsort examples.
I've modified libdivsufsort to accept blocks above 1e9 bytes, so enwik9 is sorted in a single block.
Thanks for testing, Pioter. I am quite surprised by the enwik9 result myself; I haven't tried it on such a big block.
Here is a new version with slightly improved RLE (as the default compression mode). It is not yet optimized for speed or compression ratio, it's noticeably slower, and it may fail on some files. But from now on, you can use either the c or c2 switch for the default or faster mode. BTW, there is no context mixing yet, but I hope to get to it one day too.
Thanks for the tips. No, I've only heard of ANS; FSE I'm hearing about for the first time. So I'm not sure how it performs, but I may try to use it if it compares well.
It's a huge area. You can find the FSE topic here; also, Charles Bloom and Cyan wrote a lot about ANS codecs in their blogs. There are also binary ANS codecs; I'm not sure whether they are better than AC ones.
I read that FSE is block-based, possibly not the best fit for adaptive compression, though it may be possible to make an adaptive ANS. But for now, here is a version which uses a modification of the Subbotin RC (there is no source included yet, because I need to clean up the code a bit).
Here is some comparison on enwik8bwt (compressed sizes in bytes):

Entropy coder                                    default mode   c.time      fast mode   c.time
30-bit binary adaptive AC (bit-aligned output)   21182969       1m15.219s   21215914    0m59.563s
64-bit Subbotin RC mod (byte-aligned output)     21183433       1m5.895s    21216348    0m49.900s
32-bit Subbotin RC mod (bit-aligned output)      21184310       0m55.034s   21217145    0m40.326s
32-bit Subbotin RC mod (nibble-aligned output)   21186080       0m50.673s   21218792    0m36.456s
The 64-bit RC's slowness is caused by the fact that it's not actually a 64-bit executable, only an emulation; a true 64-bit binary should be faster.
Also, it seems the default mode could run at about the same speed as the fast mode, with only a slight penalty in compression ratio, so I might get rid of the fast mode in the future. I have one RLE improvement in mind right now, but I need to write a decoder.
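For reference, the encoder side of the classic carryless Subbotin range coder is sketched below; the modification used in qad is not shown, and the decoder is omitted. The byte-wise renormalization is what makes such a coder byte-aligned; bit- or nibble-aligned variants presumably renormalize in smaller steps.
Code:
#include <cstdint>
#include <vector>

// Classic carryless range coder (Subbotin style), encoder only.
struct RangeEncoder {
    std::vector<uint8_t> out;                  // encoded byte stream
    uint32_t low = 0;
    uint32_t range = 0xFFFFFFFFu;
    static constexpr uint32_t TOP = 1u << 24;
    static constexpr uint32_t BOT = 1u << 16;

    // Encode one symbol given its cumulative frequency, frequency and total.
    void encode(uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
        low += cumFreq * (range /= totFreq);
        range *= freq;
        // Renormalize: flush a byte once the top byte is settled, or force the
        // range open when it underflows (this is the "carryless" trick).
        while ((low ^ (low + range)) < TOP ||
               (range < BOT && ((range = (0u - low) & (BOT - 1)), true))) {
            out.push_back(uint8_t(low >> 24));
            low <<= 8;
            range <<= 8;
        }
    }

    // Flush the remaining bytes of 'low' at the end of the stream.
    void finish() {
        for (int i = 0; i < 4; ++i) {
            out.push_back(uint8_t(low >> 24));
            low <<= 8;
        }
    }
};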
Alright, the new version is up and the licence now changes to GNU GPL, so the source is included.
It slightly improves the RLE again (well, not that much), and you can choose at compile time to build a 32-bit or 64-bit range coder via the Build32bit / Build64bit macros. In the same way, you can turn the latest two RLE enhancements on or off; for the older one I still have to create such a switch.
There is still some room for improvement, because the RLE improves e.g. with recursion, but I think it may be more interesting to try it with some CM.
To sum things up, I am using a static cyclic tree to encode the MTF/qlfc output, then I compress the tree branches and states (for example, when a symbol's frequency increases, symbols are less likely to repeat, and so on; though not really in a CM manner) using RLE, and the final stage is entropy coding.
The source is not very clear or descriptive yet, I suppose; it's still pretty much one block of code, sorry about that.
Also, I don't have 64-bit Linux yet, so the binary is just an emulation; I don't know if it will compile at all.
BTW, for some strange reason the Windows cross-compiled version seems almost twice as fast on Linux under Wine as the native Linux binary (g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 vs. i686-w64-mingw32-g++ (GCC) 4.7.3 20121208, with the same code optimizations).
Code:
/*
Linux compile example:
g++ -c qad.cpp && g++ -g -O3 qad.o && time ./a.out c bwt bwt.qad
/opt/mingw32/bin/i686-w64-mingw32-g++ -O3 -s -o qad.exe qad.cpp && time wine qad.exe c enwik9bwt qad
*/
If that's the line you used to compile the Linux binary, I can tell you why it is slow. When you invoke g++ the first time, it compiles the C++ and that's when you need to pass -O3. The second invocation just links, and -O3 does nothing.
Here's a simple Makefile that works in Linux. It's not doing much, but it allows you to build with a simple and standard `make`, which requires no reading and almost no thought.
You could incorporate the testing command line into the Makefile, which is common and is usually put under a target "test," so you can `make test`. I might edit this message later and add it for you.
By the way, I could not get it to decompress successfully. It gave me back a zero-length file. It's possible that I'm doing something wrong, but it's hard to see what that might be. It's late. I might try again tomorrow.
PS. I noticed that I used -O2 in the Makefile. The last time I compared them, -O2 was consistently faster than -O3, so that's what I always use, but if you have found that -O3 is actually faster here, please change that line and write a post about it. I'd be curious.
If that's the line you used to compile the Linux binary, I can tell you why it is slow. When you invoke g++ the first time, it compiles the C++ and that's when you need to pass -O3. The second invocation just links, and -O3 does nothing.
Yes, exactly. Thanks, I will include it. I came across this just as I was trying to attach divsufsort.
And as I just tested, -O2 really does seem slightly faster. It's not very speed-oriented yet (it's a single-threaded version), but before this the 64-bit emulation was crashing the Linux binary, and now it works and is even faster than the Windows one, though that may be because of the Wine emulation.
The latest version attaches divsufsort 2.0.1, so it may be easier to use, even though the memory requirements rose; it has some pros and cons. I was trying to keep memory usage low, but at least now I have more memory to play with. Meanwhile, I was trying to do some tree-branch weighting, then I realised it's more like fitting for a particular situation (like small files versus large ones), so I will possibly leave this for a possible later CM stage.
BTW, I have not tried any preprocessors with it yet; of those I just tried on their own, drt e.g. crashes on Linux, maybe due to some wrong character encoding.
A new version is up. I have removed one constant for rank-to-RLE context selection in default mode (before, this worked only for ranks with a value of the static tree size and above). Now all ranks are used for selection, except for the first 3, which are themselves selected the other way around (from RLE). Skipping the first 3 ranks may hurt compression on small files a bit, but the change should help overall.
Right now I am thinking about how to make use of some better prediction for the final context encoding. For now there is only probability halving from time to time, and one hash to trigger a cumulative-probability rescale for the most used context.
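The periodic probability halving mentioned above is presumably the standard frequency rescale; a minimal sketch of that generic technique is below (names are illustrative, not taken from qad's source).
Code:
#include <cstdint>

// Once the total count reaches a limit, halve every frequency (keeping each
// at least 1) so old statistics fade and the model keeps adapting.
void rescaleIfNeeded(uint32_t* freq, int n, uint32_t& total, uint32_t limit) {
    if (total < limit) return;
    total = 0;
    for (int i = 0; i < n; ++i) {
        freq[i] = (freq[i] + 1) >> 1;   // halve, but never drop to zero
        total += freq[i];
    }
}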
It seems the change from the previous version turned out to be highly sensitive to the input data. Due to the very rough frequency-distribution approximation I make, I stripped it from the default compression mode and added it (slightly modified) as an experimental filter under the cf archive switch/option; archives compressed in default mode with version 0.8 should still be compatible with this version, and fast-mode archives from 0.7 onward.
Hello, the BWT transform is already there from version 0.7 onward. Sorry if the info was wrong. I will probably add a switch in the future to skip this transform, so that e.g. people could try a different type of sorting.
Also, I don't know how that happened, but I left some debugging output on the decoder side in 0.9; this should now be fixed.
Thanks for the info.
When I run "qad c app.tar app.qad", both the 32-bit and 64-bit Windows compiles generate an archive of 0 bytes in size.
This also happens with other input files.
OS is Win 8.1 x64.
Thanks for the info.
When I run "qad c app.tar app.qad", both the 32-bit and 64-bit Windows compiles generate an archive of 0 bytes in size.
This also happens with other input files.
OS is Win 8.1 x64.
This also happens in 32-bit Vista and in 64-bit Ubuntu under Wine 1.6. The 64-bit Linux executable complains about a missing libdivsufsort.so.3.
Interesting. If that's the case, I may try to include the compiled library with the next version as well. I don't know how to do that for Windows though; I didn't have much success compiling the libdivsufsort DLL on Linux. But thanks for testing, I will try to get some 64-bit OS too.
BTW, I just found an annoying memory-management bug on the decoder side, which I probably introduced when I started to play with the filter in version 0.8. So for the next version I plan to slow down decoding a bit; in default mode it will probably use more memory, for safety reasons. I will try to optimize it later.
So version 1.0 is out. It now has slightly slower decompression with higher memory consumption; I was trying to simplify the previous one a bit. I have used a completely new algorithm for the previous filter though, which I think is now more stable and works better in general.
At first I didn't want to move it directly under the default mode, but since it changes the workflow quite a bit, I did so, and again backward compatibility is not maintained.
Now, under the default mode, c is the new compression mode and cf is a slightly improved version.
The default one removes the low limit for context selection for the first three ranks according to RLE, so that there is no context selection for ranks from RLE at all, which speeds up the encoder and decoder a little (in both the default c and cf modes), while the compression gains seem better as well.
The default mode actually still has some constant value left, namely the top limit for context selection, but as I found out, I can use the previous filter to set this. The problem might be that the old algorithm doesn't always give good or proper estimations, so it is cut off so as not to cross the size of the tree, to ease decoding, but that can affect the compression ratio negatively for some files.
The difference is quite small (enwik8):

Old filter (= the cf option from 0.9): 21165185
New filter, constant top limit for context selection according to the size of the tree and the rank: 21147089
New filter, top limit according to the (not really good) frequency estimation of the previous algorithm (this is what the cf flag uses): 21140073
Interesting. If that's the case, I may try to include the compiled library with the next version as well. I don't know how to do that for Windows though; I didn't have much success compiling the libdivsufsort DLL on Linux. But thanks for testing, I will try to get some 64-bit OS too.
BTW, I just found an annoying memory-management bug on the decoder side, which I probably introduced when I started to play with the filter in version 0.8. So for the next version I plan to slow down decoding a bit; in default mode it will probably use more memory, for safety reasons. I will try to optimize it later.
I was able to compile and statically link libdivsufsort for Windows when I posted some code here before. I must still have the code somewhere.
Well, thanks. I tested it in 32-bit Windows (vmplayer) and it seems I already have it linked in there. So the sudden program exit might be connected to not having enough free memory; I am not sure though. I will add some notification and will also try to reduce memory usage, because it should already have enough memory available to compute the BWT.
I was thinking about how to improve the range coder predictions, i.e. for RLE (or for the tree branch and leaf contexts).
At present the RC maintains mostly only indirect statistics; i.e. while the RLE feeds the range coder with bit-codes of length 3, the RC maintains only isolated statistics for those length-3 bit-codes. What is actually missing is a higher order, or a simple order 0.
Although I am not sure if it's a good idea to combine these, because even order 0 could be quite a big list, e.g. if the RLE inputs some big integers, and the trade-off might not be worth it.
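Purely as an illustration of that trade-off (names are hypothetical, not from qad's source), a direct order-0 count over the raw integers the RLE emits could be as simple as the sketch below; with big run lengths the table can indeed grow into quite a big list.
Code:
#include <cstdint>
#include <unordered_map>

// Order-0 statistics over arbitrary integers: one counter per distinct value.
struct Order0Counts {
    std::unordered_map<uint64_t, uint32_t> count;   // value -> occurrences
    uint64_t total = 0;

    void update(uint64_t v) { ++count[v]; ++total; }

    // Raw frequency of v so far; turning this into coding probabilities
    // (escape handling, smoothing) is left out of this sketch.
    uint32_t freq(uint64_t v) const {
        auto it = count.find(v);
        return it == count.end() ? 0 : it->second;
    }
};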
qad 1.0 still gives me a 0-size compressed file in both 32-bit Windows and 64-bit Linux under Wine 1.6. The 32- and 64-bit Linux versions still complain that they can't find libdivsufsort.so.3. I tried:
qad cf enwik8 enwik8.qad
qad c enwik8 enwik8.qad
qad c2 enwik8 enwik8.qad