I would normally never publish an unfinished project, but since completing it will still take a *long* time and the results look really promising for such an early state, I decided to share this preliminary demo for anyone interested. I literally finished the compression stage (read: "half-baked", unoptimized, without CRC etc.) only a few minutes ago. The decompression stage is not implemented at all yet, but since I backtested each stage individually on small sets during development, it should work fine once done.
So why even bother? Well, so far, at least on my few small samples, HBA is beating LZMA on every single one. Sometimes badly. For example:
Code:
"Mamei64.exe" v0.174, original size 126 MB, "7zip d64m mc32 lc4" => 26 MB, hba => 11 MB.
Yeah, that's a ~57% difference. Bug? I doubt it, but I will only be able to say once decompression is fully implemented. And it goes on: a random .bmp, ~3.2 MB => 7zip: 2.6 MB => HBA: ~800 KB. You can test it yourself, as I attached the exe below. Just keep it to smaller files; for now it won't even work on files that are too big (I did not bother with 64-bit integers, will have to change that), and it is anyway intended to be used with a long-distance dedup (which I also plan to write for it one day). Meanwhile, HBA uses a constant buffer of a few MB at most, with no dictionary. And it is not linear-time compression/decompression like the CM family. Oh, I forgot:
Code:
Silesia (211,939,002 bytes) => HBA => 22,664,790 bytes in 109 seconds, at ~1898 KB/s, because BWT is slowing it down badly. Of course, this is the raw output without any extra bytes (CRC, header etc.).
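The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch in Python using the numbers from this post, and it assumes that KB here means 1024 bytes:

```python
# Numbers quoted in the post.
silesia_in = 211_939_002     # original size in bytes
silesia_out = 22_664_790     # reported HBA output in bytes
seconds = 109                # reported compression time

ratio = silesia_out / silesia_in           # compressed size / original size
speed_kb = silesia_in / seconds / 1024     # assumption: 1 KB = 1024 bytes

# Mamei64.exe example: 7zip => 26 MB, HBA => 11 MB.
smaller_than_7zip = (26 - 11) / 26

print(f"ratio: {ratio:.2%}")
print(f"speed: {speed_kb:.1f} KB/s")
print(f"HBA output smaller than 7zip output by: {smaller_than_7zip:.1%}")
```

This checks only that the quoted numbers are consistent with each other. It says nothing about whether the output is actually decodable.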
So what is it really? Well, I got an idea about a month ago. In general, HBA is a BWT+MTF based compressor with a custom arithmetic coder designed specifically for it, plus special general-purpose data filtering/handling (NOT specialized for WAVs, pictures etc., as other codecs do). It turned out in my early tests that without the custom arithmetic coder it did not perform any better than 7zip/LZMA (in fact it was slightly worse, and I almost deleted the whole project), but with it, it made all the difference. The reverse also holds: the AC alone is of no use. So the whole set of ideas has to be used together. Also, since this only requires a small buffer, it can (in the future) be fully parallelized (ideally on a GPU, if I can make it that far).
The main idea of HBA will remain undisclosed for now; first I need to finish the decompression stage and properly test everything before I can even assess its value.
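For readers unfamiliar with the BWT+MTF front end named above, here is a minimal textbook sketch in Python. It is not HBA's code: the undisclosed main idea, the data filtering and the custom arithmetic coder are not represented at all, and the function names are made up for illustration. The BWT here sorts all rotations naively, which is simple but slow.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive BWT: sort all rotations, return the last column and the row of the original."""
    n = len(block)
    if n == 0:
        return b"", 0
    doubled = block + block  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT using the stable first-column to last-column mapping."""
    n = len(last)
    if n == 0:
        return b""
    # Stable sort of positions by byte value: nxt[row] is the row whose
    # rotation starts one position later in the original block.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's current rank, then move it to rank 0."""
    table = list(range(256))
    ranks = []
    for b in data:
        idx = table.index(b)
        ranks.append(idx)
        table.insert(0, table.pop(idx))
    return ranks


def mtf_decode(ranks: list[int]) -> bytes:
    """Inverse of mtf_encode."""
    table = list(range(256))
    out = bytearray()
    for idx in ranks:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)


# Round trip on a small example.
last, primary = bwt(b"banana")
ranks = mtf_encode(last)
assert inverse_bwt(mtf_decode(ranks), primary) == b"banana"
```

BWT groups bytes that share a following context, so the MTF ranks tend to be small numbers with many zeros. That skewed distribution is what an entropy coder placed after these stages gets to exploit.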
Final words:
- This is NOT an April joke, but again, only when decompression is fully working can we be 100% sure that it functions. Also, maybe I was lucky in my 3-4 tests and it is mediocre after all.
- Currently horribly unoptimized everywhere, and BWT is 90%+ of the slowdown (just qsort with memcmp). It only uses a 16k buffer ATM, at ~1.6 MB/s on my 4.2 GHz Haswell (1T). Without BWT I already get about 8 MB/s (1T).
- The buffer is too small; a bigger buffer should improve compression further.
- I still have a math idea, based on the current one, that should improve compression further.
- Decompression is not implemented yet, so don't bother with "hba d file file".
- To be honest, I myself am skeptical ATM, as it looks too good (at times) to be true. I don't want to look bad if I discover a different "reality" (during the decompression implementation, for example), so let's keep expectations realistic for now. That custom arithmetic coder is the deciding factor here, as all other stages should be 99% OK (I hope).
- In the future, HBA can be made parallel, most likely GPU capable (fully, I think, not just some stages), significantly quicker and likely with even better compression (through many different improvements). So what you see is merely a 1st attempt. But I welcome your data results.
- I need a lot more time.
EDIT: Please read my next post below. I did indeed have bugs in the arithmetic encoder, which made the compression ratio appear better than it really is. As of now, the published HBA doesn't work, so I deleted the file link.