https://github.com/richox/orz
Rewriting the old libzling in Rust (which I've been learning for less than a month) with some optimizations; decoding speed is now 30% faster than libzling's.
Written in Rust, but lots of code is marked as unsafe :]
That's mainly to avoid runtime bounds checking; Rust has no option to disable it outside of unsafe code.
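A minimal sketch (not orz's actual code) of the trade-off being described: safe slice indexing in Rust always carries a bounds check, which the optimizer cannot always eliminate, and the only way to opt out on a hot path is the unsafe `get_unchecked` family.

```rust
// Safe indexing: every buf[i] is bounds-checked (the check is often,
// but not always, optimized away when the compiler can prove i < len).
fn sum_safe(buf: &[u8]) -> u64 {
    let mut acc = 0u64;
    for i in 0..buf.len() {
        acc += buf[i] as u64;
    }
    acc
}

// Unsafe variant: skips the bounds check explicitly.
fn sum_unchecked(buf: &[u8]) -> u64 {
    let mut acc = 0u64;
    for i in 0..buf.len() {
        // SAFETY: i < buf.len() is guaranteed by the loop bound.
        acc += unsafe { *buf.get_unchecked(i) } as u64;
    }
    acc
}

fn main() {
    let data = b"hello orz";
    assert_eq!(sum_safe(data), sum_unchecked(data));
}
```

Both functions compute the same result; the unsafe version merely trades the per-access check for a manually verified invariant, which is why decoder hot loops end up marked unsafe.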
sounds interesting
Can you please upload a binary for Windows x64?
Cross-compiled under macOS, not tested. Hope it works:
orz-v0.1.0-x86_64-pc-windows-gnu.exe
So RichSelian, do you think that reducing offsets with a 1-byte preceding context is the most practical option? I.e. you have a match finder with (256 buckets * a lot of entries). If you reduce offsets with 2 preceding bytes, you could instead have (65,536 buckets * a few entries). This second scheme could yield smaller reduced offsets, that is, if you manage to find a match predicated on a two-byte preceding context (which must be harder than finding one predicated on a one-byte context).
A compressor targeting high ratio can actually run both match finders: the cost of introducing another kind of match, to wit, one predicated on a two-byte context, is roughly 1 bit per match, but the offset can be reduced by much more than one bit. This reminds me of Christian Martelock's RAZOR: he confirmed (more or less) that RAZOR runs separate match finders and encodes LZ and ROLZ matches with different symbols. I now suspect that he secretly runs three match finders.
Traditional LZ can be regarded as ROLZ with a 0-byte context, I think.
Using multiple matchers may be useful for a high compression ratio, but it requires updating all matchers at every symbol, which significantly slows things down. And the ratio won't be much better, since most matches are found by both matchers anyway.
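To make the bucket discussion concrete, here is a hypothetical sketch of a ROLZ match finder with a 1-byte preceding context (256 buckets). The names, bucket size, and minimum match length are invented for illustration and are not orz's actual implementation.

```rust
// Each bucket holds recent positions sharing the same 1-byte preceding
// context; a match is coded as a small index into the bucket (the
// "reduced offset") instead of a raw LZ77 distance.
const BUCKET_SIZE: usize = 16; // real compressors use far more entries

struct RolzTable {
    buckets: Vec<Vec<usize>>, // buckets[ctx] = positions preceded by byte ctx
}

impl RolzTable {
    fn new() -> Self {
        RolzTable { buckets: vec![Vec::new(); 256] }
    }

    // Record position `pos` under its context byte data[pos - 1].
    fn update(&mut self, data: &[u8], pos: usize) {
        if pos == 0 { return; }
        let bucket = &mut self.buckets[data[pos - 1] as usize];
        bucket.push(pos);
        if bucket.len() > BUCKET_SIZE {
            bucket.remove(0); // evict the oldest entry
        }
    }

    // Find the longest match at `pos`; returns (reduced_offset, length).
    fn find_match(&self, data: &[u8], pos: usize) -> Option<(usize, usize)> {
        if pos == 0 { return None; }
        let bucket = &self.buckets[data[pos - 1] as usize];
        let mut best: Option<(usize, usize)> = None;
        for (idx, &cand) in bucket.iter().enumerate().rev() {
            if cand >= pos { continue; } // only earlier positions are valid
            let len = data[cand..].iter().zip(&data[pos..])
                .take_while(|(a, b)| a == b)
                .count();
            if len >= 4 && best.map_or(true, |(_, l)| len > l) {
                // reduced offset = slot counted back from the newest entry
                best = Some((bucket.len() - 1 - idx, len));
            }
        }
        best
    }
}

fn main() {
    let data = b"xabcdexabcde";
    let mut table = RolzTable::new();
    for pos in 1..7 {
        table.update(data, pos);
    }
    // At pos 7 the context byte is data[6] = b'x'; the 'x' bucket holds
    // position 1, so "abcde" matches with reduced offset 0 and length 5.
    assert_eq!(table.find_match(data, 7), Some((0, 5)));
}
```

A 2-byte-context variant would simply key the table on the previous two bytes (65,536 buckets), giving fewer candidates per bucket and therefore smaller reduced offsets, at the cost of missing matches when the two-byte context has not been seen before.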
I think current ROLZ coding techniques can be combined with my Reduced Length LZ (RLLZ) idea to produce better compression ratios. Lucas proposes switching from LZ77 to LZ78 when needed:
https://encode.su/threads/3013-Reduc...put-LZ77-codes
The offsets are positions of strings in the file, or in a block large enough to contain many similar strings but not so large that offset_size grows long. Focus on one duplicated string and encode it once. Then, for each later occurrence of the string, write only its offset; there is no need to output a length code. Do this for all other duplicated strings.
Lastly, output the literals, so you end up with just an array of encoded literals. No boolean prefix bit is output to flag literal vs. match, and no literal lengths are needed, unlike other LZ77 methods. This saves many output bits compared to ordinary LZ77/LZSS.
Both the strings and the literals are decodable. This works by writing the literals last in the output or write buffer, i.e. after all match strings are decoded. The literals fit exactly into the "holes" or gaps not occupied by the match strings, which puts them in their correct positions in the file!
Decode Output buffer (one appropriately-sized block):
[STRING..STRING....STRING.STRING.....STRING.STRING]
(2 holes, 4 holes, 1 hole, 5 holes, 1 hole)
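The two-pass decode described above can be sketched as follows. This is a hypothetical illustration of the RLLZ idea with an invented in-memory format: each duplicated string is stored once together with every position where it occurs, and the literal stream then fills the remaining holes in order.

```rust
// Hypothetical RLLZ-style decoder (illustrative format, not a real codec):
// pass 1 stamps every duplicated string at all of its recorded positions;
// pass 2 pours the literal stream into the remaining holes, left to right.
// Note there are no literal/match flag bits and no length codes for the
// repeat occurrences.
fn rllz_decode(out_len: usize, strings: &[(&[u8], &[usize])], literals: &[u8]) -> Vec<u8> {
    let mut buf = vec![None::<u8>; out_len];

    // Pass 1: place the match strings at their absolute positions.
    for &(s, positions) in strings {
        for &p in positions {
            for (i, &b) in s.iter().enumerate() {
                buf[p + i] = Some(b);
            }
        }
    }

    // Pass 2: literals fill the holes in file order.
    let mut lit = literals.iter();
    buf.into_iter()
        .map(|slot| slot.unwrap_or_else(|| *lit.next().expect("not enough literals")))
        .collect()
}

fn main() {
    // "ab" occurs at positions 0, 3, and 6; the holes at 2 and 5
    // take the literals 'X' and 'Y' in order.
    let strings: &[(&[u8], &[usize])] = &[(b"ab", &[0, 3, 6])];
    let decoded = rllz_decode(8, strings, b"XY");
    assert_eq!(decoded, b"abXabYab".to_vec());
}
```

Because the match positions alone determine where the holes are, the decoder knows exactly where each literal belongs without any per-literal position or flag being transmitted.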
Traditional LZ77/LZSS assumes that each time you output a code for a literal or string you are exactly in the current position in the file. But if you back off a little, and see the whole file at once, you can actually defer outputting the literals.
Last edited by compgt; 2nd February 2020 at 16:52.