
Thread: orz - an optimized ROLZ data-compressor written in rust

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts

    orz - an optimized ROLZ data-compressor written in rust

    https://github.com/richox/orz
    A rewrite of my old libzling in Rust (which I've been learning for less than a month), with some optimizations. Decoding speed is now 30% faster than libzling.

  2. Thanks (4):

    Bulat Ziganshin (7th March 2019),JamesWasil (25th December 2019),Mike (10th March 2018),Simorq (11th March 2018)

  3. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,496
    Thanks
    26
    Thanked 132 Times in 102 Posts
    Written in Rust, but lots of code is marked as unsafe :]

  4. #3
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    That's mainly to avoid runtime bounds checking; Rust has no compiler option to disable it.
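For context, here is a minimal illustration of the trade-off being described (a toy sketch, not orz's actual code): safe indexing inserts a bounds check on every access, while `get_unchecked` in an `unsafe` block skips it when the index is already known to be in range.

```rust
// Toy example: why hot inner loops in a compressor may reach for `unsafe`.

fn sum_safe(data: &[u8]) -> u64 {
    let mut acc = 0u64;
    for i in 0..data.len() {
        acc += data[i] as u64; // bounds-checked on every iteration
    }
    acc
}

fn sum_unchecked(data: &[u8]) -> u64 {
    let mut acc = 0u64;
    for i in 0..data.len() {
        // SAFETY: i < data.len() is guaranteed by the loop bound
        acc += unsafe { *data.get_unchecked(i) } as u64;
    }
    acc
}

fn main() {
    let data = [1u8, 2, 3, 4];
    assert_eq!(sum_safe(&data), sum_unchecked(&data));
    println!("both sums: {}", sum_safe(&data)); // prints "both sums: 10"
}
```

In practice the optimizer can often elide the check in loops this simple; the manual `unsafe` matters in more irregular access patterns like match-finder table lookups.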

  5. #4
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    Sounds interesting.
    Can you please upload a binary for Windows x64?

  6. #5
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Cross-compiled under Mac, not tested. Hope it works.
    orz-v0.1.0-x86_64-pc-windows-gnu.exe

  7. Thanks:

    Simorq (12th March 2018)

  8. #6
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    24
    Thanks
    7
    Thanked 10 Times in 8 Posts
    So RichSelian, do you think that reducing offsets with 1-byte preceding context is the most practical option? I.e. you have the match finder with (256 buckets * a lot of entries). If you reduce offsets with 2 preceding bytes, you could instead have (65,536 buckets * a few entries). This second scheme could yield smaller reduced offset, that is, if you manage to find a match predicated on a two-byte preceding context (which must be harder than finding a match predicated on a one-byte context).

    A compressor targeting high ratio can actually run both match finders: the cost of introducing another kind of match, to wit, predicated on a two-byte context, is roughly 1 bit per match, but the offset can be reduced by much more than one bit. This reminds me of Christian Martelock's RAZOR: he confirmed (more or less) that RAZOR runs separate match finders and encodes LZ and ROLZ matches with different symbols. I now suspect that he secretly runs three match finders.
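The two bucket layouts svpv contrasts can be sketched roughly like this (a toy illustration, not orz's actual data structure; `RolzTable` and all names here are made up). The point is that a reduced offset is an index into a per-context bucket, so more contexts mean smaller buckets and smaller indices:

```rust
// Toy ROLZ match table: one bucket of recent positions per preceding context.

struct RolzTable {
    buckets: Vec<Vec<usize>>, // previous positions, indexed by context
    bucket_cap: usize,        // max entries kept per bucket
}

impl RolzTable {
    fn new(num_contexts: usize, bucket_cap: usize) -> Self {
        RolzTable { buckets: vec![Vec::new(); num_contexts], bucket_cap }
    }

    // 1-byte preceding context: 256 buckets
    fn context1(data: &[u8], pos: usize) -> usize {
        data[pos - 1] as usize
    }

    // 2-byte preceding context: 65,536 buckets
    fn context2(data: &[u8], pos: usize) -> usize {
        ((data[pos - 2] as usize) << 8) | data[pos - 1] as usize
    }

    fn insert(&mut self, ctx: usize, pos: usize) {
        let b = &mut self.buckets[ctx];
        if b.len() == self.bucket_cap {
            b.remove(0); // drop oldest; a real impl would use a ring buffer
        }
        b.push(pos);
    }

    // The "reduced offset" is the bucket index (newest first), not a file offset.
    fn reduced_offset(&self, ctx: usize, match_pos: usize) -> Option<usize> {
        self.buckets[ctx].iter().rev().position(|&p| p == match_pos)
    }
}

fn main() {
    // "256 buckets * a lot of entries" vs "65,536 buckets * a few entries"
    let mut one_byte = RolzTable::new(256, 4096);
    one_byte.insert(b'a' as usize, 100);
    one_byte.insert(b'a' as usize, 200);
    // the newest occurrence gets the smallest reduced offset
    assert_eq!(one_byte.reduced_offset(b'a' as usize, 200), Some(0));
    assert_eq!(RolzTable::context2(b"..ab", 4), 0x6162);
}
```

With 65,536 contexts each bucket holds far fewer candidates, so a successful match encodes a smaller index, at the cost of matches predicated on the rarer two-byte context being harder to find.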

  9. #7
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by svpv View Post
    So RichSelian, do you think that reducing offsets with 1-byte preceding context is the most practical option? I.e. you have the match finder with (256 buckets * a lot of entries). If you reduce offsets with 2 preceding bytes, you could instead have (65,536 buckets * a few entries). This second scheme could yield smaller reduced offset, that is, if you manage to find a match predicated on a two-byte preceding context (which must be harder than finding a match predicated on a one-byte context).

    A compressor targeting high ratio can actually run both match finders: the cost of introducing another kind of match, to wit, predicated on a two-byte context, is roughly 1 bit per match, but the offset can be reduced by much more than one bit. This reminds me of Christian Martelock's RAZOR: he confirmed (more or less) that RAZOR runs separate match finders and encodes LZ and ROLZ matches with different symbols. I now suspect that he secretly runs three match finders.
    Traditional LZ can be regarded as ROLZ with a 0-byte context, I think.
    Using multiple matchers may be useful for a high compression ratio, but it requires updating every matcher at each symbol, which significantly slows things down. And the ratio won't improve much, since most matches are duplicated across the matchers.

  10. #8
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by RichSelian View Post
    Traditional LZ can be regarded as ROLZ with a 0-byte context, I think.
    Using multiple matchers may be useful for a high compression ratio, but it requires updating every matcher at each symbol, which significantly slows things down. And the ratio won't improve much, since most matches are duplicated across the matchers.
    I think current ROLZ coding techniques can be combined with my Reduced Length LZ (RLLZ) idea to produce better compression ratios. Lucas proposes switching from LZ77 to LZ78 when needed:

    https://encode.su/threads/3013-Reduc...put-LZ77-codes

    The offsets are positions of strings within the file, or within a block large enough to contain many similar strings but small enough that offset_size stays short. Focus on one duplicated string and encode it once. Then, for each later occurrence of the string, write only its offset; there is no need to output a length code. Do this for all other duplicated strings.

    Lastly, output the literals. So the encoded output ends with just an array of literals. There is no boolean prefix bit flagging literal vs. match string, and no literal length field, unlike other LZ77 methods. That saves many more output bits than ordinary LZ77/LZSS.

    Both the strings and the literals are decodable. This works by writing the literals last in the output, i.e. after all match strings are decoded. The literals then fit exactly into the "holes" or gaps not occupied by the match strings (and they land in their correct positions in the file)!

    Decode Output buffer (one appropriately-sized block):

    [STRING..STRING....STRING.STRING.....STRING.STRING]
    (2 holes, 4 holes, 1 hole, 5 holes, 1 hole)


    Traditional LZ77/LZSS assumes that each time you output a code for a literal or a match string, you are at the current position in the file. But if you step back and view the whole file at once, you can defer outputting the literals.
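    The decode step described above can be sketched in a few lines (hypothetical; `rllz_fill` and its signature are made up for illustration). Matches are placed at their known positions first, then the literal array fills the remaining holes in file order:

    ```rust
    // Toy RLLZ decode: place match strings, then drop literals into the gaps.

    fn rllz_fill(block_len: usize, matches: &[(usize, &[u8])], literals: &[u8]) -> Vec<u8> {
        let mut out: Vec<Option<u8>> = vec![None; block_len];
        // place every decoded match string at its known position
        for &(pos, s) in matches {
            for (i, &b) in s.iter().enumerate() {
                out[pos + i] = Some(b);
            }
        }
        // literals fill the unoccupied slots, in order
        let mut lit = literals.iter();
        out.into_iter()
            .map(|slot| slot.unwrap_or_else(|| *lit.next().expect("literal count mismatch")))
            .collect()
    }

    fn main() {
        // block "abXcdab": the string "ab" occurs at positions 0 and 5,
        // so only the literals "X", "c", "d" remain to fill the holes.
        let matches: &[(usize, &[u8])] = &[(0, b"ab"), (5, b"ab")];
        assert_eq!(rllz_fill(7, matches, b"Xcd"), b"abXcdab".to_vec());
    }
    ```

    Note this assumes the decoder already knows each match's absolute position and length from the match stream, which is where the scheme's real encoding cost would live.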
    Last edited by compgt; 2nd February 2020 at 15:52.

