Results 1 to 6 of 6

Thread: two small tricks to improve ROLZ compression ratio

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts

    Smile two small tricks to improve ROLZ compression ratio

    code here: https://github.com/richox/orz/blob/master/src/lz.rs


    1. minimum match length per each position
    assume we have a position to 'AAAAAAB' in ROLZ hash chain, and the next position 'AAAAAAC' comes. we get a match of length 6, and we know if there's another position that matches 'AAAAAAB', the length will be always greater than 6, otherwise 'AAAAAAC' is a better match (with shorter offset). a length<=6 match to 'AAAAAB' is impossible.


    2. expected match length per each position
    when 'AAAAAAC' is appended to hash chain a get a match of length 6, we will assume that something like 'AAAAAA*' is likely to appear again. so if there's a match to 'AAAAAAC', we expect its match length to be 6.


    this two tricks are added in Orz compressor. I saved `match_len_min` and `match_len_expected` for each position in hash chain, and used a simple transform to achieve smaller match length:


    // start from match_len_min
    // bring match_len_expected to front
    let encoded_match_len = match match_len.cmp(&match_len_expected) {
    std::cmp::Ordering::Greater => match_len - match_len_min,
    std::cmp::Ordering::Less => match_len - match_len_min + 1,
    std::cmp::Ordering::Equal => 0,
    };


    here's benchmark for enwik8:


    | match_len_min | match_len_expected | size |
    |---------------|--------------------|------------|
    | Off | Off | 27,695,500 |
    | Off | On | 27,620,396 |
    | On | Off | 27,437,772 |
    | On | On | 27,066,417 |
    Last edited by RichSelian; 26th July 2019 at 19:45.

  2. Thanks (6):

    avitar (27th July 2019),comp1 (26th July 2019),Cyan (27th July 2019),encode (26th July 2019),hexagone (26th July 2019),Mike (26th July 2019)

  3. #2
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    RichSelian, only 2 tricks up your sleeve?
    I hope there would be more. Waiting..

    Sincerely...

  4. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Yeah, but how does it affect the decompression speed?

  5. #4
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by encode View Post
    Yeah, but how does it affect the decompression speed?
    for rolz it's not a big problem, these two lengths can be saved together with position in the dictionary while updating.
    about 5%~10% slower in practice.

  6. #5
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    24
    Thanks
    7
    Thanked 10 Times in 8 Posts
    Good for text, what about binary/mixed data?
    Cannot build the code.
    Code:
    $ cargo install --git https://github.com/richox/orz --tag v1.5.0 --force
    error[E0554]: #![feature] may not be used on the stable release channel
     --> /home/at/.cargo/git/checkouts/orz-8ca8c064613dc1e7/f2dfa43/src/main.rs:1:1
      |
    1 | #![feature(nll)]
      | ^^^^^^^^^^^^^^^^
    
    error[E0554]: #![feature] may not be used on the stable release channel
     --> /home/at/.cargo/git/checkouts/orz-8ca8c064613dc1e7/f2dfa43/src/main.rs:2:1
      |
    2 | #![feature(const_slice_len)]
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error: aborting due to 2 previous errors
    Further trying to understand how the MTF array works.

  7. #6
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by svpv View Post
    Good for text, what about binary/mixed data?
    Cannot build the code.
    Code:
    $ cargo install --git https://github.com/richox/orz --tag v1.5.0 --force
    error[E0554]: #![feature] may not be used on the stable release channel
     --> /home/at/.cargo/git/checkouts/orz-8ca8c064613dc1e7/f2dfa43/src/main.rs:1:1
      |
    1 | #![feature(nll)]
      | ^^^^^^^^^^^^^^^^
    
    error[E0554]: #![feature] may not be used on the stable release channel
     --> /home/at/.cargo/git/checkouts/orz-8ca8c064613dc1e7/f2dfa43/src/main.rs:2:1
      |
    2 | #![feature(const_slice_len)]
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error: aborting due to 2 previous errors
    Further trying to understand how the MTF array works.
    you need a nightly build rust, run `rustup default nightly` to get one.
    the mtf.rs is not a complete move-to-front transform, it just swap symbol to a more advanced position. it is used to apply symbol exclusion and make compression ratio closer to o-1 but still have o-0 speed.

Similar Threads

  1. Rans tricks
    By Lucas in forum Data Compression
    Replies: 14
    Last Post: 4th November 2019, 17:30
  2. Replies: 14
    Last Post: 1st November 2019, 17:27
  3. Replies: 69
    Last Post: 16th August 2016, 18:46
  4. WEBP - how to improve it?
    By Stephan Busch in forum Data Compression
    Replies: 38
    Last Post: 4th June 2016, 13:43
  5. A small article on ROLZ (Russian)
    By encode in forum Forum Archive
    Replies: 21
    Last Post: 29th April 2007, 15:18

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •