Results 1 to 9 of 9

Thread: ROLZ: reduce unneeded memory access?

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts

    Unhappy ROLZ: reduce unneeded memory access?

    currently my ROLZ compressor uses a table structure familar with BALZ:
    Code:
    struct rolz_item {
        unsigned m_pos: 24;
        unsigned m_checksum: 8;
    };
    struct rolz_bucket {
        rolz_item m_items[15]; /* make a bucket 64-byte */
        unsigned m_head; /* make a circular queue */
    };
    rolz_bucket rolz_table[CONTEXT_SIZE];
    I found my compressor not fast as Etincelle or SlugX. I think the reason is that I do have a lot of memory access for searching. so I create the following structure:
    Code:
    struct rolz_item {
        unsigned m_pos: 22;
        unsigned m_checksum: 6;
        unsigned m_skiplen: 4;
    };
    where m_skiplen is the match length when it was matched by a new-inserted item. Here's example:
    Code:
    skiplen: 0    0    2    2    3    3    1
    item:   aaaa baaa aabb aacc aaab aaac abcc
    with such trick, we can skip a lot of items on searching, for example we insert and search a new item 'aaac':
    Code:
    matchlen:          1    1    -    -    -    -    3
    skiplen:           0    0    2    2    3    3    1
    item:      [abcd] aaaa baaa aabb aacc aaab aaac abcc
    items with "matchlen < skiplen" will be ignored, and we will have fewer memory access. however I found this trick doesn't improve my compressor, instead it make it cost more time...

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    Hi RichSelian

    It's difficult to say if this trick is actually useful.
    If you want to "sort" your reference pointers, it's probably better to go all the way to binary tree. This is what Christian Martelock's RZM did.

    On the other extreme, note that Etincelle (and probably Slugx) do a single probe per search. This is obviously key to their speed.

  3. #3
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by Cyan View Post
    Hi RichSelian

    It's difficult to say if this trick is actually useful.
    If you want to "sort" your reference pointers, it's probably better to go all the way to binary tree. This is what Christian Martelock's RZM did.

    On the other extreme, note that Etincelle (and probably Slugx) do a single probe per search. This is obviously key to their speed.
    I think any trees will severely slow down the speed. that's not what I want.
    You mean Etincelle is using order-1 ROLZ with one item per bucket (order-1 LZP, to say in another way)? followed by huff0? then how can it achieve better compression ratio then gzip?

  4. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    Higher dictionary size (mostly).
    Etincelle can look up to 1GB of data (although you get most of the performance with a dictionary of 64MB).

  5. #5
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by Cyan View Post
    Higher dictionary size (mostly).
    Etincelle can look up to 1GB of data (although you get most of the performance with a dictionary of 64MB).
    I don't think that's the reason, Etincelle compresses enwik7 into 3634627 bytes (better than gzip: 3693800 bytes), where 1GB long match has no help?

  6. #6
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    There is a reduced gain with each dictionary size increase. And effectively, beyond 64MB, gain become small.

    Enwik7 doesn't reach such a scale, but is nonetheless a 10MB file. Not too bad.
    That's still a good enough dictionary size to remain competitive with gzip.

  7. #7
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    As a complement : it's possible to test different values of dictionary sizes, thanks to the -m options.
    Here are some results on enwik8 :

    default (-m96) : 35.70%
    -m16 : 35.77%
    -m4 : 36.01%
    -m1 : 36.68%

    So : reduced dictionary do indeed hurt compression performance, as expected, but the loss is not that large.
    16MB is almost as good as full-filesize.
    Even with smallest 1MB dictionary size, which hurts most, compression performance remains reasonable.

    That's a bit better than I remembered...

  8. #8
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by Cyan View Post
    As a complement : it's possible to test different values of dictionary sizes, thanks to the -m options.
    Here are some results on enwik8 :

    default (-m96) : 35.70%
    -m16 : 35.77%
    -m4 : 36.01%
    -m1 : 36.68%

    So : reduced dictionary do indeed hurt compression performance, as expected, but the loss is not that large.
    16MB is almost as good as full-filesize.
    Even with smallest 1MB dictionary size, which hurts most, compression performance remains reasonable.

    That's a bit better than I remembered...
    large dictionary size has no help with small files (like enwik7), LZP will lose lots of matches, which huff0 cannot benefit from. [LZP + low-order entropy encoder] is not a good idea at all.
    I think there must be something more interesting in etincelle, but not only dictionary size.

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    I can't tell.
    I've not been able to do any better so far, as if Etincelle reached a "miraculous compromise" between speed and strength.

Similar Threads

  1. A nooblike ROLZ compressor
    By RichSelian in forum Data Compression
    Replies: 11
    Last Post: 10th October 2012, 23:13
  2. ROLZ and Search Trees ?
    By Guzzo in forum Data Compression
    Replies: 5
    Last Post: 1st August 2012, 00:03
  3. Does someone has access to IEEE? Looking for a paper.
    By Sebastian in forum Data Compression
    Replies: 2
    Last Post: 23rd January 2012, 09:12
  4. xp a rolz compressor
    By pmcontext in forum Data Compression
    Replies: 40
    Last Post: 9th December 2010, 09:04
  5. ROLZ explanation?
    By Trixter in forum Data Compression
    Replies: 5
    Last Post: 10th June 2008, 18:24

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •