Results 1 to 12 of 12

Thread: rle8 - A fast 8 bit RLE

  1. #1
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts

    rle8 - A fast 8 bit RLE

    rle8 is an 8 bit run-length-encoding I wrote a few months ago.
    It's designed for performance (especially if only one symbol is worth being run-length-encoded) with SSE2 / AVX / AVX2 decoder variants.
    In single symbol mode the task boils down to an equivalent of memchr(..), memset(..).
    I tried using OpenCL for decoding, but it seemed like task wasn't particularly suitable (or maybe I just suck at OpenCL).
    If there are multiple symbols that are worth being run-length-encoded it will try to preserve symbol frequencies (to improve compressability of the encoded result with general-purpose entropy encoding).
    It was designed for the compression stage of an image-compression, but I didn't end up using it after all.

    Source: https://github.com/rainerzufalldererste/rle8

    Nothing fancy but maybe someone here is interested.
    If you have any feedback or ideas on how to improve decoding performance please let me know.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,717
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    I guess you can compare it to this: https://github.com/powturbo/TurboRLE

  3. #3
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    I've added a small optimization which enables rle8 to slightly beat TurboRLE in specific cases.
    TurboRLE does an awesome job even in complex circumstances, where rle8 isn't that competitive.

    Single RLE-Worthy Symbol:No code has to be inserted here.

    More Complex File:
    No code has to be inserted here.
    Last edited by rainerzufalldererste; 14th August 2019 at 15:49. Reason: Table Tags don't seem to be supported o.O

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,717
    Thanks
    271
    Thanked 1,185 Times in 656 Posts
    They are supported, but somehow with "|" as column separator and LF as row separator

    No code has to be inserted here.

    Code:
    {table}A|B|C
    D|E|F
    G|H|I
    {/table}

  5. Thanks (2):

    JamesB (14th August 2019),rainerzufalldererste (14th August 2019)

  6. #5
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    I have changed the rle8_ultra implementation to have a maximum length of 32 (allowing to not even iterate over the bytes to set, but rather just _mm256_storeu_si256 the symbols and move the write pointer forward by the count)
    This is obviously slower and less efficient for files with symbols that occur quite often, but beats TurboRLE in terms of speed on my encoded image test file (by a tiny amount).
    I believe the results (both speed and efficiency wise) can be improved by selecting certain modes adaptively throughout the file. (e.g. max length 32; use unused symbol for the RLE symbol to not stop scanning on single occurrences of the symbol in question; etc.)

    Obviously TurboRLE still beats rle8 and rle8_ultra easily when not using an 8 bit RLE.

    New results:

    Encoded image file:
    No code has to be inserted here.

    The single run length encodable symbol file I had also used previously (rle_ultra is a lot slower here, but normal rle still beats TurboRLE barely)
    No code has to be inserted here.

  7. #6
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 198 Times in 147 Posts
    TurboRLE: Run Length Encoding vs RLE8:a Fast 8 bit RLE

    I've integrated rle8 into TurboBench: Compression Benchmark and tested this with different distributions.
    TurboRLE is pareto for all benchmarked files.

    See the full Benchmark
    Last edited by dnd; 25th February 2020 at 13:20.

  8. Thanks:

    rainerzufalldererste (18th August 2019)

  9. #7
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    Thanks for integrating rle8 into TurboBench!

    Seems like I need to do some more optimizations.

  10. Thanks:

    dnd (18th August 2019)

  11. #8
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 198 Times in 147 Posts
    compiling under TurboBench: Compression Benchmark
    To enable the compilation under gcc, I've modified "rle8.h" by surrounding it with 'extern "C" ' and renamed "_xgetbv" (already defined in gcc) in "rle8_cpu.c".

    To build turbobench, clone rle8 under the turbobench directory, then use "make RLE8=1".
    Code:
    git clone --recursive git://github.com/powturbo/TurboBench.git
    cd TurboBench
    git clone git://github.com/rainerzufalldererste/rle8.git    (modify rle8.h and rle8_cpu.c as described)
    make RLE8=1

  12. Thanks:

    rainerzufalldererste (18th August 2019)

  13. #9
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    486
    Thanks
    169
    Thanked 166 Times in 114 Posts
    Whether or not you have a method that beats TurboRLE, it's clear you've prodded dnd into improving his own code so it's still a win for data compression. Thanks.

  14. Thanks:

    dnd (20th August 2019)

  15. #10
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    dnd's recent changes to TurboRLE are very impressive and even the encoder is now incredibly fast as well.
    However (following my horrible naming convention) I have added rle8 extreme. (also has 16, 32 and 64 bit rle variants)
    So now for a few days there are again certain scenarios (seems like it's files that run-length-encode down to ~20-50%) will decompress considerably faster using RLE8 than in TurboRLE.

    1034.db (Checkers program "End Game Table Base")

    No code has to be inserted here.

    video-frame.raw (heavily quantized video frame DCTs)

    No code has to be inserted here.
    Last edited by rainerzufalldererste; 22nd August 2019 at 23:31. Reason: Fixed typo

  16. #11
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    I've added 24 Bit and 48 Bit variants to rle8.

    1034.db
    No code has to be inserted here.

    video-frame.raw
    No code has to be inserted here.

    pixelart.bmp (https://i.redd.it/tj5oyhhuehv11.png converted to BMP)
    No code has to be inserted here.

    Obviously the 8, 16, 32, 64 Bit variants of both rle8 and TurboRLE do a pretty terrible job on the last one (both en-/decompression speed and compression ratio wise) and don't reduce the original file size by more than 0.01%.

  17. #12
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    7
    Thanks
    9
    Thanked 2 Times in 2 Posts
    I had improved the performance of the rle8 Extreme 8 Bit Encoder and added a 128 Bit variant a while ago but wasn't able to profile it properly.
    However I had access to a current gen Intel CPU (i9-9900K) recently and was absolutely blown away with the results:

    File: 1034.db (Checkers program "End Game Table Base", 419.225.625 Bytes)
    Code:
    | Type                      |  Ratio  | Encoding Speed | Decoding Speed | Entropy Compr. To |
    |~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
    | rle8 Extreme 8 Bit        | 23.02 % |  1482.5 MiB/s  |  12473.0 MiB/s |      12.08 %      |
    | rle8 Extreme 8 Bit Single | 20.59 % |   473.3 MiB/s  |  10676.4 MiB/s |      12.06 %      |
    | rle8 Extreme 16 Bit       | 24.89 % |  1118.5 MiB/s  |  10503.6 MiB/s |      12.27 %      |
    | rle8 Extreme 24 Bit       | 27.27 % |  1345.8 MiB/s  |  12788.1 MiB/s |      11.95 %      |
    | rle8 Extreme 32 Bit       | 28.39 % |  1485.0 MiB/s  |  10928.2 MiB/s |      12.27 %      |
    | rle8 Extreme 48 Bit       | 31.22 % |  1758.2 MiB/s  |  13129.2 MiB/s |      12.02 %      |
    | rle8 Extreme 64 Bit       | 33.36 % |  1623.6 MiB/s  |  12586.9 MiB/s |      11.98 %      |
    | rle8 Extreme 128 Bit      | 39.84 % |  1521.6 MiB/s  |  13386.2 MiB/s |      12.16 %      |
    File: https://i.redd.it/tj5oyhhuehv11.png (PNG converted to BMP, 123.710.454 Bytes)
    Code:
    | Type                |  Ratio  | Encoding Speed | Decoding Speed | Entropy Compr. To |
    |~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
    | rle8 Extreme 8 Bit  | 99.99 % |  4892.4 MiB/s  |  10863.6 MiB/s |      76.18 %      |
    | rle8 Extreme 24 Bit |  1.84 % |  3710.0 MiB/s  |  16767.0 MiB/s |       1.32 %      |
    | rle8 Extreme 48 Bit |  2.78 % |  6407.1 MiB/s  |  16446.9 MiB/s |       2.12 %      |
    Full Results & Comparison available at https://github.com/rainerzufalldererste/rle8

  18. Thanks:

    dnd (5th January 2020)

Similar Threads

  1. 32 bit or 64 bit Windows usage case
    By necros in forum The Off-Topic Lounge
    Replies: 10
    Last Post: 18th August 2016, 09:57
  2. LZ4, BWT, RLE?
    By alberto98fx in forum Data Compression
    Replies: 6
    Last Post: 3rd July 2016, 21:09
  3. Compression benchmarking: 64 bit images and 24 bit codecs
    By m^2 in forum The Off-Topic Lounge
    Replies: 6
    Last Post: 30th November 2011, 17:01
  4. Fast Bit I/O
    By encode in forum Data Compression
    Replies: 12
    Last Post: 10th October 2011, 15:17
  5. RINGS Fast Bit Compressor.
    By Nania Francesco in forum Forum Archive
    Replies: 115
    Last Post: 26th April 2008, 22:58

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •