Results 1 to 11 of 11

Thread: RLE - run-length encoding

  1. #1
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts

    RLE - run-length encoding

    RLE version 0.0.0.6 - run-length encoding

    A little improved RLE version for escape collisions and long sequences.

    RLE run-length encoding:
    rle e input output
    rle d input output

    There is one optional option (also needed by decoding):
    -e xxx (escape decimal, default 27)
    -bench (bench in memory)

    Command line version, can work with .NET framework 4.8.

    Very simple first version, there is no error handling and not well tested.

    Input can be any file with consecutive repeating bytes.
    Output can be compressed with an other archiver.

    Created also my second C++ program and added Windows VS, GCC, Intel, Linux and Mac OS X binaries.

    Hitachi patent looks expired:
    https://patentimages.storage.googlea.../US4586027.pdf


    Input:
    1,000,000,000 bytes - enwik9

    Output:
    991,104,444 bytes, 3.983 sec. - 2.929 sec., rle 0.0.0.5 VB.NET 4.8
    991,104,444 bytes, 2.327 sec. - 1.468 sec., rle 0.0.0.5 C++ GCC 8.1.0
    991,104,444 bytes, 2.515 sec. - 1.471 sec., rle 0.0.0.5 C++ Intel 19.1 Update 2
    991,104,444 bytes, 3.087 sec. - 1.684 sec., rle 0.0.0.5 C++ VS 2019


    Input:
    10,000,000,000 bytes - enwik10

    Output:
    9,886,790,489 bytes, 24.988 sec. - 14.805 sec., rle 0.0.0.5 C++ Intel 19.1 Update 2
    9,886,790,489 bytes, 30.509 sec. - 16.850 sec., rle 0.0.0.5 C++ VS 2019


    Input:
    400,000,000 bytes - TS40.txt

    Output:
    399,987,375 bytes, 1.831 sec. - 1.115 sec., rle 0.0.0.3 VB.NET 4.8
    399,987,375 bytes, 0.874 sec. - 0.718 sec., rle 0.0.0.3 C++ GCC 8.1.0
    399,987,375 bytes, 1.145 sec. - 0.585 sec., rle 0.0.0.3 C++ Intel 19.1 Update 2
    399,987,375 bytes, 1.084 sec. - 0.664 sec., rle 0.0.0.3 C++ VS 2019


    Input:
    357,579,332 bytes - TS40.kwe (depth 255)

    Output:
    356,559,540 bytes, 1.741 sec. - 1.064 sec., rle 0.0.0.3 VB.NET 4.8
    356,559,540 bytes, 0.937 sec. - 0.624 sec., rle 0.0.0.3 C++ GCC 8.1.0
    356,559,540 bytes, 1.107 sec. - 0.534 sec., rle 0.0.0.3 C++ Intel 19.1 Update 2
    356,559,540 bytes, 1.085 sec. - 0.609 sec., rle 0.0.0.3 C++ VS 2019


    Input:
    269,717,172 bytes - TS40.kwe (depth 65,535)

    Output:
    251,039,562 bytes, 1.352 sec. - 0.839 sec., rle 0.0.0.3 VB.NET 4.8
    251,039,562 bytes, 0.749 sec. - 0.484 sec., rle 0.0.0.3 C++ GCC 8.1.0
    251,039,562 bytes, 0.880 sec. - 0.466 sec., rle 0.0.0.3 C++ Intel 19.1 Update 2
    251,039,562 bytes, 0.888 sec. - 0.527 sec., rle 0.0.0.3 C++ VS 2019
    Attached Files Attached Files
    Last edited by Sportman; 1st August 2020 at 23:04.

  2. Thanks (2):

    JamesWasil (23rd July 2020),Mike (23rd July 2020)

  3. #2
    Member
    Join Date
    Aug 2019
    Location
    Germany
    Posts
    8
    Thanks
    9
    Thanked 3 Times in 3 Posts
    Nice to see people working on rle. I had not even heard of that patent.

    I’m not sure if the files you used for testing are very representative of the actual encoding performance, since they’re not very compressible at all (using rle).
    The files I’ve used for profiling my run-length encoding are listed in the README.md on my github: (https://github.com/rainerzufalldererste/rle8)

    Hope that helps

  4. Thanks:

    Sportman (26th July 2020)

  5. #3
    Member nikkho's Avatar
    Join Date
    Jul 2011
    Location
    Spain
    Posts
    554
    Thanks
    223
    Thanked 166 Times in 107 Posts
    Surprising that GCC generates better Windows code than VC.

  6. #4
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by rainerzufalldererste View Post
    The files I’ve used for profiling my run-length encoding are listed in the README.md
    Added -bench option for in memory testing.

    Input:
    419,225,625 bytes - 1034.db

    Output:
    Code:
    92,616,751 bytes, 0.902 sec. 443.24 MiB/s - 0.468 sec.,   854.28 MiB/s, 22.09%, rle 0.0.0.3 -bench VB.NET
    92,616,751 bytes, 1.199 sec. 333.45 MiB/s - 0.837 sec.,   477.66 MiB/s, 22.09%, rle 0.0.0.3 VB.NET
    92,616,751 bytes, 0.437 sec. 914.89 MiB/s - 0.265 sec., 1,508.70 MiB/s, 22.09%, rle 0.0.0.3 -bench C++ GCC
    92,616,751 bytes, 0.812 sec. 492.37 MiB/s - 0.374 sec., 1,069.00 MiB/s, 22.09%, rle 0.0.0.3 C++ GCC
    92,616,751 bytes, 0.450 sec. 888.46 MiB/s - 0.366 sec., 1,092.36 MiB/s, 22.09%, rle 0.0.0.3 -bench C++ Intel
    92,616,751 bytes, 0.735 sec. 543.95 MiB/s - 0.405 sec.,   987.17 MiB/s, 22.09%, rle 0.0.0.3 C++ Intel
    92,616,751 bytes, 0.541 sec. 781.25 MiB/s - 0.367 sec., 1,089.39 MiB/s, 22.09%, rle 0.0.0.3 -bench C++ VS
    92,616,751 bytes, 0.890 sec. 479.40 MiB/s - 0.486 sec.,   822.64 MiB/s, 22.09%, rle 0.0.0.3 C++ VS

    Input:
    88,473,600 bytes - video_frame.raw

    Output:
    Code:
    16,576,873 bytes, 0.185 sec.   456.08 MiB/s - 0.089 sec.,   948.03 MiB/s, 18.74%, rle 0.0.0.3 -bench VB.NET
    16,576,873 bytes, 0.244 sec.   345.80 MiB/s - 0.165 sec.,   511.36 MiB/s, 18.74%, rle 0.0.0.3 VB.NET
    16,576,873 bytes, 0.094 sec.   897.61 MiB/s - 0.046 sec., 1,834.24 MiB/s, 18.74%, rle 0.0.0.3 -bench C++ GCC
    16,576,873 bytes, 0.187 sec.   451.20 MiB/s - 0.062 sec., 1,360.89 MiB/s, 18.74%, rle 0.0.0.3 C++ GCC
    16,576,873 bytes, 0.083 sec. 1,016.57 MiB/s - 0.071 sec., 1,188.38 MiB/s, 18.74%, rle 0.0.0.3 -bench C++ Intel
    16,576,873 bytes, 0.145 sec.   581.90 MiB/s - 0.073 sec., 1,155.82 MiB/s, 18.74%, rle 0.0.0.3 C++ Intel
    16,576,873 bytes, 0.108 sec.   781.25 MiB/s - 0.070 sec., 1,205.36 MiB/s, 18.74%, rle 0.0.0.3 -bench C++ VS
    16,576,873 bytes, 0.176 sec.   479.40 MiB/s - 0.091 sec.,   927.20 MiB/s, 18.74%, rle 0.0.0.3 C++ VS
    Last edited by Sportman; 29th July 2020 at 22:27.

  7. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    187
    Thanked 177 Times in 120 Posts
    Benchmarking the size of the RLE output may not be the whole picture. It really depends on what is downstream.

    For example we have pre-RLE where "abccccdebbbd" may become "ab<X>c<4>bbbd" and post-RLE which may become "abccc<1>debbb<0>d". The former uses an rle meta-character <X> to indicate the next bytes are an RLE entry. The latter just looks at the history and after 3 repeats outputs an additional length. The latter is larger, but potentially may compress smaller if you have something more complex than an order-0 frequency entropy encoder.

    One other method I found that works well (posted somewhere else here - it wasn't my idea) was the Meso RLE which worked out upfront which symbols needed RLE and which didn't, and then every occurence of that symbol is sym+length and every other symbol is just as-is even if it's in a run. It's data dependent (and rubbish for things like BWT), but can be useful in some cases.

  8. #6
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Input:
    124,555,246 bytes - 1800dpi.bmp (https://www.artwork.com/gerber/vlbv/bmp_support.htm)

    Output:
    Code:
    31,959,587 bytes, 0.235 sec.   505.47 MiB/s - 0.127 sec.   935.32 MiB/s, 25.66%, rle 0.0.0.3 -bench VB.NET
    31,959,587 bytes, 0.329 sec.   361.05 MiB/s - 0.234 sec.   507.63 MiB/s, 25.66%, rle 0.0.0.3 VB.NET
    31,959,587 bytes, 0.109 sec. 1,089.77 MiB/s - 0.078 sec. 1,522.84 MiB/s, 25.66%, rle 0.0.0.3 -bench C++ GCC
    31,959,587 bytes, 0.203 sec.   585.15 MiB/s - 0.109 sec. 1,089.74 MiB/s, 25.66%, rle 0.0.0.3 C++ GCC
    31,959,587 bytes, 0.103 sec. 1,153.25 MiB/s - 0.086 sec. 1,381.18 MiB/s, 25.66%, rle 0.0.0.3 -bench C++ Intel
    31,959,587 bytes, 0.181 sec.   656.27 MiB/s - 0.111 sec. 1,070.10 MiB/s, 25.66%, rle 0.0.0.3 C++ Intel
    31,959,587 bytes, 0.128 sec.   928.01 MiB/s - 0.098 sec. 1,212.05 MiB/s, 25.66%, rle 0.0.0.3 -bench C++ VS
    31,959,587 bytes, 0.244 sec.   486.82 MiB/s - 0.137 sec.   867.02 MiB/s, 25.66%, rle 0.0.0.3 C++ VS
    Last edited by Sportman; 1st August 2020 at 23:58.

  9. #7
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Added RLE version 0.0.0.4 with encoding fix for some cases and added Intel compiler binary.

  10. #8
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Added RLE version 0.0.0.5 with improved encoding speed and Linux/Mac OS X displayed time/speed fix.

  11. #9
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    64
    Thanks
    10
    Thanked 20 Times in 15 Posts
    Quote Originally Posted by nikkho View Post
    Surprising that GCC generates better Windows code than VC.
    It's actually not that surprising. GCC added good features over the last versions. And MS VC never was one of the best. It just does what it is supposed to do. I think this was not much different in the 90s, with Borland and Watcom as competitors, later ICC, PGI, Sun, Clang/LLVM and some other good ones for x86/x64 (especially for vectorization and autoparallelization)

    But it is suprising, that GCC is sometimes better than ICC now on some archs. This can be seen in compiler usage in SPEC benchmark runs.

    Addendum: Of course, a reason for that is that GCC got dev support from CPU manufacturers like AMD. I tracked their contributions in the past to be able to derive the microarchitecture of their next CPU families (Bulldozer and Zen), like in this blog entry (which is even linked from the Zen Wikipedia article ).
    Last edited by Dresdenboy; 2nd August 2020 at 19:30.

  12. #10
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Added RLE version 0.0.0.6 with encoding fix for some cases.

  13. #11
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Input:
    1,000,000,124 bytes, enwik9.bwt

    Output:
    Code:
    468,906,482 bytes, 3.025 sec. 315.26 MiB/s - 1.414 sec.  674.45 MiB/s, 46.89%, rle 0.0.0.6 -bench VB.NET
    468,906,482 bytes, 3.795 sec. 251.30 MiB/s - 2.571 sec.  370.94 MiB/s, 46.89%, rle 0.0.0.6 VB.NET
    468,906,482 bytes, 1.766 sec. 540.02 MiB/s - 0.922 sec. 1034.35 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ GCC
    468,906,482 bytes, 2.765 sec. 344.91 MiB/s - 1.437 sec.  663.66 MiB/s, 46.89%, rle 0.0.0.6 C++ GCC
    468,906,482 bytes, 1.826 sec. 522.28 MiB/s - 1.126 sec.  846.96 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ Intel
    468,906,482 bytes, 2.683 sec. 355.45 MiB/s - 1.394 sec.  684.13 MiB/s, 46.89%, rle 0.0.0.6 C++ Intel
    468,906,482 bytes, 2.032 sec. 469.33 MiB/s - 1.202 sec.  793.41 MiB/s, 46.89%, rle 0.0.0.6 -bench C++VS
    468,906,482 bytes, 2.975 sec. 320.56 MiB/s - 1.506 sec.  633.25 MiB/s, 46.89%, rle 0.0.0.6 C++ VS

    Input:
    400,000,052 bytes - TS40.bwt

    Output:
    Code:
    235,490,336 bytes, 1.349 sec. 282.78 MiB/s 0.613 sec.  622.30 MiB/s, 58.87%, rle 0.0.0.6 -bench VB.NET
    235,490,336 bytes, 1.690 sec. 225.72 MiB/s 1.105 sec.  345.22 MiB/s, 58.87%, rle 0.0.0.6 VB.NET
    235,490,336 bytes, 0.765 sec. 498.65 MiB/s 0.375 sec. 1017.25 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ GCC
    235,490,336 bytes, 1.234 sec. 309.13 MiB/s 0.624 sec.  611.33 MiB/s, 58.87%, rle 0.0.0.6 C++ GCC
    235,490,336 bytes, 0.850 sec. 448.79 MiB/s 0.468 sec.  815.11 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ Intel
    235,490,336 bytes, 1.182 sec. 322.73 MiB/s 0.607 sec.  628.45 MiB/s, 58.87%, rle 0.0.0.6 C++ Intel
    235,490,336 bytes, 0.929 sec. 410.62 MiB/s 0.501 sec.  761.42 MiB/s, 58.87%, rle 0.0.0.6 -bench C++VS
    235,490,336 bytes, 1.318 sec. 289.43 MiB/s 0.670 sec.  569.36 MiB/s, 58.87%, rle 0.0.0.6 C++VS
    Last edited by Sportman; 2nd August 2020 at 00:40.

Similar Threads

  1. TurboRLE: Turbo Run Length Encoding
    By dnd in forum Data Compression
    Replies: 25
    Last Post: 20th August 2019, 08:21
  2. Question on Run Length Encoding
    By khavish in forum Data Compression
    Replies: 13
    Last Post: 27th May 2018, 11:23
  3. Implementing Run-length encoding in CUDA
    By nemequ in forum Data Compression
    Replies: 1
    Last Post: 28th June 2016, 13:08
  4. Replies: 38
    Last Post: 27th April 2016, 18:01
  5. xrle-eXtreme Run Length Encoding
    By algorithm in forum Data Compression
    Replies: 5
    Last Post: 20th April 2015, 23:02

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •