Results 1 to 30 of 30

Thread: FARI - Fast Arithmetic Compressor

  1. #1
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts

    FARI - Fast Arithmetic Compressor

    FARI is an arithmetic compressor with extremely high compression/decompression speeds. It uses an order 17 adaptive bitwise model (rate=1/16) and compresses at over 20mb/s on my laptop. The current release version is 0.3, as I feel that an arithmetic coder with slower decompression than compression cannot be considered a final version. Benchmarks as as follows:

    Code:
    ENWIK8 :  41,027,671 bytes ( 4.3 seconds encode/ 6.0 seconds decode)
    ENWIK9 : 390,794,122 bytes (43.1 seconds encode/60.0 seconds decode)
    
    Run on a 2.20ghz mobile i7 (8 cores) with 8gb of RAM
    Tested executable is FARI-x64-AVX.exe
    Attached Files Attached Files

  2. Thanks (3):

    Nania Francesco (6th December 2013),samsat1024 (16th December 2013),Stephan Busch (6th December 2013)

  3. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    can you say how AVX is employed in your program? using SIMD for compression becomes intersting for me in last weeks

  4. #3
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    I did no work in assembler or manual optimization, so the only AVX present in the provided executable was generated by allowing MSVC2013 to use AVX instructions.

  5. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    [removed...]
    Last edited by Bulat Ziganshin; 6th December 2013 at 00:42.

  6. #5
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    ... using SIMD for compression becomes intersting for me in last weeks
    I'm going off topic slightly here: it's useful for a few cases such as fast string searches (load 16 characters, if they don't match, use move mask to tell you the first non-matching position) and paq-style counters (i.e. compares become masks for operations, and you can update 8-16 counters at once).

    But in my opinion, it is not very useful before/after load and store operations. For example, SIMD for 8 paq counters was only 20% faster than incremental because of the cost of loading from distant memory locations.

    On the other side, you can do summing and rescaling extremely efficiently, such as cumulative count for arithmetic coding or rescaling totals down. Just my 2 cents.

  7. #6
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Bulat,

    Have you seen Jan Wassenberg's work from the Fraunhofer Institute on vectorized lossless image compression?

    http://onlinelibrary.wiley.com/doi/1....1109/abstract

    "This report introduces a new lossless asymmetric single instruction multiple data codec designed for extremely efficient decompression of large satellite images. A throughput in excess of 3GB/s allows decompression to proceed in parallel with asynchronous transfers from fast block devices such as disk arrays. This is made possible by a simple and fast single instruction multiple data entropy coder that removes leading null bits. Our main contribution is a new approach for vectorized prediction and encoding. Unlike previous approaches that treat the entropy coder as a black box, we account for its properties in the design of the predictor. The resulting compressed stream is 1.2 to 1.5 times as large as JPEG-2000, but can be decompressed 100 times as quickly – even faster than copying uncompressed data in memory. Applications include streaming decompression for out of core visualization. To the best of our knowledge, this is the first entirely vectorized algorithm for lossless compression."

    It looked interesting to me, and apparently he's got something better now---with very fast compression as well as decompression---that he's not actively doing anything with right now for boring corporate reasons. I'm obviously not competent to judge this sort of thing, though... (You seem to be.)

  8. #7
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    BTW, as I understand it, Wassenberg's entropy coder is very crude compared to most of the stuff discussed in this thread, but very fast. It wouldn't be a competitor for the small-fraction-of-a-bit stuff discussed here, but I thought it was interesting under the general heading of "using SIMD for compression" as Bulat mentioned.

  9. #8
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by david_werecat View Post
    FARI is an arithmetic compressor with extremely high compression/decompression speeds. It uses an order 17 adaptive bitwise model (rate=1/16) and compresses at over 20mb/s on my laptop. The current release version is 0.3, as I feel that an arithmetic coder with slower decompression than compression cannot be considered a final version. Benchmarks as as follows:

    Code:
    ENWIK8 :  41,027,671 bytes ( 4.3 seconds encode/ 6.0 seconds decode)
    ENWIK9 : 390,794,122 bytes (43.1 seconds encode/60.0 seconds decode)
    
    Run on a 2.20ghz mobile i7 (8 cores) with 8gb of RAM
    Tested executable is FARI-x64-AVX.exe
    An encoding with faster compression than decompression could be useful in some circumstances, e.g. backups.

    Is there a reason why modeling and arithmetic coding tend to be combined? It seems like, logically, the actual arithmetic coder could be separated from the model.

  10. #9
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    The reason that the model and the arithmetic coder are combined is that this gave greater speed. Updating the model within the same routine allows further optimizations such as being able to eliminate a branch as well.

  11. Thanks:

    nburns (6th December 2013)

  12. #10
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    MTARI is a blockwise multithreaded version of FARI. A benchmark on enwik8 gives the following:

    Code:
    ENWIK8 :  41,655,528 bytes ( 1.09 seconds encode/ 1.34 seconds decode)
    
    Run on a 2.20ghz mobile i7 (8 cores) with 8gb of RAM
    Tested executable is MTARI-x64.exe
    Attached Files Attached Files

  13. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I tested mtari on LTCB and Silesia.
    http://mattmahoney.net/dc/text.html#3972
    http://mattmahoney.net/dc/silesia.html

    The supplied executables couldn't find msvcr120.dll, so I recompiled with MinGW gcc 4.8.0 -O2 -fopenmp. Without -fopenmp it runs single threaded but otherwise the same.

  14. Thanks:

    david_werecat (11th December 2013)

  15. #12
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Thanks for testing it! Maybe I should start making static compiles so that it's not necessary to recompile it for anyone who doesn't have the latest runtime. It seems a little silly for Microsoft to keep on releasing newly named runtime dlls when there are no code breaking changes between versions. It would be easier to keep the same name and would save the redundancy of having to install multiple versions of the same runtime or bundling a portion of that runtime in every released application.

  16. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I have to release zpaq with static compiles using g++. Didn't used to have to do this.

  17. #14
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts

    Question

    With MinGW and GCC 4.8.1 I am getting error messages when trying to compile MTARI:

    Code:
    C:\MinGW\bin>gcc mtari.c -O2 -fopenmp
    C:\Users\squeeze\AppData\Local\Temp\ccRn1E4w.o:MTARI.c:(.text+0x75): undefined r
    eference to `fa_compress'
    C:\Users\squeeze\AppData\Local\Temp\ccRn1E4w.o:MTARI.c:(.text+0x116): undefined
    reference to `fa_decompress'
    C:\Users\squeeze\AppData\Local\Temp\ccRn1E4w.o:MTARI.c:(.text+0x163): undefined
    reference to `fa_compress'
    C:\Users\squeeze\AppData\Local\Temp\ccRn1E4w.o:MTARI.c:(.text+0x1a3): undefined
    reference to `fa_decompress'
    collect2.exe: error: ld returned 1 exit status

  18. #15
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    The proper command line to compile is: gcc MTARI.c FastARI.c -O2 -fopenmp

  19. Thanks:

    Stephan Busch (11th December 2013)

  20. #16
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts
    using this command line to compile: gcc MTARI.c FastARI.c -O2 -fopenmp
    and then using this command line to compress: E:\TESTSETS>timer mtari c app.tar app.mtari
    leads to:

    Exit code: -1073741819

    Kernel Time = 0.125 = 1%
    User Time = 4.656 = 73%
    Process Time = 4.781 = 75%
    Global Time = 6.319 = 100%

    and using: E:\TESTSETS>timer mtari c app.tar app.mtari 4

    keeps mtari.exe busy for hours. Archive size grows, but really slow.

  21. Thanks:

    david_werecat (12th December 2013)

  22. #17
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Here's a new version that will hopefully fix the issue:
    Attached Files Attached Files

  23. #18
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts
    on my machine, the issue still exists with this version.

  24. #19
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    I just put together a quick plugin for Squash but I had to modify fa_decompress a bit in order to make sure not to attempt to decompress memory outside of the input buffer, and allow passing a larger-than-necessary output buffer (in case the decompressed size is not known). The version I used is at http://pastebin.com/vedsvCST—it's probably possible to do it a bit cleaner, but I was going for relatively non-invasive changes. I don't suppose you're interested in carrying that? I'd love to not have to maintain the patch

    Also, do you plan on publishing a git repository? I'm trying to use git submodules for Squash and if there is going to be an official repository I don't want to create my own.

  25. Thanks:

    david_werecat (12th December 2013)

  26. #20
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    @Stephan Busch
    Would it be possible to upload the file in question for testing purposes?

    @nemequ
    Thanks for the patch, I've included it in the project as the default decompression method while keeping the old method as an "unsafe" version. The project is now on github at https://github.com/davidcatt/FastARI. As for decompressing without a known decompressed length, this will most likely put a small amount of garbage at the end of the decompressed data since the codec doesn't have an EOF mechanism (rather it uses block sizes as a truncation method). I may implement an EOF mechanism in the future pending speed tests.
    edit: I've implemented an EOF enabled version with functions fae_compress and fae_decompress.
    Last edited by david_werecat; 12th December 2013 at 22:22.

  27. Thanks:

    nemequ (13th December 2013)

  28. #21
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts
    Its the App corpus (http://www.compressionratings.com/fi...zechart_app.7z) put in a single .TAR file.

  29. Thanks:

    david_werecat (12th December 2013)

  30. #22
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Thanks for the test data. I haven't encountered an issue on my laptop with it, so I'm wondering if gcc handles openmp differently than msvc. Would you mind testing with this executable?
    Attached Files Attached Files

  31. #23
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts
    would you please also provide a file called vcomp120.dll which I could not find anywhere but which seems to be needed in order to run that executable.

  32. #24
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    Sorry about that, it appears that statically linking doesn't include the openmp library. Here's the dll:
    Attached Files Attached Files

  33. #25
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    472
    Thanked 175 Times in 85 Posts
    Thank you. First tests indicate that this version runs on my system.

  34. #26
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by david_werecat View Post
    The project is now on github at https://github.com/davidcatt/FastARI. As for decompressing without a known decompressed length, this will most likely put a small amount of garbage at the end of the decompressed data since the codec doesn't have an EOF mechanism (rather it uses block sizes as a truncation method). I may implement an EOF mechanism in the future pending speed tests.
    edit: I've implemented an EOF enabled version with functions fae_compress and fae_decompress.
    Sounds great, thanks! Unfortunately fae_decompress is failing my unit tests. I've split out a test case and added it to the issue tracker on your github repository. Once that gets resolved I'll go ahead and push the plugin and start benchmarking—I'll post again once I have the results.

    It seems unlikely that someone would be interested in all three versions at the same time—maybe it would be better to just use some preprocessor defines to switch between them so you can avoid code duplication? It's not issue from my point of view, but if I were in your shoes I know I wouldn't want to maintain three versions of basically the same thing.

  35. #27
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts
    I've fixed the issue, it turns out that it was hitting the end of the input buffer because the encoder wasn't flushing enough bytes. I'll also most likely add preprocessor defines in a future commit after reviewing each version of the functions.

  36. #28
    Member
    Join Date
    Nov 2011
    Location
    Tunisia
    Posts
    8
    Thanks
    155
    Thanked 4 Times in 4 Posts
    I've compiled a static version with MSYS2(with pacman)/gcc 4.8.2 threads-posix, (64-bit:seh) / (32-bit:dwarf2)
    -generic :
    Code:
    gcc MTARI.c FastARI.c -O3 -fopenmp -v -static
    -corei7-avx :
    Code:
    gcc MTARI.c FastARI.c -O3 -v -march=corei7-avx -mtune=corei7-avx -fopenmp -static
    Checksums :
    File;crc32;md5;sha1;sha2 (256)
    Code:
    MTARI-0.2-x86-64.zip;7D5DE6B2;E60CCCA055B088A5A7F3AE00E5F112C5;130A9084A981ED3D4645CE6089047272B92C77F0;E3C37AC9F41363F03026641F0572B854EF9B6FFEFD8A59AD9D0B70B9F4BF3A63
    MTARI-0.2-x86-64-corei7-avx.exe;38425D1D;6D248640CBACB946CF3FA9B42A0FD733;8DCCDC2F0F182F29D66DDA3764311169AE2DF21F;C9625365E959CF7AD05EEF69BAF308AFD4CEE62779005244B87E1D6CE3587309
    MTARI-0.2-x86-64-generic.exe;1A44EA81;62CE5F253B6D1DDA1A0DE1970F3D796F;B06E031773B5B1390249B94CA9B18FC9B27A5886;D16F011A5DA0678EB8A5B18CCDFC5C420DD0BB479BB76D1DD59E138611A0B9DF
    Code:
    MTARI-0.2-i686.zip;84ABBDB6;89EE530B8014A6775362491FECBA5667;0662947124795FE9F3E50C1A88F896E422113646;7230C9819102D1F23E7C5E6A37D86D65217687792C9161B414BD2ACF9713A34E
    MTARI-0.2-i686-corei7-avx.exe;CC7C8720;CBECC0FB579A2DE9BA0624E684C46A41;F387C4E6F0CC2A1072CA7E7B8E39EAAE7B160FAD;BE89083D2E2BDAB8794ECFE99E705C0CD9FC67CAC62A81C7277852207FBB448E
    MTARI-0.2-i686-generic.exe;BE683070;BBA7502D5F3408FA4DA3FDAFDF9468B8;B13A0F7178965FC5B133ACCD0029D1FDC0EE61CD;02972A320DEF9064E9B7A4B927CBBDB5360C93A6583BD296791268B318D3CD96
    Attached Files Attached Files
    Last edited by samsat1024; 19th December 2013 at 01:39.

  37. Thanks:

    david_werecat (16th December 2013)

  38. #29
    Member
    Join Date
    Jul 2008
    Location
    Netherlands
    Posts
    16
    Thanks
    2
    Thanked 1 Time in 1 Post
    Bit late to this party.. but the GCC version is not doing well:

    starting from offset 003FFFFE things go wrong (4G-2) . The MSVC build is ok. Tried several files.. all the same result.

    Comparing files dec.tar.srep and DEC.TAR.SREP2
    003FFFFE: FF 5C
    003FFFFF: FF 00
    00400000: 5C FF
    00400001: 00 FF
    00400002: FF 60
    00400004: 60 C4
    00400005: FF FE
    00400006: C4 00
    00400007: FE F6
    00400008: 00 40
    0040000A: 40 80
    0040000C: 80 9C
    0040000D: F6 11

    Edit: to be more specific.. the core-i7-avx build.. not tested the others, but I suspect all gcc builds are flawed

    E
    Last edited by edcassa; 26th July 2015 at 16:36.

  39. #30
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    There is nothing wrong with GCC. Actually any build created under default conditions is flawed. No matter what compiler used. The only way to create a correct build is to compile with FA_USE_EOF definition. Otherwise you'll get either different size of decoded file, either differences between decoded and original file.

    starting from offset 003FFFFE things go wrong
    Its simply explainable. Buffer size of MTari is 0x400000.

    Here is the simple benchmark performed on WinXP x64 SP2, i7-2700K@4800, ramdisk, 4 threads for encoding\decoding, test file is 4 000 000 000 bytes.
    MSVC and GCC compiles are taken from this thread, IC compiles created by me.
    Code:
    IC11 x86			35.437s		39.593s
    MSVC x86			37.140s		44.515s
     GCC x86-generic		44.187s		55.703s
    
    IC11 x64			34.375s		42.031s
    MSVC x64-no-msvcr		30.453s		38.093s
    MSVC x64			32.328s		36.781s
     GCC x64-generic		36.500s		42.062s
    Although MSVC performs really good for MTari, there is no way to create completely static builds for OpenMP with MSVC.
    Static IC binaries are below.
    Attached Files Attached Files

Similar Threads

  1. another (too) fast compressor
    By Cyan in forum Data Compression
    Replies: 139
    Last Post: 6th February 2016, 21:41
  2. LZSR fast compressor !
    By Nania Francesco in forum Data Compression
    Replies: 8
    Last Post: 4th October 2011, 06:46
  3. PACKET v.0.01 new fast compressor !
    By Nania Francesco in forum Data Compression
    Replies: 45
    Last Post: 19th June 2008, 02:44
  4. RINGS Fast Bit Compressor.
    By Nania Francesco in forum Forum Archive
    Replies: 115
    Last Post: 26th April 2008, 22:58
  5. Fast PPMII+VC Compressor
    By in forum Forum Archive
    Replies: 4
    Last Post: 2nd August 2006, 20:17

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •