
Thread: Amazon AZ64 compression better than Zstd

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    346
    Thanks
    129
    Thanked 53 Times in 37 Posts

    Amazon AZ64 compression better than Zstd

    Hi all – Amazon's new AZ64 codec looks interesting. They say that it compresses better than LZO and is 40% faster, and compresses better than Zstd and is 70% faster.

    Does anyone know more about it and how it works? The only thing they say is that they leverage SIMD instructions, and they imply that it shines with small data in particular. It's used in their Redshift database, which I think is based on PostgreSQL.

    The SIMD angle is interesting. I think there's still a lot of headroom with compression codecs because surprisingly few of them leverage SIMD very well. Are Zstd and brotli leveraging SIMD?

  2. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    830
    Thanks
    239
    Thanked 300 Times in 180 Posts
    Quote Originally Posted by SolidComp View Post
    Are Zstd and brotli leveraging SIMD?
    It looks like that compressor may be exploiting the properties of specific data types and of the way searches are done on database tables -- i.e., the data is possibly ordered. Compressing ordered uint32s is a different game from general-purpose compression. Riegeli might be closer to that than zstd or brotli, but it's most likely not quite the same use case.
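    To make the ordered-integers point concrete, here is a minimal sketch in C (my own illustration, not AZ64's unpublished format): delta coding followed by LEB128-style varints. On an ordered column the deltas are tiny, which is exactly the situation where a generic LZ has little to find.
    Code:
    /* Delta + varint coding of an ordered uint32 column.
       'out' needs room for up to 5 bytes per value. */
    #include <stdint.h>
    #include <stddef.h>

    size_t delta_varint_encode(const uint32_t *in, size_t n, uint8_t *out)
    {
        size_t pos = 0;
        uint32_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t d = in[i] - prev;   /* small when the column is ordered */
            prev = in[i];
            while (d >= 0x80) {          /* 7 bits per byte, high bit = "more" */
                out[pos++] = (uint8_t)(d | 0x80);
                d >>= 7;
            }
            out[pos++] = (uint8_t)d;
        }
        return pos;                      /* bytes written */
    }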

  3. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    773
    Thanked 687 Times in 372 Posts
    At its core, the AZ64 algorithm compresses smaller groups of data values and uses single instruction, multiple data (SIMD) instructions for parallel processing. Use AZ64 to achieve significant storage savings and high performance for numeric, date, and time data types.
    It's the so-called integer compression.
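    For anyone unfamiliar with the term, here is a hedged sketch of what "compressing smaller groups of values" usually means in integer compression (plain scalar bit packing, not AZ64's actual code): each small group is stored with just enough bits for its largest value, so a group of small integers costs only a few bits per value.
    Code:
    /* Pack one group of 32 uint32s using the minimal bit width for the group.
       'out' needs room for at most 1 + 128 bytes per group. */
    #include <stdint.h>
    #include <stddef.h>

    size_t pack_group32(const uint32_t *in, uint8_t *out)
    {
        uint32_t max = 0;
        for (int i = 0; i < 32; i++)
            if (in[i] > max) max = in[i];

        unsigned bits = 0;                    /* bit width needed for this group */
        while (bits < 32 && (max >> bits) != 0) bits++;

        out[0] = (uint8_t)bits;               /* per-group header: bit width */
        uint64_t acc = 0;
        unsigned used = 0;
        size_t pos = 1;
        for (int i = 0; i < 32; i++) {
            acc |= (uint64_t)in[i] << used;   /* append 'bits' bits to the accumulator */
            used += bits;
            while (used >= 8) {               /* flush full bytes */
                out[pos++] = (uint8_t)acc;
                acc >>= 8;
                used -= 8;
            }
        }
        if (used) out[pos++] = (uint8_t)acc;  /* trailing partial byte */
        return pos;                           /* bytes written for this group */
    }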

  4. #4
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    774
    Thanks
    237
    Thanked 248 Times in 152 Posts
    Indeed, this is only for SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIMESTAMP, TIMESTAMPTZ ... where LZ is not very useful - basic techniques for time series and for the separate data types should do a better job.
    Here are some benchmarks: https://stackoverflow.com/questions/...at-performance


    Recently zstd was added to ROOT, which they use in most high-energy physics experiments ( https://arxiv.org/pdf/2004.10531 ) - again, LZ makes no sense for this kind of data; a specialized compressor should do a better job ( https://arxiv.org/pdf/1511.00856 ).
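    To show what I mean by basic time-series techniques, here is a quick sketch of delta-of-delta coding (in the spirit of the Gorilla paper, nothing AZ64-specific): timestamps sampled at a roughly fixed interval have second-order deltas near zero, and those pack into very few bits with any back-end coder.
    Code:
    /* Delta-of-delta coding for a TIMESTAMP-like column. */
    #include <stdint.h>
    #include <stddef.h>

    void delta_of_delta(const int64_t *ts, size_t n, int64_t *out)
    {
        if (n == 0) return;
        out[0] = ts[0];                      /* first timestamp stored as-is */
        int64_t prev = ts[0], prev_delta = 0;
        for (size_t i = 1; i < n; i++) {
            int64_t delta = ts[i] - prev;    /* first-order delta */
            out[i] = delta - prev_delta;     /* second-order delta, ~0 for regular sampling */
            prev = ts[i];
            prev_delta = delta;
        }
    }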

  5. #5
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    171
    Thanks
    28
    Thanked 73 Times in 43 Posts
    Ah, the old comparing-apples-to-oranges trick; it never fails to impress.

  6. #6
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    503
    Thanks
    181
    Thanked 177 Times in 120 Posts
    It's a reasonable comparison to make if you're running a database engine and want to justify the extra complexity of adding type-specific compression engines over using the generic ones you already have available, but it's probably been taken out of context here.

  7. Thanks:

    Bulat Ziganshin (29th April 2020)

  8. #7
    Member
    Join Date
    Mar 2016
    Location
    USA
    Posts
    56
    Thanks
    7
    Thanked 23 Times in 15 Posts
    For these types you'd often use delta coding or another specialized algorithm, so I wonder how this compares. That thread mentions that their own ANALYZE system currently doesn't test the new algorithm, and they haven't given any numbers for the other methods.

  9. #8
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    346
    Thanks
    129
    Thanked 53 Times in 37 Posts
    I wonder if they're using Daniel Lemire's integer SIMD compression libraries.
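    For reference, this is roughly what the SIMD part of such libraries looks like, e.g. the delta step done four uint32s at a time with SSE2 (my own sketch, not code from Lemire's libraries and not anything published about AZ64):
    Code:
    /* Compute x[i] - x[i-1] for four uint32s per iteration with SSE2.
       n must be a multiple of 4. */
    #include <emmintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    void simd_delta(const uint32_t *in, uint32_t *out, size_t n)
    {
        __m128i prev = _mm_setzero_si128();   /* carries the last lane of the previous block */
        for (size_t i = 0; i < n; i += 4) {
            __m128i cur = _mm_loadu_si128((const __m128i *)(in + i));
            /* shift lanes up by one 32-bit slot and bring in the previous block's last lane */
            __m128i shifted = _mm_or_si128(_mm_slli_si128(cur, 4),
                                           _mm_srli_si128(prev, 12));
            _mm_storeu_si128((__m128i *)(out + i), _mm_sub_epi32(cur, shifted));
            prev = cur;
        }
    }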

  10. #9
    Member
    Join Date
    May 2017
    Location
    UK
    Posts
    13
    Thanks
    1
    Thanked 3 Times in 3 Posts
    I don't think it's apples to oranges; as JamesB said, it shows better performance for some domains.

    My observation as an outsider in the compression scene is that most of the focus seems to be on compressing human language. All the main corpora have a big skew towards this type of data rather than scientific data.

    Scientific data is the most voluminous and would benefit the most from compression, IMO. I actually think the reason Oodle has made such strides is that they benchmark on game data, which is basically scientific data.


  11. #10
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    171
    Thanks
    28
    Thanked 73 Times in 43 Posts
    One thing these benchmarks often fail to mention is the class of compressor: integer compressors can be lossy if they sort records before delta coding, in the sense that they cannot restore the original order of the data. At that point comparing them to a lossless codec is no longer fair, since there is no such thing as an "unsort" unless we're dealing with Burrows-Wheeler transforms, which we aren't.
    Hence comparing the two can be an apples-to-oranges comparison.

    But yes, from a broad perspective compression is compression; as for being able to restore the exact original content, that's another question.
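    A tiny illustration of the point (my own sketch, not any particular codec): once a column is sorted before delta coding, staying lossless means storing the permutation as well, and that index is where the hidden cost sits.
    Code:
    /* Sort indices instead of values; without 'perm' the original row order is gone. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const uint32_t *g_vals;
    static int cmp_idx(const void *a, const void *b)
    {
        uint32_t x = g_vals[*(const uint32_t *)a], y = g_vals[*(const uint32_t *)b];
        return (x > y) - (x < y);
    }

    int main(void)
    {
        uint32_t vals[5] = { 40, 10, 50, 20, 30 };
        uint32_t perm[5] = { 0, 1, 2, 3, 4 };

        g_vals = vals;
        qsort(perm, 5, sizeof perm[0], cmp_idx);   /* sort a permutation, not the values */

        /* the sorted view delta-codes nicely (10,10,10,10,...), but recovering
           the original order 40,10,50,20,30 requires keeping 'perm' around */
        for (int i = 0; i < 5; i++)
            printf("sorted[%d] = %u (original row %u)\n",
                   i, (unsigned)vals[perm[i]], (unsigned)perm[i]);
        return 0;
    }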

