Well, you need to be careful to make useful comparisons when benchmarking. I think Brotli in general aims for stronger compression than the other formats I benchmarked, and brotli -q 11 -w 24 in particular uses a much larger sliding window and a much more thorough encoder than the other algorithms. So to get a fair comparison you would need to either run the other algorithms in a similar mode (which isn't fully possible for LZFSE and DEFLATE, as they have a lower maximum window size; this could certainly be considered a disadvantage of those formats) or use a lower compression mode of Brotli.

So I edited my posts above to add brotli -6 -w 18 and brotli -6 -w 19, as those seemed the most comparable settings for faster compression. The results were as expected: Brotli had the best compression ratio but a slower decoder.

Out of curiosity, I also compared brotli -11 -w 24 with Zstd levels 19 through 22, as these seemed at least somewhat comparable at the high-compression end. Here are the results:

x86_64

Method                        Csize (bytes)  Ctime (ms) Dtime (ms)
----------------------------- -------------  ---------- ----------
Brotli, -11 -w 24             50050328       823513     990       
Zstd -22                      52790083       154684     533       
Zstd -21                      52824748       123616     533       
Zstd -20                      53018920       108542     535       
Zstd -19                      53939418       93909      513       

ARM (32-bit)

Method                        Csize (bytes)  Ctime (ms) Dtime (ms)
----------------------------- -------------  ---------- ----------
Brotli, -11 -w 24             50050328       8098536    4902      
Zstd -22                      52814371       853979     4200      
Zstd -21                      52839938       789343     4546      
Zstd -20                      53018920       694703     4762      
Zstd -19                      53939418       495606     3796      

So on ARM, Brotli decoding was slower than Zstd and about twice as slow as zlib. To some extent this is expected, due to the more complex algorithm and the large sliding window size. Brotli encoding was also very slow: it ran for over 2 hours.
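
As a rough cross-check of the arithmetic behind those statements, here is a small Python sketch. It is not part of the original benchmark; it only re-uses the ARM rows from the table above, and the helper names (ms_to_hours, relative_to) are made up for illustration.

```python
# ARM (32-bit) rows copied from the table above:
# method -> (compressed size in bytes, compression time in ms, decompression time in ms)
ARM = {
    "Brotli -11 -w 24": (50050328, 8098536, 4902),
    "Zstd -22": (52814371, 853979, 4200),
    "Zstd -21": (52839938, 789343, 4546),
    "Zstd -20": (53018920, 694703, 4762),
    "Zstd -19": (53939418, 495606, 3796),
}


def ms_to_hours(ms):
    """Convert a duration in milliseconds to hours."""
    return ms / 3_600_000


def relative_to(rows, baseline):
    """Express every row as (size, ctime, dtime) ratios against one baseline row."""
    base = rows[baseline]
    return {
        name: tuple(value / ref for value, ref in zip(values, base))
        for name, values in rows.items()
    }


if __name__ == "__main__":
    brotli_ctime_ms = ARM["Brotli -11 -w 24"][1]
    print(f"Brotli compression time on ARM: {ms_to_hours(brotli_ctime_ms):.2f} hours")

    # Ratios below 1.0 mean smaller or faster than the Zstd -22 baseline.
    for name, (size, ctime, dtime) in relative_to(ARM, "Zstd -22").items():
        print(f"{name:<18} size x{size:.3f}  ctime x{ctime:.2f}  dtime x{dtime:.2f}")
```

Picking Zstd -22 as the baseline is arbitrary; any other row works the same way. The ratios only restate the table, so they say nothing about why the timings differ.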

It may be worth noting that, based on my earlier results, Zstd seems slower on ARM than it should be; XPACK was much faster. I'm not sure why.

I also don't know why Zstd compressed to a different size at levels 21 and 22 on ARM than on x86_64 (levels 19 and 20 produced identical sizes on both).