# Thread: Compression codecs that use floating point math

1. ## Compression codecs that use floating point math

Hi all – are there any compression codecs that use floating point math? When I looked through various codecs' repositories a while back, I didn't find any floats or doubles. Which codecs, if any, use floating point?

How do arithmetic coders handle their values internally?

Thanks.

C/C++ compilers are broken with regard to consistent floating-point arithmetic.
And now cpus are broken too: https://stackoverflow.com/questions/...g-point-errors

Still, it's possible to use floats in asymmetric algorithms (e.g. optimization of LPC coefficients) or in lossy compression.

> How do arithmetic coders handle their values internally?

Normally AC and ANS just work with integers.
There are some experimental rangecoder implementations using floats,
but they're not reliable and don't provide any benefits.

3. ## Thanks:

SolidComp (9th June 2020)

4. Some proof-of-concept coders use floats and even doubles. E.g.:

/* Dynamic Markov Compression (DMC) Version 0.0.0
*
*
*
* Gordon V. Cormack
* University of Waterloo
* cormack@uwaterloo.ca
*/
/* This program implements DMC as described in
*
* "Data Compression using Dynamic Markov Modelling",
* by Gordon Cormack and Nigel Horspool
* in Computer Journal 30:6 (December 1987)
*
* It uses floating point so it isn't fast. Converting to fixed point
* isn't too difficult.
*/

Or

/* Arithmetic coding routines by A.C. Popat, 1990. For theory, see
 * Popat, Ashok C., "Scalar Quantization with Arithmetic Coding,"
 * SM thesis, MIT Dept. of Elec. Eng. and Comp. Sci., Cambridge,
 * Massachusetts, June 1990.
 */

I would say that if a compressor uses some form of adaptive quantization, it tends to rely on FP math.

Or

// Dasher v. 1.6
// Copyright (C) 2001 (David Ward djw30@cam.ac.uk)

Or PPMY (no idea who that one is from).

Doubles throughout.

The experiments before paq (e.g. p12, p5, p6) have double-precision probabilistic "neurons".

There is an MQ-coder implementation which uses doubles throughout.

That's about storing values in FP; as for projects merely computing log2 in FP, there are too many to mention.

5. ## Thanks:

SolidComp (13th June 2020)

I recently found a data corruption bug in the GeCo3 compressor ( https://github.com/cobilab/geco3/issues/1 ). They fixed it by refactoring some of the floating-point math code. (After that they also decided to remove the "-Ofast" gcc switch from their makefile. A good decision, since "-Ofast" implies "-ffast-math".)

Although the compressor now passes my tests, I'm still not sure it's safe to use. The mechanism by which the input file name was related to reproducible data corruption is still unknown, so there may still be other issues (I suspect memory safety problems).

Another case showing that extreme care is needed when using floating point math in reproducible experiments.

OptimFROG uses FP math, and so does SAC; both are lossless.

8. Originally Posted by Shelwien
C/C++ compilers are broken with regard to consistent floating-point arithmetic.
And now cpus are broken too: https://stackoverflow.com/questions/...g-point-errors
I understand
1. compilers have flags to perform "unsafe" optimization of floating-point expressions ("fast math")
2. with the x87 FPU, intermediate rounding makes normal C programs practically unpredictable
3. libraries implement their functions differently (including the C runtime of your choice)

But could you please elaborate on the CPU part? I was under the impression that math should be fully reproducible even on different CPUs, as long as
1. you don't enable lossy compiler optimization
2. it uses single- or double-precision, not 80-bit floats
3. it uses SSE/AVX registers instead of the FPU stack
4. it does not rely on libraries, i.e. only add/sub/mul/div
5. you don't mess with FPU flags like flush-to-zero or rounding mode
(The linked question was solved with correct compiler settings. Where does the CPU come into play?)

9. > compilers have flags to perform "unsafe" optimization of floating-point expressions ("fast math")
> with the x87 FPU, intermediate rounding makes normal C programs practically unpredictable
> libraries implement their functions differently (including the C runtime of your choice)

1) Recently, compilers try to avoid using the FPU at all; instead they emit SSE/AVX code,
which works at each element's precision. However, due to different optimizations it
might work differently on various platforms (Intel/AMD etc.).

2) In FP, even a+b==b+a is not always true. Unfortunately it wasn't a concern during the early
development of compilers, so now it's very hard to fix.
Just count the number of proposed workarounds here: https://www.nccs.nasa.gov/images/Flo...onsistency.pdf
Note that even after all that, they still can't guarantee identical results everywhere,
just a bounded error.

> But could you please elaborate on the CPU part?
> I was under the impression that math should be fully reproducable even on different CPUs, as long as
> you don't enable lossy compiler optimization

It was speculation in https://stackoverflow.com/a/968460/395609 ,
based on information which turned out to be false (the results did change
at runtime, but it was due to different alignment of dynamically allocated blocks).

But with all the Spectre bugs, I wouldn't be surprised if FP consistency
is not always preserved even at the CPU level, because of instruction reordering etc.

> (The linked question was solved with correct compiler settings. Where does the CPU come into play?)

It wasn't solved in the sense of getting exactly the same binary results everywhere,
just error within bounds.

In any case, I did stumble on this issue in my own projects and wasted a lot of time on software workarounds,
up to using a custom wrapper class for floats with volatile variables everywhere. Even that wasn't 100% safe at the time,
and it turned out not to be future-proof (newer compiler versions ignore my workarounds).
Thus I think the only way to get consistent results is to not use FP at all.

Brotli quality 10 and 11 do floating point on the encoding side for entropy estimation.

Zopfli does floating point for entropy estimation.

The JPEG XL reference implementation is fully 32-bit floating point: (r, g, b, a) are 4x 32-bit floating point numbers during decompression. (Say goodbye and farewell to banding artefacts, and welcome to maxed-out wide gamut and HDR!)

Entropy estimation and lossy codecs do not count; rounding errors there don't have any effect on correctness.

12. Originally Posted by Jyrki Alakuijala
Brotli quality 10 and 11 do floating point on the encoding side for entropy estimation.

Zopfli does floating point for entropy estimation.

The JPEG XL reference implementation is fully 32-bit floating point: (r, g, b, a) are 4x 32-bit floating point numbers during decompression. (Say goodbye and farewell to banding artefacts, and welcome to maxed-out wide gamut and HDR!)

Wow, I thought 16-bit floats per channel covered the maximum visual benefit for humans. I don't know if that accounts for HDR or not, but I thought JPEG XR and OpenEXR used 16-bit.

16 bits is usually enough, but unsigned 16-bit integers have some issues when one wants to do image processing with them: one would only have 11-12 bits of actual dynamics, and that can already start causing issues. However, 32-bit float was chosen because of better support in streaming instructions on a large variety of platforms. Despite this, and despite being YUV444, JPEG XL decoding is still about 2x faster than that of other modern codecs.

14. Originally Posted by Shelwien
1) […] However due to different optimizations it might work differently on various platforms (Intel/AMD etc).
I just don’t think so. For example, the racing game Live for Speed stores its record times as a sequence of player inputs and re-runs the original simulation of the lap from them during replay. It has done so since the early 2000s, through dozens of updates. While there are occasional problems on overclocked/overheated CPUs, I have yet to hear of failures specific to CPU models. Games are chaotic systems; if CPUs were non-deterministic with floats, what Live for Speed does would be impossible.

Originally Posted by Shelwien
2) In FP, even a+b==b+a is not always true. Unfortunately it wasn't a concern during the early development of compilers, so now it's very hard to fix.
I know this for (a+b)+c vs. a+(b+c), but with only two numbers?! How can that be?! IEEE 754 guarantees 0.5 ULP of accuracy, and even if the mathematical result happens to lie exactly halfway between two float representations, ties-to-even takes effect.
(The situation is different for min/max. These are actually specified as max(a,b)!=max(b,a) if one of them is NaN.)

Regarding your project: yeah, C++ made some pretty bad mistakes in the way it treats volatile, and compiler vendors are definitely at fault for pushing their fast-math options. It’s incredibly hard, but I still wouldn’t blame the CPU. I’ve worked in CAD and we had a tough time with floating-point peculiarities (like splitting a line segment in the middle creating just one new line instead of two), but our computations were 100% reproducible between CPUs, even without volatile – as long as we used the correct compiler options and stuck to the same math library.

Originally Posted by Jyrki Alakuijala
However, the 32-bit float was chosen because of better support in streaming instructions on a large variety of platforms.
This. 32-bit floats can perform 23-bit integer math perfectly, often at higher throughput and with wide SIMD availability across platforms.
If you are purely compute-limited, you may also profit from the increased number of registers. Branching is often harder, though.

15. > It does so since the early 2000s, through dozens of updates.
> While there are occasional problems on overclocked/overheated CPUs,

Well, as example: https://nfs.fandom.com/wiki/Motor_Ci.../Patch_History
"Fixed a bug which caused replays to behave erratically."
"Replay no longer experience strange camera bumpiness. Note, this will break existing replay files."

Basically you're exaggerating its consistency,
and their developers probably designed a replay system which doesn't rely on FP consistency,
because these games had to run on completely different platforms.

> I have yet to hear of failures specific to CPU models. Games are chaotic systems;
> if CPUs were non-deterministic with floats, what Live for Speed does would be impossible.

Well, it's certainly much less reliable than you think: https://stackoverflow.com/a/3144126/395609
Games are simply more robust than lossless compression algorithms.

> I know it for (a+b)+c vs. a+(b+c), but with only two numbers?! How is that?!

1) https://godbolt.org/z/QHZUUm (7FD39BD2+7F800001=7FD39BD2 7F800001+7FD39BD2=7FC00001)

2) What if "a" and "b" have different FP types (eg. float and double)?

3) "When Floating-Point Addition Isn't Commutative" https://dl.acm.org/doi/pdf/10.1145/156301.156303

> IEEE754 guarantees 0.5 ULPs of accuracy, and even if the mathematical result happens to lie exactly

1) Yes, the newer (2008+) version of IEEE-754 is more strict, but it's also too new;
there's still plenty of hardware from before that.

2) "Many real-world implementations of the architectures nearly
support IEEE754, but with caveats like not having the full set of NaNs,
forcing denorms to zero, errors of an ULP or two in multiplication /
division results, having multiplication differ by an ULP or two depending
on operand order etc. So "99% of CPUs are IEEE754 compliant" needs a
disclaimer - the spirit is true and for the purposes of the question you
are correct, but in general the devil is often in the detail. More like,
99% of CPUs are 99% IEEE754 compliant."

> It’s incredibly hard, but I still wouldn’t blame the CPU.

I'm not "blaming CPUs". I just stumbled on this idea while googling stuff for this thread,
and it seemed fairly realistic, what with all the relatively recent out-of-order/speculative
CPU optimizations and the Spectre bugs (which also appear mostly due to speed optimizations in the CPU).

> our computations were 100% reproducable between CPUs,

At the very least, transcendental functions certainly have different implementations on different CPUs.
https://stackoverflow.com/a/13102431/395609

Those interested in the deeper issues of floating-point representations and alternative number systems might want to look at unums and posits: https://en.wikipedia.org/wiki/Unum_(number_format)

17. Originally Posted by Shelwien
Well, as example: https://nfs.fandom.com/wiki/Motor_Ci.../Patch_History
"Fixed a bug which caused replays to behave erratically."
"Replay no longer experience strange camera bumpiness. Note, this will break existing replay files."
Is there any comment on the root cause? Live for Speed had similar issues, but they were programming errors, like inadvertently changing the random number generator for wind. And those would have happened with integer math or bignum or anything else just as well!

> I have yet to hear of failures specific to CPU models. Games are chaotic systems;
> if CPUs were non-deterministic with floats, what Live for Speed does would be impossible.

Well, it's certainly much less reliable than you think: https://stackoverflow.com/a/3144126/395609
Again, that post does not say anything about the root cause; in fact, floating-point numbers are not even mentioned in the thread?!

————

> I know it for (a+b)+c vs. a+(b+c), but with only two numbers?! How is that?!

1) https://godbolt.org/z/QHZUUm (7FD39BD2+7F800001=7FD39BD2 7F800001+7FD39BD2=7FC00001)
I see, you’re trying to say that one version of the code will later raise an additional floating-point exception (because it results in a signalling NaN) while the other won’t (because it results in a quiet NaN)? That’s a nasty situation indeed, but from my understanding, at least Visual C++ minds that and won’t reorder the operands or instructions under /fp:strict:
The compiler does not perform algebraic transformations on floating-point expressions, such as reassociation or distribution, unless the transformation is guaranteed to produce a bitwise identical result.
————

2) What if "a" and "b" have different FP types (eg. float and double)?
You mean the issue described here and here?
Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision (unless #pragma STDC FP_CONTRACT is off)
I didn’t know that, thanks! Like above, it is controlled by compiler settings.

————

3) "When Floating-Point Addition Isn't Commutative" https://dl.acm.org/doi/pdf/10.1145/156301.156303
Good paper, but from my understanding their main issue is „When does rounding take place?“, and you generally won’t have this issue when compiling for SSE/AVX (except when mixing different precisions, per the above), because every operation rounds to its element precision?

————

At least transcendental functions certainly have different implementations on different CPUs.
https://stackoverflow.com/a/13102431/395609
Like I said: with the FPU you’re screwed anyway. SSE/AVX has no transcendental instructions, so there is nothing to worry about as long as you stick to the proper library.

————

> IEEE754 guarantees 0.5 ULPs of accuracy, and even if the mathematical result happens to lie exactly

1) Yes, the newer (2008+) version of IEEE-754 is more strict, but it's also too new;
there's still plenty of hardware from before that.

2) "Many real-world implementations of the architectures nearly
support IEEE754, but with caveats like not having the full set of NaNs,
forcing denorms to zero, errors of an ULP or two in multiplication /
division results, having multiplication differ by an ULP or two depending
on operand order etc. So "99% of CPUs are IEEE754 compliant" needs a
disclaimer - the spirit is true and for the purposes of the question you
are correct, but in general the devil is often in the detail. More like,
99% of CPUs are 99% IEEE754 compliant."
I have little experience with non-x86 FP handling, so I assume you’re right. But at least x86 with SSE/AVX and any modern compiler shouldn’t need that disclaimer.