This is a list of falsehoods programmers tend to believe about floating point numbers, specifically the IEEE-754 floating point numbers used ubiquitously today.
When using floating point, it is easy to write programs that seem to compute the right answer but actually hide subtle bugs. In serious applications, numerical computing quickly gets complicated and requires considering many factors, such as the accumulation of error, numerical stability, and how numbers flow through the program. Knowing some floating point quirks provides a good foundation for when your math starts to look off.
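As a small illustration of error accumulation, here is a Python sketch (Python's `float` is an IEEE-754 double on virtually every platform; adding 0.1 ten times is just an arbitrary example). Each addition rounds to the nearest representable double, so the running total drifts away from the exact answer, while the standard library's `math.fsum` tracks the lost low-order bits and rounds only once at the end:

```python
import math

# Add 0.1 ten times; mathematically the answer is exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1  # each addition rounds to the nearest double

print(total)                  # 0.9999999999999999
print(total == 1.0)           # False

# math.fsum keeps track of the rounding error and rounds once at the end.
print(math.fsum([0.1] * 10))  # 1.0
```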
While this list does not go into the details of correctly using floating point numbers, it does enumerate a number of assumptions often made by programmers.
All of these assumptions are wrong:
- Floating point arithmetic is exact
- Floating point arithmetic is always inexact
- The properties of arithmetic (commutativity, associativity, distributivity, inverse) hold
- The error in floating point math always tends to average itself out
- Floating point math is precise enough for programs which manage money
- A list of numbers can be summed in any order without affecting the result
- A list of numbers can be multiplied in any order without affecting the result
- Floating point can’t be used for integer math
- Floating point numbers are either 64 or 32 bits
- Floating point numbers have 2^n bits
- If two floating point numbers have different bits, they are not equal
- If two floating point numbers have the same bits, they are equal
- The reciprocals of two equal numbers are also equal
- There is only one way to encode NaN
- Floating point functions supported by the CPU are computed as accurately as possible
- Arithmetic operations execute in a constant amount of time
- Addition/multiplication operations execute in a constant amount of time
- Floating point math is always executed on specialized hardware
- Exceptions in floating point math always throw
- Floating point math always rounds the same way
- Programs built with the same compiler brand will produce the exact same results
- Programs built with the same compiler version will produce the exact same results
- Debug and release mode give identical results
- CPUs with the same instruction set produce the exact same results executing floating point instructions
- 32-bit and 64-bit versions of the same program running on the same machine will produce the same results
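Several of the arithmetic items in the list above are easy to check for yourself. The sketch below uses Python, whose `float` is an IEEE-754 double on virtually every platform; the constants are just convenient examples, and `naive_sum` is a hypothetical helper written out explicitly because the built-in `sum()` in CPython 3.12 and later compensates for rounding error when adding floats.

```python
def naive_sum(values):
    """Left-to-right summation with one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# Arithmetic is not always exact: 0.1, 0.2 and 0.3 are not representable.
inexact = 0.1 + 0.2 == 0.3         # False
# ...but it is not always inexact either: these are sums of powers of two.
exact = 0.5 + 0.25 == 0.75         # True

# Associativity does not hold.
left = (0.1 + 0.2) + 0.3           # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)          # 0.6

# Summation order matters: in the first order, 1.0 is absorbed by 1e16.
a = naive_sum([1e16, 1.0, -1e16])  # 0.0
b = naive_sum([1e16, -1e16, 1.0])  # 1.0

# Multiplication order matters too: an intermediate product can overflow.
c = (1e308 * 10.0) * 0.1           # inf
d = 1e308 * (10.0 * 0.1)           # 1e+308

print(inexact, exact, left, right, a, b, c, d)
```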
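The money and integer items pull in opposite directions, and a short Python sketch shows both. The price `2.675` is an arbitrary illustration, and the `decimal` module is only one common remedy for money (counting in integer cents is another). Integers, by contrast, are represented exactly in a double up to 2^53:

```python
from decimal import Decimal, ROUND_HALF_UP

# 2.675 is stored as the nearest double, which is slightly below 2.675,
# so rounding to cents goes down instead of up.
price = 2.675
print(round(price, 2))   # 2.67
print(Decimal(price))    # the exact stored value, slightly below 2.675

# Decimal arithmetic keeps the value exactly as written.
exact_price = Decimal("2.675")
print(exact_price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68

# Integer math in floating point is exact up to 2**53...
assert float(2**53 - 1) + 1.0 == 2.0**53
assert 123456789.0 * 3.0 == 370370367.0
# ...and stops being exact just above it: 2**53 + 1 rounds back to 2**53.
assert 2.0**53 + 1.0 == 2.0**53
```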
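The items about bits and equality in the list above can be inspected with Python's `struct` module, which exposes the raw IEEE-754 encoding; `bits` below is a hypothetical helper. Signed zero is also what breaks the reciprocal item: IEEE 754 defines 1/+0 as +infinity and 1/-0 as -infinity, but Python raises `ZeroDivisionError` for float division by zero, so this sketch reveals the sign with `math.copysign` instead.

```python
import math
import struct

def bits(x: float) -> str:
    """Hex view of the IEEE-754 binary64 encoding of x (big-endian)."""
    return struct.pack(">d", x).hex()

# Different bits, yet equal: +0.0 and -0.0 compare equal.
print(bits(0.0), bits(-0.0), 0.0 == -0.0)
# 0000000000000000 8000000000000000 True

# Same bits, yet not equal: NaN is unequal to itself.
nan = float("nan")
print(bits(nan) == bits(nan), nan == nan)
# True False

# The sign of zero survives even though the two zeros compare equal.
print(math.copysign(1.0, -0.0))  # -1.0

# NaN has many encodings: an all-ones exponent with any nonzero fraction.
nan_patterns = ["7ff8000000000000", "7ff8000000000001", "fff8000000000000"]
decoded = [struct.unpack(">d", bytes.fromhex(h))[0] for h in nan_patterns]
print(all(math.isnan(x) for x in decoded))  # True

# Not every format is 32 or 64 bits: struct also supports IEEE half precision.
print(struct.calcsize("<e"), struct.calcsize("<f"), struct.calcsize("<d"))  # 2 4 8
```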
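On the exceptions item: IEEE 754's default handling for overflow, division by zero and invalid operations is to set a status flag and return a default result (an infinity or NaN) rather than trap. Whether a language turns a given case into an exception varies, even within one language. A Python sketch, where `outcome` is a hypothetical helper:

```python
def outcome(fn):
    """Return the result of fn(), or the name of the exception it raised."""
    try:
        return fn()
    except ArithmeticError as exc:
        return type(exc).__name__

inf = float("inf")

# Overflow in multiplication quietly produces infinity...
print(outcome(lambda: 1e308 * 10.0))   # inf
# ...while overflow in exponentiation raises.
print(outcome(lambda: 10.0 ** 400))    # OverflowError
# An invalid operation quietly produces NaN...
print(outcome(lambda: inf - inf))      # nan
# ...while float division by zero raises instead of returning an infinity.
print(outcome(lambda: 1.0 / 0.0))      # ZeroDivisionError
```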
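On the rounding item: IEEE 754 defines several rounding directions (round to nearest with ties to even is the default; rounding toward negative and positive infinity are others). C exposes the binary rounding mode through `fesetround` in `<fenv.h>`, but Python's `float` does not, so this sketch swaps in the `decimal` module to show the same idea: one division, different results depending on the rounding mode in effect. `third` is a hypothetical helper, and 5 digits of precision is arbitrary.

```python
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

def third(rounding):
    """Compute 1/3 to 5 significant digits under the given rounding mode."""
    with localcontext() as ctx:
        ctx.prec = 5
        ctx.rounding = rounding
        return Decimal(1) / Decimal(3)

for mode in (ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING):
    print(mode, third(mode))
# ROUND_HALF_EVEN 0.33333
# ROUND_FLOOR 0.33333
# ROUND_CEILING 0.33334
```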
Further Reading
- Bruce Dawson’s Blogs on Floating Point
- Floating Point Guide - Michael Borgwardt
- What Every Computer Scientist Should Know About Floating-Point Arithmetic - David Goldberg