How Floating-Point Arithmetic Works in Python

Today, virtually all machines use the IEEE 754 binary floating-point format to represent real numbers. It is similar to scientific notation, except that scientific notation uses base 10 while machines store numbers in base 2, and most decimal fractions cannot be represented exactly as binary fractions. The resulting discrepancy is known as representation error.

Here is an example of the representation error in Python.

When you print 0.1 in Python, what you actually see is a rounded value. Python stores the number in binary floating-point format, which holds only the closest binary fraction to 0.1, and printing converts that approximation back to a short decimal string that happens to read 0.1.
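As a minimal sketch of this, the snippet below asks Python for more digits than it prints by default. It uses only the standard library; the variable name `exact` is just an illustrative choice.

```python
from decimal import Decimal

# Default printing shows the shortest decimal string that maps back to the same float
print(0.1)

# Asking for 20 decimal places reveals that the stored value is not exactly 0.1
print(format(0.1, ".20f"))

# Decimal(float) converts the stored binary value to decimal without any rounding
exact = Decimal(0.1)
print(exact)

# The stored value is slightly larger than the true decimal 0.1
print(exact > Decimal("0.1"))
```

Passing the string "0.1" to Decimal gives the true decimal value, while passing the float 0.1 gives whatever the binary format actually stored, which is why the two compare as different.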


If you do arithmetic with this number, however, you don’t always get the exact result. As shown in the Python code below, when you add 0.1 and 0.2 you would expect 0.3, but that’s not what you actually get. Since neither 0.1 nor 0.2 can be expressed exactly in binary floating-point, the small errors in the operands show up in the sum.

print(0.1 + 0.2)  # prints 0.30000000000000004

The same thing happens in decimal: a fraction like 1/3 cannot be written with a finite number of decimal digits, because no matter how many digits you write it is still an approximation (1/3 = 0.333…). Similarly, in binary, some numbers cannot be represented exactly. Let’s now see how to convert a decimal fraction into binary.


To convert 0.1 into binary, we multiply 0.1 by 2 and write down the integer part and the fractional part of the result separately. The fractional part is then multiplied by 2 again, and the process is repeated. The process stops when the fractional part becomes 0; for 0.1 that never happens, because the fractional parts start to repeat, so the expansion is infinite. The integer parts, read in order, form the binary digits of the decimal fraction (0.1 = 0.00011001100110011…).
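The multiply-by-2 procedure can be sketched in a few lines of Python. The function name `fraction_to_binary` and the `max_bits` cut-off are hypothetical choices made for this illustration. The sketch uses `fractions.Fraction` so that the conversion itself is carried out exactly.

```python
from fractions import Fraction

def fraction_to_binary(value, max_bits=20):
    """Return the binary expansion of a fraction in [0, 1) as a string."""
    frac = Fraction(value)  # exact rational arithmetic, so no rounding creeps in
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)        # the integer part is the next binary digit (0 or 1)
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part for the next step
    # Mark expansions that were cut off before the fraction reached 0
    suffix = "" if frac == 0 else "..."
    return "0." + "".join(bits) + suffix

print(fraction_to_binary("0.1"))    # 0.00011001100110011001...
print(fraction_to_binary("0.625"))  # 0.101
```

The value is passed as a string so that Fraction sees the true decimal 0.1 rather than the already-rounded float. Note that 0.625 terminates after three digits, while 0.1 keeps repeating the pattern 0011.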

This is the binary representation of the decimal number, but the conversion is just one part of the binary floating-point format. The format is described briefly here.

In the 64-bit double-precision format, which Python floats use on virtually all platforms, the first bit is the sign bit: 0 for positive and 1 for negative. The next 11 bits hold the exponent and the last 52 bits hold the mantissa.
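To see these three fields for a concrete number, the following sketch uses the standard `struct` module to expose the raw 64 bits of a float. The helper name `double_bits` is made up for this example.

```python
import struct

def double_bits(x):
    """Split a float into its sign, exponent, and mantissa bit strings."""
    # Pack as a big-endian IEEE 754 double, then read the same 8 bytes as an integer
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_int, "064b")
    # 1 sign bit, 11 exponent bits, 52 mantissa bits
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_bits(0.1)
print(sign)                     # 0, because 0.1 is positive
print(exponent)                 # 01111111011
print(int(exponent, 2) - 1023)  # -4 after removing the bias of 1023
print(mantissa)                 # repeating 1001 pattern, rounded in the last bits
```

The exponent is stored with a bias of 1023, so the stored pattern 01111111011 (1019) corresponds to a power of 2 of -4. The repeating pattern in the mantissa is the same 0011 cycle that appears in the binary expansion of 0.1, cut off and rounded at 52 bits.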

To convert a decimal number into the binary floating-point format, you can use an online converter. You can also follow the steps as described in this website.

The next example shows a similar effect with 1/3, which is a recurring decimal (0.333…). If you add it three times, as shown in the code below, you might expect 0.999…, but what you actually get is a rounded value: each intermediate sum is rounded to the nearest representable number, and the final result prints as 1.0.

print(1/3 + 1/3 + 1/3)  # prints 1.0
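A short sketch can make the rounding visible. It assumes nothing beyond the standard library; the variable name `third` is only for illustration. `Fraction(third)` gives the exact value of the stored float, which can then be compared with the true 1/3.

```python
from fractions import Fraction

third = 1 / 3

# The stored float is slightly below the true value of 1/3
print(Fraction(third) < Fraction(1, 3))  # True

# The intermediate sum is also a rounded value
print(third + third)                     # 0.6666666666666666

# The final sum is rounded once more and lands exactly on 1.0
print(third + third + third == 1.0)      # True
```

So the result looks exact here only because the last rounding step happens to land on 1.0, not because 1/3 was stored exactly.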

Let us look at another example.

Irrational numbers like √2 cannot be represented exactly either, because their expansions have infinitely many digits. So when we print the square root of 2, we get a finite number of digits as output. If we now multiply this value by itself, we don’t get exactly 2, and comparing 2 with the result of √2×√2 gives False.

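This is again a consequence of representation error, since mathematically the comparison should be true. To work around it, we rewrite the comparison so that it checks whether the difference between the two values is smaller than a tolerance (0.1 here), which gives True. In practice, a much smaller tolerance is usually chosen, because 0.1 would also accept values that are clearly different.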

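import math

a = math.sqrt(2)
print(a)  # a finite approximation of √2

# Exact equality fails because a*a is not exactly 2.0
print(2 == a*a)

# Comparing within a tolerance succeeds
print(abs(a*a - 2.0) < 0.1)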
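As a possible refinement, the same check can be written with a much tighter tolerance or with the standard library function `math.isclose`, which compares two values using a relative tolerance (1e-09 by default). This is a sketch of one common approach, and the tolerance of 1e-9 in the manual check is an assumption chosen for illustration.

```python
import math

a = math.sqrt(2)

# Manual check with a small absolute tolerance
print(abs(a * a - 2.0) < 1e-9)    # True

# math.isclose applies a relative tolerance, so it scales with the operands
print(math.isclose(a * a, 2.0))   # True

# A tight tolerance still rejects values that are genuinely different
print(math.isclose(2.05, 2.0))    # False
```

A tolerance of 0.1 would report 2.05 and 2.0 as equal, which is why a smaller threshold is usually the safer choice.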

These examples illustrate how floating-point arithmetic works in Python. The errors presented here are inherent in binary floating-point numbers, not in the Python code.

