Final Answers
© 2000-2021 Gérard P. Michon, Ph.D.

Rounding Numbers

To what heights would science now be raised
if Archimedes had made that discovery !
Carl Friedrich Gauss (1777-1855)
[ about the positional system of numeration ]

On the Art of Rounding Numbers
A Guide to the Proper Use of Floating-Point Arithmetic

Floating-point arithmetic gives reliable, accurate results only if you know how to work around its shortcomings. It is not ordinary arithmetic: addition is not even associative! There are two very demanding rules to respect:

Never ever subtract nearly equal quantities.

Never add several negligible quantities to a dominant addend.
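The following Python sketch (not part of the original text; it assumes IEEE 754 double precision, which is what Python floats use on common platforms) makes the lack of associativity and the second rule concrete:

```python
import math

# Addition is not associative in floating-point arithmetic.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)          # the two groupings differ in the last bit

# Adding many negligible terms to a dominant addend, one at a time,
# loses every one of them: each partial sum rounds back to 1.0.
tiny = 1e-16
count = 100_000

naive = 1.0
for _ in range(count):
    naive += tiny

# Summing the small terms first lets them accumulate before meeting 1.0.
small_first = 0.0
for _ in range(count):
    small_first += tiny
small_first += 1.0

# math.fsum tracks the exact sum and rounds it only once.
accurate = math.fsum([1.0] + [tiny] * count)

print(naive, small_first, accurate)
```

The naive loop returns exactly 1.0, whereas the other two orderings keep the accumulated contribution of the small terms (about 1e-11).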

(2007-08-15) Scientific notation
Any nonzero number is equal to a multiple of a power of 10.

We may write any nonzero number in a unique way as the product of a power of 10 and a signed coefficient whose magnitude is at least 1 and strictly less than 10. That power of 10 is called the [decimal] order of magnitude; it can be omitted when it's equal to unity (1 is 10 to the power of zero). For example, the speed of light expressed in scientific notation is:

c = 2.99792458 × 10⁸ m/s

A very important feature of scientific notation is that trailing zeros can only occur in the coefficient after the decimal point (since there's only one nonzero digit before that decimal point). Therefore, trailing zeros are always significant digits in scientific notation, as discussed in the next article (which, incidentally, presents an example where a result can only be given in scientific notation if the implied precision is to be stated correctly).
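As a small illustration, here is a Python sketch (the helper name to_scientific is hypothetical, not from the original text) that relies on the standard "e" format specifier to produce scientific notation with a chosen number of significant digits:

```python
def to_scientific(x: float, sig: int) -> str:
    """Format x in scientific notation with `sig` significant digits."""
    # The 'e' format prints one digit before the point and sig - 1 after it.
    return f"{x:.{sig - 1}e}"

print(to_scientific(299792458, 9))   # speed of light in m/s
print(to_scientific(1500, 4))        # trailing zeros are printed, hence significant
```

The second call prints a coefficient of 1.500, whose trailing zeros state that four digits are claimed.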

Nikki (Yahoo! 2007-08-14) Significant Figures
On the precision implied by giving just so many significant figures.

What's the precision of the factors in the following product?
How precise is the result? How would that result be best stated?

2.9 × 3.5 × 10.0

If nothing else is said, we can only assume that each factor is known within half a unit of the "least significant digit" given. Trailing zeroes are significant only if they occur after the decimal point (as is the case for the third factor above).

For example, 10.0 denotes a quantity which occurs anywhere between 9.95 and 10.05 with uniform probability. The above product is thus:

between 2.85 × 3.45 × 9.95 = 97.83 and 2.95 × 3.55 × 10.05 = 105.25

Therefore, it's best expressed in scientific notation as 1.0 × 10². Indeed, this denotes a quantity between 95 and 105. Close enough and not too precise.

Resist the temptation to write "100" here, since the trailing zeroes before the decimal point would imply that only the leading "1" is significant, indicating a very fuzzy result, between 50 and 150.
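The bounds quoted above can be checked with a few lines of Python. This is only a sketch: it assumes positive factors (so the extreme products come from the extreme factors) and a (value, unit of last digit) representation chosen here for illustration.

```python
import math

# Each factor is assumed known within half a unit of its last digit.
factors = [(2.9, 0.1), (3.5, 0.1), (10.0, 0.1)]   # (value, unit of last digit)

lowest = math.prod(v - u / 2 for v, u in factors)
highest = math.prod(v + u / 2 for v, u in factors)
nominal = math.prod(v for v, _ in factors)

print(f"{lowest:.2f} <= product <= {highest:.2f}")
print(f"{nominal:.1e}")   # nominal product with two significant figures
```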

You can't merely give the result as 101.5 because that would be a gross misrepresentation of the precision involved. Instead, you should state:

2.9 × 3.5 × 10.0 = 1.0 × 10²

With the standardized notation, which specifies precision via a standard deviation expressed in units of the least significant digit (as presented in the article on standardized precision below), we could write:

2.9 × 3.5 × 10.0 = 101.5(13)

On 2007-09-20, Barry wrote: [edited summary]

How would you report 10^0.0200 = 1.047128548... with the proper number of significant figures?

What about -log(0.001178) ?

Well, if the value 0.0200 comes from rounding, it's actually between 0.01995 and 0.02005. So the result is between 1.047008 and 1.047249. Stating the result as 1.0471 gives the impression that the true value is between 1.04705 and 1.04715. This is slightly too precise (by a factor of 2) but that's not grossly misleading (so, it's OK in my book). The alternative would be to state the result as 1.047, which is too coarse (by a factor of 5).

If you only rely on significant figures to express the precision of your results, you're always faced with a similar choice between two different levels of precision that differ from each other by a factor of 10. Just choose the lesser of two evils, knowing that you will occasionally have to misrepresent the precision of your result by a factor of 3 (or a bit more).

Such unsatisfying limitations can't be circumvented within the "significant figures" scheme. When the precision of a result has to be stated more rigorously, it's best either to give its upper and lower bounds (at a 99% confidence level) or to indicate an estimate of the standard deviation (as a two-digit number between parentheses after the least significant digit, as discussed in the article on standardized precision below).

In the second example, -log(0.001178) denotes some value

between -log(0.0011785) = 2.92867 and -log(0.0011775) = 2.92904

That's best reported as 2.929 (which says "between 2.9285 and 2.9295").
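Both of Barry's examples follow the same recipe: turn the rounded input into the interval it implies, then push the endpoints through the function. A Python sketch of that recipe (the helper implied_interval is a hypothetical name, and the approach assumes the function is monotonic over the interval):

```python
import math

def implied_interval(value: float, decimals: int) -> tuple[float, float]:
    """Bounds implied by a value rounded to `decimals` decimal places."""
    half_unit = 0.5 * 10.0 ** -decimals
    return value - half_unit, value + half_unit

# 10^0.0200: the function is increasing, so bounds map to bounds.
lo, hi = implied_interval(0.0200, 4)
pow_bounds = (10.0 ** lo, 10.0 ** hi)

# -log10(0.001178): the function is decreasing, so the bounds swap.
lo, hi = implied_interval(0.001178, 6)
log_bounds = (-math.log10(hi), -math.log10(lo))

print(pow_bounds)
print(log_bounds)
```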

Interestingly, logarithms are the quintessential example of a case where the number of significant figures in the result is not directly related to the number of significant figures of the input data. In the following pathological example, the input has only 3 significant figures but the result does have 9 significant figures:

log ( 7.89 × 10¹²³⁴⁵⁶ ) = 123456.897
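This value cannot be obtained by feeding the number directly to a double-precision logarithm, since 10¹²³⁴⁵⁶ overflows. The Python sketch below shows two ways around that, either splitting the logarithm by hand or using the standard decimal module, which accepts such exponents:

```python
import math
from decimal import Decimal

# Split the logarithm: log(c * 10^n) = log(c) + n.
by_hand = math.log10(7.89) + 123456

# Or let the decimal module carry the huge exponent.
with_decimal = Decimal("7.89E+123456").log10()

print(round(by_hand, 3))
print(with_decimal.quantize(Decimal("0.001")))
```

Only the fractional part (0.897) depends on the three significant figures of the input; the integer part comes from the exponent, which is exact.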

(2015-06-21) Meaning of inequalities with rounded numbers
Use strict inequalities to indicate the rounded value is a true bound.

Strict inequalities are easy: x < 1.5 and x < ³/₂ are equivalent.

When non-strict inequalities are used with rounded numbers, they acquire completely different meanings, similar to the meaning acquired by equalities in that case... What such an inequality states is that a strict inequality is true for the tightest different bound expressible at the same level of precision. For example, x ≤ 1.5 means that x < 1.6.

The former is more intuitive than the latter, as it gives the best acceptable value at the relevant precision. This is familiar to old-school engineers but others may struggle when confronted with this, especially in tables.

(2007-08-14) Standardized precision (between trailing parentheses)
Standard deviation expresses the uncertainty or precision of a result.

In many cases, the above rules concerning significant digits are too coarse to convey a good indication of the claimed precision.

Professionals state the accuracy of their figures by giving the uncertainty expressed in units of the last figure between parentheses (see examples).

Technically, this uncertainty is expressed either as the relevant standard deviation or as 1/3 of the "firm" bounds you may have on either side of the mean (both definitions are equivalent if we identify "firm bounds" with the 99.73% confidence level in a normal (Gaussian) distribution).

Straight rounding errors are not at all "normally distributed" along a Gaussian curve. Instead, the error is uniformly distributed over an interval whose width is equal to one unit of the least significant digit retained. This entails a standard deviation of 1/√12 ≈ 0.29 in terms of that unit.

In our previous example of a product of three rounded values, what we have to determine is the standard deviation of the following random variable:

( 2.9 + 0.1 X ) ( 3.5 + 0.1 Y ) ( 10.0 + 0.1 Z)

Where X, Y and Z are independent random variables, each uniformly distributed between -½ and +½. The average (mathematical expectation) of that random variable is 101.5 and its standard deviation is 1.3444711... (HINT: this involves averaging the square of the above inside a cube of unit volume).

Thus, our product can be expressed with standardized precision as

101.50(134) or 101.5(13)

The latter form is the more common one, since standardized precision is most often expressed with 2 significant digits (3 digits is an overkill).

A close examination reveals that some authors round uncertainties upward systematically. This practice comes, presumably, from a dubious concern of never claiming too much precision. That misplaced modesty is unscientific. Uncertainty should be treated like any other quantity and be quoted at its allotted (2-digit) level of precision. It would be a mistake to give the above as 101.5(14).
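The standard deviation quoted above for the product of three rounded values can be reproduced with the Python sketch below. It is one possible way to carry out the computation, not necessarily the author's: it uses the fact that, for independent factors, the mean of the squared product is the product of the means of the squares, and that a uniform error of width one unit has variance 1/12 of that unit squared. The function names and the (value, unit) representation are illustrative assumptions.

```python
import math

def product_mean_and_std(factors):
    """Mean and standard deviation of a product of independently rounded factors.

    `factors` is a list of (value, unit) pairs with nonzero values; each
    factor is modeled as value + unit * U, with U uniform on (-1/2, +1/2),
    so its variance is unit**2 / 12.
    """
    mean = math.prod(v for v, _ in factors)
    # E[product^2] / mean^2 = prod(1 + unit^2 / (12 value^2)).
    # log1p/expm1 avoid subtracting two nearly equal quantities.
    log_ratio = sum(math.log1p(u * u / (12 * v * v)) for v, u in factors)
    return mean, abs(mean) * math.sqrt(math.expm1(log_ratio))

def concise(mean, std):
    """Quote `std` with two digits, in units of the last digit shown for `mean`.

    Illustrative helper; assumes 0 < std < 100.
    """
    exponent = math.floor(math.log10(std)) - 1   # decimal place of the 2nd digit
    digits = round(std / 10.0 ** exponent)       # rounded to nearest, not upward
    return f"{mean:.{max(-exponent, 0)}f}({digits})"

mean, std = product_mean_and_std([(2.9, 0.1), (3.5, 0.1), (10.0, 0.1)])
print(mean, std, concise(mean, std))
```

Computing the variance as a plain difference between the mean square and the squared mean would subtract two numbers near 10300 to obtain about 1.8, which is precisely the kind of cancellation this guide warns against.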

(2007-08-15) Engineering Notation
Stating a nonzero number as a multiple of a power of 1000.

Engineering notation is superficially similar to scientific notation, except that the exponent of 10 is restricted to a multiple of 3 (thus, the relevant power of 10 is actually a power of 1000). For this to be possible in all cases, the coefficient is allowed to go from 1 (included) to 1000 (excluded).

Because there may be trailing zeros before the decimal point in engineering notation, the number of significant digits is not always clear. This is the main reason why the systematic use of the engineering notation is strongly discouraged in print, unless accuracy is stated with the above convention.

By extension, we also call engineering notation any system resembling scientific notation where the absolute magnitude of the coefficient is not restricted to the 1-10 range (it could, occasionally, be more than 1000 or less than 1). Lists of results spanning several orders of magnitude are sometimes more readable this way, since we can merely compare coefficients as the order of magnitude (a power of 10) remains constant.
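A minimal Python sketch of the strict form of engineering notation (exponent a multiple of 3, coefficient from 1 to 1000) is given below. The helper name to_engineering and its output format are assumptions made for this illustration.

```python
def to_engineering(x: float, sig: int = 3) -> str:
    """Format x with `sig` significant digits and an exponent multiple of 3."""
    # Start from scientific notation, e.g. '2.998e+08'.
    mantissa, exponent = f"{x:.{sig - 1}e}".split("e")
    exponent = int(exponent)
    shift = exponent % 3                 # move the point 0, 1 or 2 places
    coefficient = float(mantissa) * 10 ** shift
    decimals = max(sig - 1 - shift, 0)   # digits left after the point
    return f"{coefficient:.{decimals}f}e{exponent - shift:+d}"

print(to_engineering(299792458, 4))
print(to_engineering(0.000047, 2))
print(to_engineering(100, 2))
```

The last call shows the ambiguity mentioned above: a value known to two significant figures comes out with coefficient 100, and nothing in that output says whether the trailing zeros are significant.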

(2007-08-07, 2021-07-05) Inaccuracy lurking in the quadratic formula
Alternative approaches for robust solutions to quadratic equations.

The quadratic formula was first published early in the ninth century AD (c. 810) by al-Khwarizmi. It's a mainstay of middle-school algebra, giving the two roots of the polynomial a x² + b x + c when b² - 4ac is positive:

x = [ -b ± √(b² - 4ac) ] / (2a)

Without loss of generality, we may consider only monic polynomials (a = 1). Up to a change of sign, their coefficients are just the symmetric functions of the roots m-u and m+u; namely the sum 2m (twice the mean m) and the product p:

[ x - (m+u) ] × [ x - (m-u) ] = x² - 2m x + m²-u² = x² - 2m x + p

This implies that u² = m² - p. If that quantity is positive, the square root function can be used to obtain a simplified version of the quadratic formula, currently advocated by Po-Shen Loh (coach for the US IMO team).

x = m ± √(m² - p)

There are two problems with all formulas of this type:

They're not robust, as a subtraction of nearly equal quantities may be required, with an unacceptable loss of floating-point accuracy.
The square-root function isn't well-defined (although the two-valued locution "±√" is), especially for equations with complex coefficients.

To solve the first problem, we may first compute whichever root is trouble-free because it doesn't involve the subtraction of nearly equal quantities (thus choosing the "+" sign if m is positive and the "-" sign if it's negative). The other root is then computed by dividing p by that root, which doesn't cause any undue loss of floating-point accuracy (a multiplication or a division never does). Other similar techniques can be found in the following section, including the robust expression for a difference of square roots (of which the above can be construed as a special case).
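For real coefficients, this recipe can be sketched in Python as follows. The function name monic_roots is hypothetical, and the sketch assumes m² - p is nonnegative; it does nothing about the separate loss of accuracy that may occur inside m² - p itself when the two roots are nearly equal.

```python
import math

def monic_roots(m: float, p: float) -> tuple[float, float]:
    """Real roots of x^2 - 2*m*x + p = 0, assuming m*m - p >= 0."""
    u = math.sqrt(m * m - p)
    # m and copysign(u, m) share a sign, so this sum never cancels.
    safe = m + math.copysign(u, m)
    if safe == 0.0:               # happens only when m == 0 and p == 0
        return 0.0, 0.0
    other = p / safe              # the product of the two roots is p
    return safe, other

m, p = 5e7, 1.0
naive_small = m - math.sqrt(m * m - p)   # subtracts nearly equal quantities
safe, other = monic_roots(m, p)
print(naive_small, other)
```

With these inputs the small root is very nearly 1e-8. The division recovers it to full precision, while the naive subtraction can only return a multiple of the spacing between doubles near 5e7 (about 7.5e-9).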

We run into the aforementioned second problem when trying to discuss precisely which root is trouble-free in the case of complex coefficients.

Let's consider the following form of quadratic equation which uses the hyperbolic sine function (sh). Any normalized quadratic equation with a negative constant term can be recast into that form:

x² + 2 a x sh q - a² = 0

Its two real solutions are then given by the following robust formulas:

-a exp q and a exp -q

Using the inverse hyperbolic function Argsh to obtain q, if need be, will never entail any loss of floating-point precision...
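A Python sketch of this approach is given below, for a normalized equation x² + b x + c = 0 with c negative. The mapping a = √(-c) and sh q = b / 2a follows from comparing coefficients with the form above; the function name is hypothetical, and math.asinh plays the role of Argsh.

```python
import math

def roots_negative_constant(b: float, c: float) -> tuple[float, float]:
    """Real roots of x^2 + b*x + c = 0 when c < 0.

    Matches x^2 + 2*a*x*sh(q) - a^2 = 0 with a = sqrt(-c), sh(q) = b / (2*a).
    """
    if c >= 0:
        raise ValueError("this form requires a negative constant term")
    a = math.sqrt(-c)
    q = math.asinh(b / (2 * a))
    # The two roots are -a*exp(q) and a*exp(-q); no subtraction is involved.
    return -a * math.exp(q), a * math.exp(-q)

negative_root, positive_root = roots_negative_constant(1e8, -1.0)
print(negative_root, positive_root)
```

For this example the roots are very nearly -1e8 and 1e-8, and both come out accurately even though they differ by sixteen orders of magnitude.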

This transformation is also helpful to express with pretty formulas the solutions to some elementary problems in mathematical physics.

Numerically Stable Method for Solving Quadratic Equations by Berthold K.P. Horn (MIT, 2005-03-07).

(2007-08-07) Devising robust formulas...
How to avoid subtracting nearly equal quantities.

In what follows, the number x need not be small, but it may well be...

In each of the examples below, the floating-point computation on the left-hand side will lead to an unacceptable loss of precision when x is small. The given substitute should be used, which is mathematically identical but won't lead to potentially nonsensical results with floating-point arithmetic.

Square Root :

√(a+x) - √a = x / [ √(a+x) + √a ]

Exponential :

e^{a + x} - e^a = 2 sh(x/2) e^{a + x/2}

Cosine :

1 - cos x = 2 sin²(x/2)

Inverse of Hyperbolic Tangent (cf. relativistic rapidity) :

Argth (a+x) - Argth (a) = Argth ( x / [1-a(a+x)] )
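The four substitutes translate directly into Python with the standard math module, as in the sketch below (the function names are chosen here for illustration). The cosine case is used to compare the naive and robust forms for a small x.

```python
import math

def sqrt_difference(a: float, x: float) -> float:
    """sqrt(a + x) - sqrt(a), computed without cancellation."""
    return x / (math.sqrt(a + x) + math.sqrt(a))

def exp_difference(a: float, x: float) -> float:
    """exp(a + x) - exp(a), computed without cancellation."""
    return 2.0 * math.sinh(x / 2.0) * math.exp(a + x / 2.0)

def one_minus_cos(x: float) -> float:
    """1 - cos(x), computed without cancellation."""
    return 2.0 * math.sin(x / 2.0) ** 2

def atanh_difference(a: float, x: float) -> float:
    """atanh(a + x) - atanh(a), computed without cancellation."""
    return math.atanh(x / (1.0 - a * (a + x)))

x = 1e-9
naive = 1.0 - math.cos(x)     # cos(x) is within one rounding step of 1.0
robust = one_minus_cos(x)     # close to x*x/2 = 5e-19
print(naive, robust)
```

The true value of 1 - cos x here is about 5e-19, far below the spacing of doubles near 1 (about 1.1e-16), so the naive form cannot represent it at all.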

(2019-02-10) Underflow
0.0 represents a small result which is not known to be exactly zero.

As floating-point quantities always represent an approximation, there is no reason why identical floating-point numbers should denote the same number.

1.0 - 1.0 = 0.0
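Underflow proper gives another illustration of a 0.0 that stands for a nonzero quantity. In the Python sketch below (assuming IEEE 754 doubles), the product of two tiny nonzero numbers is too small to be represented, while exact rational arithmetic on the same two inputs confirms that the true product is positive:

```python
from fractions import Fraction

x = 1e-200
product = x * x                      # true value is about 1e-400
exact = Fraction(x) * Fraction(x)    # exact rational arithmetic, same inputs

print(product)      # the floating-point result underflows to 0.0
print(exact > 0)    # the exact product is positive, not zero
```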

Calculator designers should resist the temptation to equate 0 and 0.0 :

0⁰ = 1 and 0.0⁰ = 1
( x^0.0 is undefined unless x is an exact positive integer.)

More generally, one should distinguish floating-point and exact values. Besides explicit rounding to exact values (under the user's responsibility) an operation on floating-point numbers never yields an exact value and should never be misrepresented as such.

1 + 0. = 1.
