
Final Answers
© 2000-2021   Gérard P. Michon, Ph.D.

Rounding Numbers

To what heights would science now be raised
if Archimedes had made that discovery!

Carl Friedrich Gauss  (1777-1855)
[ about the positional system of numeration ]

Related Links (Outside this Site)

Volume of a Tetrahedron & Programming Languages  by  Prof. W. Kahan.
Values of Physical Constants   (CODATA: 1973, 1986, 1998, 2002, 2006, 2010, 2014...)
Rounding errors  by  Peter Cameron  (2012-08-03).
Measurements of G  by  C. Rothleitner & S. Schlamminger  (AIP, Nov. 2017).
Online Arbitrary Precision Calculator  by  Mikko Tommila  (bef. 2019-06-10).

Wikipedia :   Kahan summation algorithm   |   William M. Kahan (1933-)

Floating-point numbers (14:24)  by  Kelsey Houston-Edwards (2017-07-21).

 

On the Art of Rounding Numbers
A Guide to the Proper Use of Floating-Point Arithmetic

Floating-point arithmetic gives you reliably accurate results only if you know how to work around its shortcomings.  Floating-point arithmetic is not ordinary arithmetic; addition is not even associative!  There are two very demanding rules to respect:

  • Never  ever  subtract nearly equal quantities.
  • Never add  several  negligible quantities to a dominant  addend.
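Here is a minimal Python sketch of those pitfalls. It assumes IEEE 754 double precision (what Python's float uses on common platforms); all variable names are merely illustrative.

```python
import math

# Addition is not associative in floating-point arithmetic.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)                 # False

# Rule 1: subtracting nearly equal quantities destroys significant digits.
x = 1e-15
naive = (1.0 + x) - 1.0              # what survives is mostly rounding noise
print(naive, x)

# Rule 2: adding negligible quantities, one at a time, to a dominant addend.
big = 1e16
negligible = [1.0] * 10_000

one_at_a_time = big
for term in negligible:
    one_at_a_time += term            # each 1.0 is rounded away

small_first = sum(negligible) + big      # add the small terms together first
exact = math.fsum([big] + negligible)    # correctly rounded sum

print(one_at_a_time, small_first, exact)
```

Summing the small terms first (or using a compensated scheme such as math.fsum) keeps their combined contribution, which is lost entirely when they are added to the dominant term one by one.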


(2007-08-15)   Scientific notation
Any nonzero number is equal to a multiple of a  power of 10.

We may write any nonzero number in a unique way as the product of a power of 10 and a signed coefficient whose magnitude is at least 1 and strictly less than 10.  That power of 10 is called the [decimal] order of magnitude; it can be omitted when it's equal to unity (1 is 10 to the power of zero).  For example, the speed of light expressed in scientific notation is:

$c = 2.99792458 \times 10^{8}\ \text{m/s}$

A very important feature of scientific notation is that trailing zeros can only occur in the coefficient  after  the decimal point  (since there's only one nonzero digit before that decimal point).  Therefore, trailing zeros are always  significant digits  in scientific notation, as discussed in the next article  (which, incidentally, presents an example where a result can  only  be given in scientific notation if the implied precision is to be stated correctly).
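As an illustration, Python's "e" format specifier produces scientific notation with a chosen number of significant digits. The helper name sci is hypothetical; it's only a thin wrapper around the built-in format function.

```python
def sci(value, significant_digits):
    """Format value in scientific notation with the given number of significant digits."""
    # The "e" presentation type puts one digit before the decimal point,
    # so we ask for (significant_digits - 1) digits after it.
    return format(value, f".{significant_digits - 1}e")

print(sci(299_792_458, 9))   # 2.99792458e+08
print(sci(101.5, 2))         # 1.0e+02  (the trailing zero is kept, hence significant)
print(sci(101.5, 1))         # 1e+02
```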


Nikki  (Yahoo! 2007-08-14)   Significant Figures
On the precision implied by giving just so many significant figures.

What's the precision of the factors in the following product?
How precise is the result?  How would that result be best stated?

$2.9 \times 3.5 \times 10.0$

If nothing else is said, we can only assume that each factor is known within half a unit of the "least significant digit" given.  Trailing zeroes are significant  only  if they occur  after  the decimal point  (as is the case for the third factor above).

For example,  10.0  denotes a quantity which occurs anywhere between  9.95  and  10.05  with uniform probability.  The above product is thus:

between   $2.85 \times 3.45 \times 9.95 = 97.83$   and   $2.95 \times 3.55 \times 10.05 = 105.25$

Therefore, it's best expressed in scientific notation as $1.0 \times 10^{2}$.  Indeed, this denotes a quantity between 95 and 105.  Close enough and not too precise.

Resist the temptation to write "100" here, since the trailing zeroes before the decimal point would imply that only the leading "1" is significant, indicating a very fuzzy result, between 50 and 150.

You can't merely give the result as  101.5  because that would be a gross misrepresentation of the precision involved.  Instead, you should state:

$2.9 \times 3.5 \times 10.0 = 1.0 \times 10^{2}$
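The bounds of such a product can be checked with a short Python sketch. The helper names (rounding_interval, product_bounds) are hypothetical. The sketch assumes positive factors given as decimal literals whose last written digit is significant, so it would misjudge an integer like "100" with ambiguous trailing zeros.

```python
from decimal import Decimal

def rounding_interval(text):
    """Return (low, high) for a literal known within half a unit of its last digit."""
    value = Decimal(text)
    # One unit of the last written digit, e.g. 0.1 for "10.0".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    return value - unit / 2, value + unit / 2

def product_bounds(factors):
    """Lower and upper bounds for a product of positive rounded factors."""
    low = high = Decimal(1)
    for text in factors:
        factor_low, factor_high = rounding_interval(text)
        low *= factor_low
        high *= factor_high
    return low, high

low, high = product_bounds(["2.9", "3.5", "10.0"])
print(low, high)   # 97.833375 105.248625
```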

With the standardized notation, which specifies precision via a standard deviation expressed in units of the least significant digit (as computed in the article on standardized precision below), we could write:

$2.9 \times 3.5 \times 10.0 = 101.5(13)$

On 2007-09-20,  Barry  wrote:       [edited summary]
How would you report $10^{0.0200} = 1.047128548...$ with the proper number of significant figures?

What about  -log(0.001178) ?

Well, if the value  0.0200  comes from rounding, it's actually between 0.01995 and 0.02005.  So the result is between 1.047008 and 1.047249.  Stating the result as 1.0471 gives the impression that the true value is between 1.04705 and 1.04715.  This is slightly  too  precise (by a factor of 2) but that's not  grossly  misleading  (so, it's OK in my book).  The alternative would be to state the result as 1.047, which is too coarse  (by a factor of 5).

If you only rely on  significant figures  to express the precision of your results, you're always faced with a similar choice between two different levels of precision that differ from each other by a factor of 10.  Just choose the lesser of two evils, knowing that you will occasionally have to misrepresent the precision of your result by a factor of 3 (or a bit more).

Such unsatisfying limitations can't be circumvented within the "significant figures" scheme.  When the precision of a result has to be stated more rigorously, it's best either to give its upper and lower bounds (at a 99% confidence level) or to indicate an estimate of the standard deviation  (as a two-digit number between parentheses after the least significant digit, as discussed in the article on standardized precision below).

In the second example,   -log(0.001178)   denotes  some  value

between   -log(0.0011785) = 2.92867   and   -log(0.0011775) = 2.92904

That's best reported as  2.929  (which says "between 2.9285 and 2.9295").
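Both sets of bounds are easy to reproduce. This is just a sketch in Python; the variable names are illustrative and each input is assumed to be known within half a unit of its last digit.

```python
import math

# 10**0.0200 : the exponent lies between 0.01995 and 0.02005.
pow_low = 10 ** 0.01995
pow_high = 10 ** 0.02005
print(f"{pow_low:.6f}  {pow_high:.6f}")   # 1.047008  1.047249

# -log10(0.001178) : the function is decreasing, so the larger
# argument yields the lower bound.
log_low = -math.log10(0.0011785)
log_high = -math.log10(0.0011775)
print(f"{log_low:.5f}  {log_high:.5f}")   # 2.92867  2.92904
```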

Interestingly, logarithms are the quintessential example of a case where the number of significant figures in the result is not directly related to the number of significant figures of the input data.  In the following  pathological  example, the input has only 3 significant figures but the result does have  9  significant  figures:

$\log ( 7.89 \times 10^{123456} ) = 123456.897$


(2015-06-21)   Meaning of inequalities with rounded numbers
Use  strict  inequalities to indicate the rounded value is a true bound.

Strict inequalities are easy:   x  <  1.5   and   x  <  3/2   are equivalent.

When non-strict inequalities are used with rounded numbers, they acquire completely different meanings, similar to the meaning acquired by equalities in that case...  What such an inequality states is that a strict inequality is true for the tightest different bound expressible at the same level of precision.  For example,  x ≤ 1.5  means that  x < 1.6.

The former is more intuitive than the latter, as it gives the best acceptable value at the relevant precision.  This is familiar to old-school engineers but others may  struggle  when confronted with this, especially in tables.


(2007-08-14)   Standardized precision  (between trailing parentheses)
Standard deviation  expresses the uncertainty or  precision of a result.

In many cases, the above rules concerning  significant digits  are too coarse to convey a good indication of the claimed precision.

Professionals state the accuracy of their figures by giving the uncertainty expressed in units of the last figure between parentheses (see examples).

Technically, this uncertainty is expressed either as the relevant  standard deviation  or as 1/3 of the "firm" bounds you may have on either side of the mean  (both definitions are equivalent if we identify "firm bounds" with the 99.73% confidence level in a normal (Gaussian) distribution).


Straight rounding errors are not at all "normally distributed" along a Gaussian curve.  Instead, the error is uniformly distributed over an interval whose width is equal to one unit of the least significant digit retained.  This entails a standard deviation of $1/\sqrt{12} \approx 0.29$ in terms of that unit.

In our previous example of a product of three rounded values, what we have to determine is the standard deviation of the following random variable:

( 2.9 + 0.1 X ) ( 3.5 + 0.1 Y ) ( 10.0 + 0.1 Z)

Where X, Y and Z are independent random variables, each uniformly distributed between  -½ and +½.  The average (mathematical expectation) of that random variable is  101.5  and its standard deviation is  1.3444711...  (HINT: this involves averaging the square of the above inside a cube of unit volume).

Thus, our product can be expressed with standardized precision as

101.50(134)   or   101.5(13)

The latter form is the more common one, since standardized precision is most often expressed with  2  significant digits  (3 digits is overkill).
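For the record, here's one way to carry out that computation in Python. It's a sketch, not the only approach. It relies on the fact that, for independent factors, the mean of the squared product is the product of the means of the squares, and that a uniform error spanning one unit has a variance of 1/12 of that unit squared. The function name product_mean_and_std is hypothetical.

```python
import math
from fractions import Fraction

def product_mean_and_std(factors):
    """Mean and standard deviation of a product of independent rounded factors.

    Each factor is a pair (value, unit) standing for value + unit*U,
    where U is uniformly distributed between -1/2 and +1/2.
    """
    mean = Fraction(1)
    mean_square = Fraction(1)
    for value, unit in factors:
        m = Fraction(value)
        variance = Fraction(unit) ** 2 / 12
        mean *= m
        mean_square *= m * m + variance    # E[F^2] = mean^2 + variance
    return float(mean), math.sqrt(mean_square - mean ** 2)

mean, std = product_mean_and_std([("2.9", "0.1"), ("3.5", "0.1"), ("10.0", "0.1")])
print(mean, std)                           # 101.5 and about 1.3444711
print(f"{mean:.1f}({round(std * 10)})")    # 101.5(13)
```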

A close examination reveals that some authors round uncertainties  upward  systematically.  This practice comes, presumably, from a dubious concern of never claiming too much precision.  That misplaced modesty is unscientific.  Uncertainty should be treated like any other quantity and be quoted at its allotted (2-digit) level of precision.  It would be a mistake to give the above as  101.5(14).


(2007-08-15)   Engineering Notation
Stating a nonzero number as a multiple of a power of 1000.

Engineering notation is  superficially  similar to scientific notation,  except that the exponent of 10 is restricted to a multiple of 3  (thus, the relevant power of 10 is actually a power of 1000).  For this to be possible in all cases, the coefficient is allowed to go from 1 (included) to 1000 (excluded).

Because there may be trailing zeros  before  the decimal point in engineering notation, the number of significant digits is not always clear.  This is the main reason why the systematic use of the engineering notation is strongly discouraged in print, unless accuracy is stated with the above convention.

By extension, we also call  engineering notation  any system resembling scientific notation where the absolute magnitude of the coefficient is not restricted to the 1-10 range  (it could, occasionally, be more than 1000 or less than 1).  Lists of results spanning several orders of magnitude are sometimes more readable this way, since we can merely compare coefficients as the order of magnitude (a power of 10) remains constant.
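The strict form of engineering notation (coefficient from 1 to 1000, exponent a multiple of 3) can be sketched in Python as follows. The function name engineering is hypothetical; the decimal module is used so that the digits of the input are preserved exactly.

```python
from decimal import Decimal

def engineering(text):
    """Split a nonzero decimal literal into (coefficient, exponent).

    The exponent is a multiple of 3 and the magnitude of the
    coefficient is at least 1 and strictly less than 1000.
    """
    value = Decimal(text)
    if value == 0:
        raise ValueError("zero has no order of magnitude")
    # adjusted() is the decimal exponent of the leading digit;
    # floor it to a multiple of 3 (// floors negative values too).
    exponent = 3 * (value.adjusted() // 3)
    return value.scaleb(-exponent), exponent

print(engineering("299792458"))   # coefficient 299.792458, exponent 6
print(engineering("0.00047"))     # coefficient equal to 470, exponent -6
```

The second example shows the ambiguity discussed above: written as 470 with an exponent of -6, nothing tells the reader whether the trailing zero is significant.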


(2007-08-07, 2021-07-05)   Inaccuracy lurking in the  quadratic formula
Alternative approaches for  robust  solutions to quadratic equations.

The  quadratic formula  was first published early in the ninth century AD  (c. 810)  by  Al Kwarizmi.  It's a mainstay of  middle-school  algebra,  giving the two roots of the polynomial $a x^2 + b x + c$ when $b^2 - 4ac$ is positive:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

Without loss of generality,  we may consider only  monic polynomials.  Up to a change of sign,  their coefficients are just the  symmetric functions  of the roots  m-u and m+u;  namely the  sum  2m  (twice the mean m)  and the  product  p:

$[ x - (m+u) ] \times [ x - (m-u) ] = x^2 - 2m x + m^2 - u^2 = x^2 - 2m x + p$

This implies that $u^2 = m^2 - p$.  If that quantity is positive,  the  square root function can be used to obtain a simplified version of the quadratic formula, currently  advocated  by  Po-Shen Loh  (coach for the US  IMO  team).

$x = m \pm \sqrt{m^2 - p}$

There are two problems with all formulas of this type :

  • They're not  robust,  as a subtraction of nearly equal quantities may be required,  with an unacceptable loss of floating-point accuracy.
  • The square-root function isn't well-defined  (although the two-valued locution "±√"  is),  especially for equations with complex coefficients.

To solve the first problem,  we may first compute whichever root is trouble-free because it doesn't involve the subtraction of nearly equal quantities  (thus choosing the "+" sign if m is positive and the "-" sign if it's negative).  The other root is then computed by dividing p by that root  (the product of the two roots is p),  which doesn't cause any undue loss of floating-point accuracy  (a multiplication or a division never does).  Other similar techniques can be found in the following section, including the robust expression for a difference of square roots  (of which the above can be construed as a special case).
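A minimal Python sketch of that recipe, for the monic polynomial x² - 2m x + p with real roots, might look like this. The function names are hypothetical, and the sketch assumes m² - p is nonnegative and is itself computed without trouble.

```python
import math

def monic_roots(m, p):
    """Real roots of x^2 - 2 m x + p, assuming m*m - p >= 0."""
    u = math.sqrt(m * m - p)
    # Give u the sign of m, so that two quantities of the same sign are added.
    safe = m + math.copysign(u, m)
    if safe == 0.0:              # happens only if m == 0 and p == 0
        return 0.0, 0.0
    return safe, p / safe        # the product of the roots is p

def naive_roots(m, p):
    """Textbook formula, for comparison only."""
    u = math.sqrt(m * m - p)
    return m + u, m - u

big, small = monic_roots(1e8, 1.0)
_, naive_small = naive_roots(1e8, 1.0)
print(small)         # very nearly 5e-09
print(naive_small)   # cancellation leaves no correct digit
```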

We run into the aforementioned second problem when trying to discuss precisely which root is trouble-free in the case of complex coefficients.

Let's consider the following form of quadratic equation which uses the  hyperbolic sine function  (sh).  Any  normalized quadratic equation with a negative constant term can be recast into that form:

$x^2 + 2 a x \operatorname{sh} q - a^2 = 0$

Its two real solutions are then given by the following  robust  formulas:

$-a \exp(q)$     and     $a \exp(-q)$

Using the inverse hyperbolic function  Argsh  to obtain  q,  if need be,  will never entail any loss of floating-point precision...
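As a sketch of how this could be used in Python (where Argsh is spelled math.asinh), consider a normalized equation x² + b x + c = 0 with c negative. The function name and the parameters b and c are assumptions made for this illustration; a is taken as the positive square root of -c.

```python
import math

def roots_negative_constant(b, c):
    """Roots of x^2 + b x + c = 0 for c < 0, using the hyperbolic form.

    The equation is recast as x^2 + 2 a x sh(q) - a^2 = 0
    with a = sqrt(-c) and sh(q) = b / (2 a).
    """
    if c >= 0:
        raise ValueError("this form requires a negative constant term")
    a = math.sqrt(-c)
    q = math.asinh(b / (2.0 * a))
    return -a * math.exp(q), a * math.exp(-q)

negative_root, positive_root = roots_negative_constant(2e8, -1.0)
print(negative_root, positive_root)   # about -2e8 and about 5e-9
```

No subtraction appears anywhere, so neither root suffers from cancellation, whatever the sign or size of b.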

This transformation is also helpful to express with pretty formulas the solutions to some elementary problems in mathematical physics.

Numerically Stable Method for Solving Quadratic Equations  by  Berthold K.P. Horn  (MIT, 2005-03-07).


(2007-08-07)   Devising  robust  formulas...
How to avoid subtracting  nearly equal  quantities.

In what follows, the number  x  need not  be small, but it  may  well be...

In each of the examples below,  the floating-point computation on the left-hand side will lead to an unacceptable loss of precision when  x  is small.  The given substitute should be used,  which is mathematically identical but won't lead to potentially nonsensical results with floating-point arithmetic.

Square Root :

$\sqrt{a+x} - \sqrt{a} = \frac{x}{\sqrt{a+x} + \sqrt{a}}$

Exponential :

$e^{a+x} - e^{a} = 2 \operatorname{sh}(x/2)\, e^{a+x/2}$

Cosine :

$1 - \cos x = 2 \sin^2 (x/2)$

Inverse of Hyperbolic Tangent (cf. relativistic rapidity) :

Argth (a+x) - Argth (a)   =   Argth ( x / [1-a(a+x)] )
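The four substitutes above translate directly into Python. This is only a sketch with hypothetical function names; it assumes a > 0 for the square root and |a|, |a+x| < 1 for Argth (math.atanh).

```python
import math

def sqrt_diff(a, x):
    """sqrt(a+x) - sqrt(a), without cancellation."""
    return x / (math.sqrt(a + x) + math.sqrt(a))

def exp_diff(a, x):
    """exp(a+x) - exp(a), without cancellation."""
    return 2.0 * math.sinh(x / 2.0) * math.exp(a + x / 2.0)

def one_minus_cos(x):
    """1 - cos(x), without cancellation."""
    return 2.0 * math.sin(x / 2.0) ** 2

def atanh_diff(a, x):
    """atanh(a+x) - atanh(a), without cancellation."""
    return math.atanh(x / (1.0 - a * (a + x)))

# Naive expression first, robust substitute second.
print(math.sqrt(1.0 + 1e-20) - 1.0, sqrt_diff(1.0, 1e-20))
print(math.exp(1e-10) - 1.0, exp_diff(0.0, 1e-10))
print(1.0 - math.cos(1e-9), one_minus_cos(1e-9))
print(math.atanh(0.5 + 1e-12) - math.atanh(0.5), atanh_diff(0.5, 1e-12))
```

On each printed line, the naive value has lost most or all of its significant digits while the substitute stays accurate to nearly full precision.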


(2019-02-10)   Underflow
0.0  represents a small result which is not  known  to be exactly zero.

As floating-point quantities always represent an  approximation,  there is no reason why identical floating-point numbers should denote the same number.

1.0 - 1.0   =   0.0

Calculator designers should resist the temptation to equate  0  and  0.0 :

$0^0 = 1$     and     $0.0^0 = 1$
( $x^{0.0}$ is undefined  unless  x  is an  exact  positive integer.)

More generally,  one should distinguish floating-point and exact values.  Besides  explicit  rounding to exact values  (under the user's responsibility)  an operation on floating-point numbers never yields an exact value and should never be misrepresented as such.

1  +  0.   =   1.
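A few lines of Python show how a floating-point 0.0 can stand for a result that is small but not zero. Note that Python's float type does not make the distinction advocated here; it simply follows the usual IEEE 754 conventions. The variable names are illustrative.

```python
# Underflow: the true product is 1e-400, too small for a double.
underflowed = 1e-200 * 1e-200
print(underflowed)             # 0.0

# Cancellation: the true difference is 1e-17, not zero.
cancelled = (1.0 + 1e-17) - 1.0
print(cancelled)               # 0.0

# Python treats an exact 0 and a floating-point 0.0 alike in exponents.
print(0 ** 0, 0.0 ** 0, 0.0 ** 0.0)   # 1 1.0 1.0
```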
