7A Case Study on Floating-Point Arithmetic Stability
This chapter is inspired by the example in Christoph Überhuber, “Computer-Numerik” Vol. 1, Springer 1995, Chap. 2.3.
7.1 Calculation of \(\pi\) According to Archimedes
A lower bound for \(2\pi\) – the circumference of the unit circle – is obtained by summing the side lengths of a regular \(n\)-gon inscribed in the unit circle. The figure on the left illustrates the iterative doubling of the number of vertices starting from a square with side length \[s_4=\sqrt{2}\].
Fig. 1
Fig. 2
The second figure shows the geometry of the vertex doubling.
With \(|\overline{AC}|= s_{n},\quad |\overline{AB}|= |\overline{BC}|= s_{2n},\quad |\overline{MN}| =a, \quad |\overline{NB}| =1-a,\) the Pythagorean theorem applied to triangles \(MNA\) and \(NBA\) respectively yields \[
a^2 + \left(\frac{s_{n}}{2}\right)^2 = 1\qquad\text{and } \qquad
(1-a)^2 + \left(\frac{s_{n}}{2}\right)^2 = s_{2n}^2
\] Elimination of \(a\) gives the recursion \[
s_{2n} = \sqrt{2-\sqrt{4-s_n^2}} \qquad\text{with initial value}\qquad
s_4=\sqrt{2}
\] for the length \(s_n\) of one side of the inscribed regular \(n\)-gon.
The sequence \((n\cdot s_n)\) converges monotonically from below to the limit \(2\pi\): \[
n\, s_n \rightarrow 2\pi % \qquad\text{and} \qquad {n c_n}\downarrow 2\pi
\] The relative error of approximating \(2\pi\) by an \(n\)-gon is: \[
\epsilon_n = \left| \frac{n\, s_n-2 \pi}{2\pi} \right|
\]
7.2 Two Iteration Formulas
The equation \[
s_{2n} = \sqrt{2-\sqrt{4-s_n^2}}\qquad \qquad \text{(Iteration A)}
\] is mathematically equivalent to: \[
s_{2n} = \frac{s_n}{\sqrt{2+\sqrt{4-s_n^2}}} \qquad \qquad \text{(Iteration B)}
\]
However, Iteration A is ill-conditioned and numerically unstable, as the following code demonstrates. The output shows the approximation for \(\pi\) for both formulae iteration.
usingPrintfiterationA(s) =sqrt(2-sqrt(4- s^2))iterationB(s) = s /sqrt(2+sqrt(4- s^2))s_B = s_A =sqrt(2) # initial valueϵ(x) =abs(x -2π)/2π # relative error ϵ_A =Float64[] # vectors for the plotϵ_B =Float64[]is =Float64[]@printf"""Approx. value of π n-gon Iteration A Iteration B"""for i in3:35push!(is, i) s_A =iterationA(s_A) s_B =iterationB(s_B) doublePi_A =2^i * s_A doublePi_B =2^i * s_Bpush!(ϵ_A, ϵ(doublePi_A))push!(ϵ_B, ϵ(doublePi_B))@printf"%14i %20.15f %20.15f\n"2^i doublePi_A/2 doublePi_B/2end
While Iteration B stabilizes to a value accurate within machine precision, Iteration A is unstable and diverges. A plot of the relative errors \(\epsilon_i\) confirms this:
Further iterations do not improve the result; it stabilizes at a relative error of approximately 2.5 machine epsilons.
ϵ_B[end]/eps(Float64)
2.5464790894703255
Iteration A is unstable; already at \(i=16\), the relative error begins to grow again.
The cause is a typical numerical cancellation effect. The side lengths \(s_n\) become very small very quickly. Thus \(a_n=\sqrt{4-s_n^2}\) is only slightly smaller than 2, and computing \(s_{2n}=\sqrt{2-a_n}\) leads to a catastrophic cancellation.
setprecision(80) # precision for BigFloats =sqrt(BigFloat(2))@printf" a = √(4-s^2) as BigFloat and as Float64\n\n"for i=3:44 s =iterationA(s) x =sqrt(4-s^2)if i >20@printf"%i %30.26f %20.16f \n" i x Float64(x)endend
This demonstrates the loss of significant digits. It also shows that using BigFloat with 80 bits of precission (mantissa length) only slightly delays the onset of the cancellation effect.
Countermeasures against unstable algorithms typically require better, stable algorithms, not more precise machine numbers.