Nick Craig-Wood's Articles

2016-12-24

Pi - Chudnovsky

2011-10-08 00:00

In Part 3 we managed to calculate 1,000,000 decimal places of π with Machin's arctan formula. Our stated aim was 100,000,000 places which we are going to achieve now!

We have still got a long way to go though, and we'll have to improve both our algorithm (formula) for π and our implementation.

The current darling of the π world is the Chudnovsky algorithm which is similar to the arctan formula but it converges much quicker. It is also rather complicated. The formula itself is derived from one by Ramanjuan who's work was extraordinary in the extreme. It isn't trivial to prove, so I won't! Here is Chudnovsky's formula for π as it is usually stated:

\[
\frac{1}{\pi} = 12 \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k + 3/2}}
\]

That is quite a complicated formula, we will make more comprehensible in a moment. If you haven't seen the Σ notation before it just like a sum over a for loop in python. For example:

\[
\sum^{10}_{k=1} k^2
\]

Is exactly the same as this python fragment:

sum(k**2 for k in range(1,11))

First let's get rid of that funny fractional power:

\begin{eqnarray*}
\frac{1}{\pi} &=& 12 \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k + 3/2}} \\
              &=& \frac{12}{640320 \sqrt{640320}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}} \\
              &=& \frac{1}{426880 \sqrt{10005}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}} \\
\end{eqnarray*}

Now let's split it into two independent parts. See how there is a + sign inside the Σ? We can split it into two series which we will call a and b and work out what π is in terms of a and b:

\begin{eqnarray*}
a     &=& \sum^\infty_{k=0} \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}} \\
      &=& 1
          - \frac{6\cdot5\cdot4}{(1)^3 640320^3}
          + \frac{12\cdot11\cdot10\cdot9\cdot8\cdot7}{(2\cdot1)^3 640320^6}
          - \frac{18\cdot17\cdots13}{(3\cdot2\cdot1)^3 640320^{9}}
          + \cdots \\
b     &=& \sum^\infty_{k=0} \frac{(-1)^k (6k)!k}{(3k)!(k!)^3 640320^{3k}} \\
\frac{1}{\pi} &=& \frac{13591409a + 545140134b}{426880 \sqrt{10005}} \\
\pi           &=& \frac{426880 \sqrt{10005}}{13591409a + 545140134b} \\
\end{eqnarray*}

Finally note that we can calculate the next a term from the previous one, and the b terms from the a terms which simplifies the calculations rather a lot.

\begin{eqnarray*}
a_k &=& \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}} \\
b_k &=& k \cdot a_k \\
\frac{a_{k}}{a_{k-1}} &=& - \frac{(6k-5)(6k-4)(6k-3)(6k-2)(6k-1)6k}{3k(3k-1)(3k-2)k^3 640320^3} \\
                      &=& - \frac{24(6k-5)(2k-1)(6k-1)}{k^3 640320^3} \\
\end{eqnarray*}

OK that was a lot of maths! But it is now in an easy to compute state, which we can do like this:

def pi_chudnovsky(one=1000000):
    """
    Calculate pi using Chudnovsky's series

    This calculates it in fixed point, using the value for one passed in
    """
    k = 1
    a_k = one
    a_sum = one
    b_sum = 0
    C = 640320
    C3_OVER_24 = C**3 // 24
    while 1:
        a_k *= -(6*k-5)*(2*k-1)*(6*k-1)
        a_k //= k*k*k*C3_OVER_24
        a_sum += a_k
        b_sum += k * a_k
        k += 1
        if a_k == 0:
            break
    total = 13591409*a_sum + 545140134*b_sum
    pi = (426880*sqrt(10005*one, one)*one) // total
    return pi

We need to be able to take the square root of long integers which python doesn't have a built in for. Luckily this is easy to provide. This uses a square root algorithm devised by Newton which doubles the number of significant places in the answer (quadratic convergence) each iteration:

 def sqrt(n, one):
     """
     Return the square root of n as a fixed point number with the one
     passed in.  It uses a second order Newton-Raphson convergence.  This
     doubles the number of significant figures on each iteration.
     """
     # Use floating point arithmetic to make an initial guess
     floating_point_precision = 10**16
     n_float = float((n * floating_point_precision) // one) / floating_point_precision
     x = (int(floating_point_precision * math.sqrt(n_float)) * one) // floating_point_precision
     n_one = n * one
     while 1:
         x_old = x
         x = (x + n_one // x) // 2
         if x == x_old:
             break
     return x

It uses normal floating point arithmetic to make an initial guess then refines it using Newton's method.

See pi_chudnovsky.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 4.696e-05
100 0.0001530
1000 0.002027
10000 0.08685
100000 8.453
1000000 956.3

Which is nearly 3 times quicker than the best result in part 3, and gets us 1,000,000 places of π in about 15 minutes.

Amazingly there are still two major improvements we can make to this. The first is to recast the calculation using something called binary splitting. Binary splitting is a general purpose technique for speeding up this sort of calculation. What it does is convert the sum of the individual fractions into one giant fraction. This means that you only do one divide at the end of the calculation which speeds things up greatly, because division is slow compared to multiplication.

Consider the general infinite series

\[
S(0,\infty) = \frac{a_0p_0}{b_0q_0}
+ \frac{a_1p_0p_1}{b_1q_0q_1}
+ \frac{a_2p_0p_1p_2}{b_2q_0q_1q_2}
+ \frac{a_3p_0p_1p_2p_3}{b_3q_0q_1q_2q_3}
+ \cdots
\]

This could be used as a model for all the infinite series we've looked at so far. Now lets consider the partial sum of that series from terms a to b (including a, not including b).

\begin{eqnarray*}
S(a,b) &=& \frac{a_ap_a}{b_aq_a}
+ \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}}
+ \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}}
+ \cdots
+ \frac{a_{b-1}p_ap_{a+1}p_{a+2}\cdots p_{b-1}}{b_{b-1}q_aq_{a+1}\cdots p_{b-1}} \\
\end{eqnarray*}

This is a part of the infinite series, and if a=0 and b=∞ it becomes the infinite series above.

Now lets define some extra functions:

\begin{eqnarray*}
P(a,b) &=& p_ap_{a+1}\cdots p_{b-1} \\
Q(a,b) &=& q_aq_{a+1}\cdots q_{b-1} \\
B(a,b) &=& b_ab_{a+1}\cdots b_{b-1} \\
T(a,b) &=& B(a,b)Q(a,b)S(a,b) \\
\end{eqnarray*}

Let's define m which is a <= m <= b. Making m as near to the middle of a and b will lead to the quickest calculations, but for the proof, it has to be somewhere between them. Lets work out what happens to our variables P, Q, R and T when we split the series in two:

\begin{eqnarray*}
P(a,b) &=& P(a,m)P(b,m) \\
Q(a,b) &=& Q(a,m)Q(b,m) \\
B(a,b) &=& B(a,m)B(b,m) \\
T(a,b) &=& B(m,b)Q(m,b)T(a,m) + B(a,m)P(a,m)T(m,b) \\
\end{eqnarray*}

The first three of those statements are obvious from the definitions, but the last deserves proof.

\begin{eqnarray*}
\lefteqn{B(m,b)Q(m,b)T(a,m) + B(a,m)P(a,m)T(m,b) = } \\
& & \left[T(a,b) = B(a,b)Q(a,b)S(a,b)\right] \\
&=& B(m,b)Q(m,b)B(a,m)Q(a,m)S(a,m) + B(a,m)P(a,m)B(m,b)Q(m,b)S(m,b) \\
& & \left[Q(a,b) = Q(a,m)Q(b,m)\right] \\
& & \left[B(a,b) = B(a,m)B(b,m)\right] \\
&=& B(a,b)Q(a,b)S(a,m) + B(a,b)P(a,m)Q(m,b)S(m,b) \\
&=& B(a,b)(Q(a,b)S(a,m) + P(a,m)Q(m,b)S(m,b)) \\
& & \left[Q(m,b) = \frac{Q(a,b)}{Q(a,m)}\right] \\
&=& B(a,b)\left(Q(a,b)S(a,m) + \frac{Q(a,b)P(a,m)}{Q(a,m)}S(m,b)\right) \\
&=& B(a,b)Q(a,b)\left(S(a,m) + \frac{P(a,m)}{Q(a,m)}S(m,b)\right) \\
&=& B(a,b)Q(a,b)\left(
\frac{a_ap_a}{b_aq_a}
+ \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}}
+ \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}}
+ \cdots
+ \frac{a_{m-1}p_ap_{a+1}p_{a+2}\cdots p_{m-1}}{b_{m-1}q_aq_{a+1}\cdots p_{m-1}}
+ \frac{p_ap_{a+1}\cdots p_{m-1}}{q_aq_{a+1}\cdots q_{m-1}}
+ \left(\frac{a_mp_m}{b_mq_m}
+ \frac{a_{m+1}p_mp_{m+1}}{b_{m+1}q_mq_{m+1}}
+ \frac{a_{m+2}p_mp_{m+1}p_{m+2}}{b_{m+2}q_mq_{m+1}q_{m+2}}
+ \cdots
+ \frac{a_{b-1}p_mp_{m+1}p_{m+2}\cdots p_{b-1}}{b_{b-1}q_mq_{m+1}\cdots p_{b-1}}
\right)\right) \\
&=& B(a,b)Q(a,b)\left(
\frac{a_ap_a}{b_aq_a}
+ \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}}
+ \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}}
+ \cdots
+ \frac{a_{m-1}p_ap_{a+1}p_{a+2}\cdots p_{m-1}}{b_{m-1}q_aq_{a+1}\cdots p_{m-1}}
+ \frac{p_ap_{a+1}\cdots p_{m-1}}{q_aq_{a+1}\cdots q_{m-1}}
+ \frac{a_mp_ap_{a+1}\cdots p_m}{b_mq_aq_{a+1}\cdots q_m}
+ \frac{a_{m+1}p_ap_{a+1}\cdots p_{m+1}}{b_{m+1}q_aq_{a+1}\cdots q_{m+1}}
+ \frac{a_{m+2}p_ap_{a+1}\cdots p_{m+2}}{b_{m+2}q_aq_{a+1}\cdots q_{m+2}}
+ \cdots
+ \frac{a_{b-1}p_ap_{a+1}\cdots p_{b-1}}{b_{b-1}q_aq_{a+1}\cdots p_{b-1}}
\right) \\
&=& B(a,b)Q(a,b)S(a,b) \\
&=& T(a,b) \\
\end{eqnarray*}

We can use these relations to expand the series recursively, so if we want S(0,8) then we can work out S(0,4) and S(4,8) and combine them. Likewise to calculate S(0,4) and S(4,8) we work out S(0,2), S(2,4), S(4,6), S(6,8) and combine them, and to work out those we work out S(0,1), S(1,2), S(2,3), S(3,4), S(4,5), S(5,6), S(6,7), S(7,8). Luckily we don't have to split them down any more as we know what P(a,a+1), Q(a,a+1) etc is from the definition above.

\begin{eqnarray*}
P(a,a+1) &=& p_a \\
Q(a,a+1) &=& q_a \\
B(a,a+1) &=& b_a \\
S(a,a+1) &=& \frac{a_ap_a}{b_aq_a} \\
T(a,a+1) &=& B(a,a+1)Q(a,a+1)S(a,a+1) \\
         &=& b_aq_a\frac{a_ap_a}{b_aq_a} \\
         &=& a_ap_a \\
\end{eqnarray*}

And when you've finally worked out P(0,n),Q(0,n),B(0,n),T(0,n) you can work out S(0,n) with

\begin{eqnarray*}
S(0,n) &=& \frac{T(0,n)}{B(0,n)Q(0,n)} \\
\end{eqnarray*}

If you want a more detailed and precise treatment of binary splitting then see Bruno Haible and Thomas Papanikolaou's paper.

So back to Chudnovksy's series. We can now set these parameters in the above general formula:

\begin{eqnarray*}
p_0 &=& 1 \\
p_a &=& (6a-5)(2a-1)(6a-1) \\
q_0 &=& 1 \\
q_a &=& a^3\cdot640320^3/24 \\
b_a &=& 1 \\
a_a &=& (13591409 + 545140134a) \\
\end{eqnarray*}

This then makes our Chudnovksy pi function look like this:

def pi_chudnovsky_bs(digits):
    """
    Compute int(pi * 10**digits)

    This is done using Chudnovsky's series with binary splitting
    """
    C = 640320
    C3_OVER_24 = C**3 // 24
    def bs(a, b):
        """
        Computes the terms for binary splitting the Chudnovsky infinite series

        a(a) = +/- (13591409 + 545140134*a)
        p(a) = (6*a-5)*(2*a-1)*(6*a-1)
        b(a) = 1
        q(a) = a*a*a*C3_OVER_24

        returns P(a,b), Q(a,b) and T(a,b)
        """
        if b - a == 1:
            # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1)
            if a == 0:
                Pab = Qab = 1
            else:
                Pab = (6*a-5)*(2*a-1)*(6*a-1)
                Qab = a*a*a*C3_OVER_24
            Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
            if a & 1:
                Tab = -Tab
        else:
            # Recursively compute P(a,b), Q(a,b) and T(a,b)
            # m is the midpoint of a and b
            m = (a + b) // 2
            # Recursively calculate P(a,m), Q(a,m) and T(a,m)
            Pam, Qam, Tam = bs(a, m)
            # Recursively calculate P(m,b), Q(m,b) and T(m,b)
            Pmb, Qmb, Tmb = bs(m, b)
            # Now combine
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
        return Pab, Qab, Tab
    # how many terms to compute
    DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6)
    N = int(digits/DIGITS_PER_TERM + 1)
    # Calclate P(0,N) and Q(0,N)
    P, Q, T = bs(0, N)
    one = 10**digits
    sqrtC = sqrt(10005*one, one)
    return (Q*426880*sqrtC) // T

Hopefully you'll see how the maths above relates to this as I've used the same notation in each. Note that we calculate the number of digits of pi we expect per term of the series (about 14.18) to work out how many terms of the series we need to compute, as the binary splitting algorithm needs to know in advance how many terms to calculate. We also don't bother calculating B(a) as it is always 1. Defining a function inside a function like this makes what is known as a closure. This means that the inner function can access the variables in the outer function which is very convenient in this recursive algorithm as it stops us having to pass the constants to every call of the function.

See pi_chudnovsky_bs.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 1.096e-05
100 3.194e-05
1000 0.0004899
10000 0.03403
100000 3.625
1000000 419.1

This is a bit more than twice as fast as pi_chudnovsky.py giving us our 1,000,000 places in just under 7 minutes. If you profile it you'll discover that almost all the time spent in the square root calculations (86% of the time) whereas only 56 seconds is spent in the binary splitting part. We could spend time improving the square root algorithm, but it is time to bring out the big guns: gmpy.

Gmpy is a python interface to the gmp library which is a C library for arbitrary precision arithmetic. It is very fast, much faster than the built in int type in python for large numbers. Luckily gmpy provides a type (mpz) which works exactly like normal python int types, so we have to make hardly any changes to our code to use it. These are using initialising variables with the mpz type, and using the sqrt method on mpz rather than our own home made sqrt algorithm:

import math
from gmpy2 import mpz
from time import time

def pi_chudnovsky_bs(digits):
    """
    Compute int(pi * 10**digits)

    This is done using Chudnovsky's series with binary splitting
    """
    C = 640320
    C3_OVER_24 = C**3 // 24
    def bs(a, b):
        """
        Computes the terms for binary splitting the Chudnovsky infinite series

        a(a) = +/- (13591409 + 545140134*a)
        p(a) = (6*a-5)*(2*a-1)*(6*a-1)
        b(a) = 1
        q(a) = a*a*a*C3_OVER_24

        returns P(a,b), Q(a,b) and T(a,b)
        """
        if b - a == 1:
            # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1)
            if a == 0:
                Pab = Qab = mpz(1)
            else:
                Pab = mpz((6*a-5)*(2*a-1)*(6*a-1))
                Qab = mpz(a*a*a*C3_OVER_24)
            Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
            if a & 1:
                Tab = -Tab
        else:
            # Recursively compute P(a,b), Q(a,b) and T(a,b)
            # m is the midpoint of a and b
            m = (a + b) // 2
            # Recursively calculate P(a,m), Q(a,m) and T(a,m)
            Pam, Qam, Tam = bs(a, m)
            # Recursively calculate P(m,b), Q(m,b) and T(m,b)
            Pmb, Qmb, Tmb = bs(m, b)
            # Now combine
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
        return Pab, Qab, Tab
    # how many terms to compute
    DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6)
    N = int(digits/DIGITS_PER_TERM + 1)
    # Calclate P(0,N) and Q(0,N)
    P, Q, T = bs(0, N)
    one_squared = mpz(10)**(2*digits)
    sqrtC = (10005*one_squared).sqrt()
    return (Q*426880*sqrtC) // T

See pi_chudnovsky_bs_gmpy.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 1.597e-05
100 3.409e-05
1000 0.003403
10000 0.003571
100000 0.09120
1000000 1.760
10000000 30.11
100000000 542.2

So we have achieved our goal of calculating 100,000,000 places of π in just under 10 minutes! What is limiting the program now is memory... 100,000,000 places takes about 600MB of memory to run. With 6 GB of free memory it could probably calculate one billion places in a few hours.

What if we wanted to go faster? Well you could use gmp-chudnovsky.c which is a C program which implements the Chudnovsky Algorithm. It is heavily optimised and rather difficult to understand, but if you untangle it you'll see it does exactly the same as above, with one extra twist. The twist is that it factors the fraction in the binary splitting phase as it goes along. If you examine the gmp-chudnovsky.c results page you'll see that the little python program above acquits itself very well - the python program only takes 75% longer than the optimised C program on the same hardware.

One algorithm which has the potential to beat Chudnovsky is the Arithmetic Geometric Mean algorithm which doubles the number of decimal places each iteration. It involves square roots and full precision divisions which makes it tricky to implement well. In theory it should be faster than Chudnovsk but, so far, in practice Chudnovsky is faster.

Here is a chart of all the different π programs we've developed in Part 1, Part 2, Part 3 and Part 4 with their timings on my 2010 Intel® Core™ i7 CPU Q820 @ 1.73GHz running 64 bit Linux:

Chart of timings for calculating π with various methods
Categories: maths, pymath, python | View Comments
Read and Post Comments

Pi - Machin

2011-07-19 00:00

In Part 2 we managed to calculate 100 decimal places of π with Archimedes' recurrence relation for π which was a fine result, but we can do bigger and faster.

To go faster we'll have to use a two pronged approach - a better algorithm for π, and a better implementation.

In 1706 John Machin came up with the first really fast method of calculating π and used it to calculate π to 100 decimal places. Quite an achievement considering he did that entirely with pencil and paper. Machin's formula is:

π/4 = 4cot⁻¹(5) - cot⁻¹(239)
    = 4tan⁻¹(1/5) - tan⁻¹(1/239)
    = 4 * arctan(1/5) - arctan(1/239)

We can prove Machin's formula using some relatively easy math! Hopefully you learnt this trigonometric formula at school:

\[
tan(a + b) = \frac{tan(a) + tan(b)}{1 - tan(a) \cdot tan(b)}
\]

Substitute a = arctan(x) and b = arctan(y) into it, giving:

\[
tan(arctan(x) + arctan(y)) = \frac{tan(arctan(x)) + tan(arctan(y)}{1 - tan(arctan(x)) \cdot tan(arctan(y))}
\]

Note that tan(arctan(x)) = x which gives:

\[
tan(arctan(x) + arctan(y)) = \frac{x + y}{1 - xy}
\]

arctan both sides to get:

\[
arctan(x) + arctan(y) = arctan\left(\frac{x + y}{1 - xy}\right)
\]

This gives us a method for adding two arctans. Using this we can prove Machin's formula with a bit of help from python's fractions module:

>>> from fractions import Fraction
>>> def add_arctan(x, y): return (x + y) / (1 - x * y)
...
>>> a = add_arctan(Fraction(1,5), Fraction(1,5))
>>> a # 2*arctan(1/5)
Fraction(5, 12)
>>> b = add_arctan(a, a)
>>> b # 4*arctan(1/5)
Fraction(120, 119)
>>> c = add_arctan(b, Fraction(-1,239))
>>> c # 4*arctan(1/5) - arctan(1/239)
Fraction(1, 1)
>>>

So Machin's formula is equal to arctan(1/1) = π/4 QED! Or if you prefer it written out in mathematical notation:

\begin{eqnarray*}
4arctan\left(\frac{1}{5}\right) - arctan\left(\frac{1}{239}\right)
  &=& 2\left(arctan\left(\frac{1}{5}\right) + arctan\left(\frac{1}{5}\right)\right) - arctan\left(\frac{1}{239}\right) \\
  &=& 2arctan\left(\frac{5}{12}\right) - arctan\left(\frac{1}{239}\right) \\
  &=& arctan\left(\frac{5}{12}\right) + arctan\left(\frac{5}{12}\right) - arctan\left(\frac{1}{239}\right) \\
  &=& arctan\left(\frac{120}{119}\right) - arctan\left(\frac{1}{239}\right) \\
  &=& arctan\left(\frac{1}{1}\right) \\
  &=& \frac{\pi}{4} \\
\end{eqnarray*}

So how do we use Machin's formula to calculate π? Well first we need to calculate arctan(x) which sounds complicated, but we've seen the infinite series for it already in Part 1 :

\[
arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots ( -1 <= x <= 1 )
\]

We can see that if x=1/5 then the terms get smaller very quickly, which means that the series will converge quickly. Let's substitute x = 1/x and get:

\[
arctan(1/x) = \frac{1}{x} - \frac{1}{3x^3} + \frac{1}{5x^5} - \frac{1}{7x^7} + \frac{1}{9x^9} - \cdots ( x >= 1 )
\]

Each term in the series is created by dividing the previous term by a small integer. Dividing two 100 digit numbers is hard work as I'm sure you can imagine from your experiences with long division at school! However it is much easier to divide a short number (a single digit) into a 100 digit number. We called this short division at school, and that is the key to making a much faster π algorithm. In fact if you are dividing an N digit number by an M digit number it takes rougly N*M operations. That square root we did in Part 2 did dozens of divisions of 200 digit numbers. So if we could somehow represent the current term in a int [1] then we could use this speedy division to greatly speed up the calculation.

The way we do that is to multiply everything by a large number, lets say 10 100. We then do all our calculations with integers, knowing that we should shift the decimal place 100 places to the left when done to get the answer. This needs a little bit of care, but is a well known technique known as fixed point arithmetic.

The definition of the arctan(1/x) function then looks like this:

def arctan(x, one=1000000):
    """
    Calculate arctan(1/x)

    arctan(1/x) = 1/x - 1/3*x**3 + 1/5*x**5 - ... (x >= 1)

    This calculates it in fixed point, using the value for one passed in
    """
    power = one // x            # the +/- 1/x**n part of the term
    total = power               # the total so far
    x_squared = x * x           # precalculate x**2
    divisor = 1                 # the 1,3,5,7 part of the divisor
    while 1:
        power = - power // x_squared
        divisor += 2
        delta = power // divisor
        if delta == 0:
            break
        total += delta
    return total

The value one passed in is the multiplication factor as described above. You can think of it as representing 1 in the fixed point arithmetic world. Note the use of the // operator which does integer divisions. If you don't use this then python will make floating point values [2]. In the loop there are two divisions, power // x_squared and power // divisor. Both of these will be dividing by small numbers, x_squared will be 5² = 25 or 239² = 57121 and divisor will be 2 * the number of iterations which again will be small.

So how does this work in practice? On my machine it calculates 100 digits of π in 0.18 ms which is over 30,000 times faster than the previous calculation with the decimal module in Part 2!

Can we do better?

Well the answer is yes! Firstly there are better arctan formulae. Amazingly there are other formulae which will calculate π too, like these, which are named after their inventors:

def pi_machin(one):
    return 4*(4*arctan(5, one) - arctan(239, one))

def pi_ferguson(one):
    return 4*(3*arctan(4, one) + arctan(20, one) + arctan(1985, one))

def pi_hutton(one):
    return 4*(2*arctan(3, one) + arctan(7, one))

def pi_gauss(one):
    return 4*(12*arctan(18, one) + 8*arctan(57, one) - 5*arctan(239, one))

It turns out that Machin's formula is really very good, but Gauss's formula is slightly better.

The other way we can do better is to use a better formula for arctan(). Euler came up with this accelerated formula for arctan:

\[
arctan(1/x) = \frac{x}{1+x^2}
            + \frac{2}{3}\frac{x}{(1+x^2)^2}
            + \frac{2\cdot4}{3\cdot5}\frac{x}{(1+x^2)^3}
            + \frac{2\cdot4\cdot6}{3\cdot5\cdot7}\frac{x}{(1+x^2)^4}
            + \cdots
\]

This converges to arctan(1/x) at the roughly the same rate per term than the formula above, however each term is made directly from the previous term by multiplying by 2n and dividing by (2n+1)(1+x²). This means that it can be implemented with only one (instead of two) divisions per term, and hence runs roughly twice as quickly. If we implement this in python in fixed point, then it looks like this:

def arctan_euler(x, one=1000000):
    """
    Calculate arctan(1/x) using euler's accelerated formula

    This calculates it in fixed point, using the value for one passed in
    """
    x_squared = x * x
    x_squared_plus_1 = x_squared + 1
    term = (x * one) // x_squared_plus_1
    total = term
    two_n = 2
    while 1:
        divisor = (two_n+1) * x_squared_plus_1
        term *= two_n
        term = term // divisor
        if term == 0:
            break
        total += term
        two_n += 2
    return total

Notice how we pre-calculate as many things as possible (like x_squared_plus_1) to get them out of the loop.

See the complete program: pi_artcan_integer.py. Here are some timings [3] of how the Machin and the Gauss arctan formula fared, with and without the accelerated arctan.

Formula Digits Normal Arctan (seconds) Euler Arctan (seconds)
Machin 10 3.004e-05 2.098e-05
Gauss 10 2.098e-05 1.907e-05
Machin 100 0.0001869 0.0001471
Gauss 100 0.0001869 0.0001468
Machin 1000 0.005599 0.003520
Gauss 1000 0.005398 0.003445
Machin 10000 0.3995 0.2252
Gauss 10000 0.3908 0.2186
Machin 100000 38.33 21.69
Gauss 100000 37.44 21.12
Machin 1000000 3954.1 2396.8
Gauss 1000000 3879.4 2402.1

There are several interesting things to note. Firstly that we just calculated 1,000,000 decimal places of π in 40 minutes! Secondly that to calculate 10 times as many digits it takes 100 times as long. This means that our algorithm scales as O(n²) where n is the number of digits. So if 1,000,000 digits takes an hour, 10,000,000 would take 100 hours (4 days) and 100,000,000 would take 400 days (13 months). Your computer might run out of memory, or explode in some other fashion when calculating 100,000,000 digits, but doesn't seem impossible any more.

The Gauss formula is slightly faster than the Machin for nearly all the results. The accelerated arctan formula is 1.6 times faster than the normal arctan formula so that is a definite win.

A billion (1,000,000,000) digits of pi would take about 76 years to calculate using this program which is a bit out of our reach! There are several ways we could improve this. If we could find an algorithm for calculating π which was quicker than O(n²), or if we could find an O(n²) algorithm which just ran a lot faster. We could also throw more CPUs at the problem. To see how this might be done, we could calculate all the odd terms on one processor, and all the even terms on another processor for the arctan formula and add them together at the end.. That would run just about twice as quickly. That could easily be generalised to lots of processors:

arctan(1/x) = 1/x - 1/3x³ + 1/5x⁵ - 1/7x⁷ + ...
            = 1/x         + 1/5x⁵ -       + ... # run on cpu 1
            +     - 1/3x³         - 1/7x⁷ + ... # run on cpu 2

So if we had 76 CPU cores, we could probably calculate π to 1,000,000,000 places in about a year.

There are better algorithms for calculating π though, and there are faster ways of doing arithmetic than python's built in long integers. We'll explore both in Part 4!

[1]Python's integer type was called called long in python 2, just int in python 3
[2]Unless you are using python 2
[3]All timings generated on my 2010 Intel® Core™ i7 CPU Q820 @ 1.73GHz running 64 bit Linux
Categories: maths, pymath, python | View Comments
Read and Post Comments

Pi - Archimedes

2011-07-18 00:00

It is clear from Part 1 that in order to calculate π we are going to need a better method than evaluating Gregory's Series.

Here is one which was originally discovered by Archimedes in 200BC. Most of the proofs for series for calculating π require calculus or other advanced math, but since Archimedes didn't know any of those you can prove it without! In fact I thought I had discovered it for the first time when I was 13 or so, but alas I was 2000 years too late!

Archimedes knew that the ratio of the circumference of a circle to its diameter was π. He had the idea that he could draw a regular polygon inscribed within the circle to approximate π, and the more sides he drew on the polygon, the better approximation to π he would get. In modern maths speak, we would say as the number of sides of the polygon tends to infinity, the circumference tends to 2π.

Instead of drawing the polygon though, Archimedes calculated the length using a geometric argument, like this.

Diagram for proof of Archimedes π formula

Draw a circle of radius 1 unit centered at A. Inscribe an N sided polygon within it. Our estimate for π is half the circumference of the polygon (circumference of a circle is 2πr, r = 1, giving 2π). As the sides of the polygon get smaller and smaller the circumference gets closer and closer to 2π.

The diagram shows one segment of the polygon ACE. The side of the polygon CE has length d n. Assuming we know d for an N sided polygon, If we can find an expression for the length CD = d 2n, the edge length of a polygon with 2N sides, in terms of d n only, then we can improve our estimate for π. Let's try to do that.

We bisect the triangle CAE to make CAD and DAE the two new equal segments of the 2N sided polygon.

Using Pythagoras's theorem on the right angled triangle ABC, we can see:

\begin{eqnarray*}
AB^2 + BC^2 &=& 1^2 = 1 \\
         AB &=& \sqrt{1 - BC^2} \\
\end{eqnarray*}

Given that

\begin{equation*}
   BC = \frac{d_n}{2}
\end{equation*}

Substituting gives

\begin{eqnarray*}
   AB &=& \sqrt{1-\left(\frac{d_n}{2}\right)^2} \\
   BD &=& 1 - AB \\
      &=& 1 - \sqrt{1-\frac{d_n^2}{4}} \\
\end{eqnarray*}

Using Pythagoras's theorem on the right angled triangle CBD

\begin{eqnarray*}
CD^2 &=& BC^2 + BD^2 \\
    &=& \left(\frac{d_n}{2}\right)^2 + \left(1 - \sqrt{1-\left(\frac{d_n}{2}\right)^2}\right)^2 \\
    &=& \frac{d_n^2}{4} + \left(1 - 2\sqrt{1-\frac{d_n^2}{4}} + \left(1 - \frac{d_n^2}{4}\right)\right) \\
    &=& 2 - 2\sqrt{1-\frac{d_n^2}{4}} \\
CD = d_{2n} &=& \sqrt{2 - 2\sqrt{1-\frac{d_n^2}{4}}} \\
\end{eqnarray*}

d 2n is the length of one side of the polygon with 2N sides.

This means that if we know the length of the sides of a polygon with N sides, then we can calculate the length of the sides of a polygon with 2N sides.

What does this mean? Lets start with a square. Inscribing a square in a circle looks like this, with the side length √2. This gives an estimate for π as 2√2, which is poor (at 2.828) but it is only the start of the process.

Square inscribed in a circle

We can the calculate the side length of an octagon, from the side length of a square, and the side length of a 16-gon from an octagon etc.

Polygons inscribed in a circle

It's almost time to break out python, but before we do so, we'll just note that to calculate d 2n from d n, the first thing you do is calculate d² and at the very end you take the square root. This kind of suggests that it would be better to keep track of d² rather than d, which we do in the program below (pi_archimedes.py):

import math

def pi_archimedes(n):
    """
    Calculate n iterations of Archimedes PI recurrence relation
    """
    polygon_edge_length_squared = 2.0
    polygon_sides = 4
    for i in range(n):
        polygon_edge_length_squared = 2 - 2 * math.sqrt(1 - polygon_edge_length_squared / 4)
        polygon_sides *= 2
    return polygon_sides * math.sqrt(polygon_edge_length_squared) / 2

def main():
    """
    Try the series
    """
    for n in range(16):
        result = pi_archimedes(n)
        error = result - math.pi
        print("%8d iterations %.10f error %.10f" % (n, result, error))

if __name__ == "__main__":
    main()

If you run this, then it produces:

Iterations Sides Result Error
0 4 2.8284271247 -0.3131655288
1 8 3.0614674589 -0.0801251947
2 16 3.1214451523 -0.0201475013
3 32 3.1365484905 -0.0050441630
4 64 3.1403311570 -0.0012614966
5 128 3.1412772509 -0.0003154027
6 256 3.1415138011 -0.0000788524
7 512 3.1415729404 -0.0000197132
8 1024 3.1415877253 -0.0000049283
9 2048 3.1415914215 -0.0000012321
10 4096 3.1415923456 -0.0000003080
11 8192 3.1415925765 -0.0000000770
12 16384 3.1415926335 -0.0000000201
13 32768 3.1415926548 0.0000000012
14 65536 3.1415926453 -0.0000000083
15 131072 3.1415926074 -0.0000000462

Hooray! We calculated π to 8 decimal places in only 13 iterations. Iteration 0 for the square shows up the expected 2.828 estimate for π. You can see after iteration 13 that the estimate of π starts getting worse. That is because we only calculated all our calculations to the limit of precision of python's floating point numbers (about 17 digits), and all those errors start adding up.

We can easily calculate more digits of π using Pythons excellent decimal module. This allows you to do arbitrary precision arithmetic on numbers. It isn't particularly quick, but it is built in and easy to use.

Let's calculate π to 100 decimal places now. That sounds like a significant milestone! (pi_archimedes_decimal.py):

from decimal import Decimal, getcontext

def pi_archimedes(n):
    """
    Calculate n iterations of Archimedes PI recurrence relation
    """
    polygon_edge_length_squared = Decimal(2)
    polygon_sides = 2
    for i in range(n):
        polygon_edge_length_squared = 2 - 2 * (1 - polygon_edge_length_squared / 4).sqrt()
        polygon_sides *= 2
    return polygon_sides * polygon_edge_length_squared.sqrt()

def main():
    """
    Try the series
    """
    places = 100
    old_result = None
    for n in range(10*places):
        # Do calculations with double precision
        getcontext().prec = 2*places
        result = pi_archimedes(n)
        # Print the result with single precision
        getcontext().prec = places
        result = +result           # do the rounding on result
        print("%3d: %s" % (n, result))
        if result == old_result:
            break
        old_result = result

You'll see if you look at the pi_archimedes function that not a lot has changed. Instead of using the math.sqrt function we use the sqrt method of the Decimal object. The edge gets initialised to Decimal(2) rather than 2.0 but otherwise the methods are the same. The main method has changed a bit. You'll see that we set the precision of the decimal calculations using the getcontext().prec = ... call. This sets the precision for all following calculations. There are other ways to do this which you can see in the decimal module docs. We do the Archimedes calculations with 200 decimal places precision, then print the result out with 100 decimal places precision by changing the precision and using the result = +result trick. When the result stops changing we end, because that must be π!

If you run this, you'll find at iteration 168 it produces 100 accurately rounded decimal places of π. So far so good!

There are two downsides to this function though. One is that the decimal arithmetic is quite slow. On my computer it takes about 6 seconds to calculate 100 digits of π which sounds fast, but if you were to try for 1,000,000 places you would be waiting a very very long time! The other problem is the square root. Square roots are expensive operations, they take lots of multiplications and divisions and we need to do away with that to go faster.

In Part 3 we'll be getting back to the infinite series for arctan and on to much larger numbers of digits of π.

Categories: maths, pymath, python | View Comments
Read and Post Comments

Pi - Gregory's Series

2011-07-17 00:00

Lets calculate π (or Pi if you prefer)! π is an irrational number (amongst other things) which means that it isn't one whole number divided by another whole number. In fact the digits of π are extremely random - if you didn't know they were the digits of π they would be perfectly random.

π has been calculated to billions of decimal places, but in this series of articles, we're going to aim for the more modest target of 100,000,000 decimal places calculated by a python program in a reasonable length of time. We are going to try some different formulae and algorithms following a roughly historical path.

You were probably taught at school that 22/7 is a reasonable approximation to π. Maybe you were even taught that 355/113 was a better one. You were unlikely to have been taught how π is actually calculated. You could draw a circle and measure the circumference, but that is very hard to do accurately.

Luckily maths comes to the rescue. There are 100s of formulae for calculating π, but the simplest one I know of is this:

\begin{equation*}
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \frac{1}{13} - \frac{1}{15} + \frac{1}{17} - \cdots
\end{equation*}

This is known as Gregory's Series (or sometimes the Leibnitz formula) for π. That ... stands for keep going forever. To make the next term in the series, alternate the sign, and add 2 to the denominator of the fraction, so + 1/19 is next then - 1/21 etc.

To see why this is true, we need to look at the Leibnitz formula for arctan (or tan⁻¹). This is:

\begin{equation*}
arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots ( -1 <= x <= 1 )
\end{equation*}

Note that the result of arctan(x) is in radians. I'm not going to prove that formula here as it involves calculus, but you can see a proof of the arctan formula if you want.

arctan is the inverse of tan, so arctan(tan(x)) = x. tan(x) is the familiar (hopefully) ratio of the opposite and adjacent sides of a right angled triangle and how that relates to the angle.

If we set the angle to 45°, which is π/4 radians, then:

   tan(45°) = 1
⇒ tan(π/4) = 1
⇒ arctan(1) = π/4

which when substituted in the above becomes:

\begin{equation*}
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots
\end{equation*}

This may be the simplest infinite series formula for π but it has the disadvantage that you'll need to add up a great number of terms to get an acceptably accurate answer.

Here is a bit of python to calculate it (pi_gregory.py):

import math

def pi_gregory(n):
    """
    Calculate n iterations of Gregory's series
    """
    result = 0
    divisor = 1
    sign = 1
    for i in range(n):
        result += sign / divisor
        divisor += 2
        sign = -sign
    return 4 * result

def main():
    """
    Try Gregory's series
    """
    for log_n in range(1,8):
        n = 10**log_n
        result = pi_gregory(n)
        error = result - math.pi
        print("%8d iterations %.10f error %.10f" % (n, result, error))

if __name__ == "__main__":
    main()

All the important stuff happens in pi_gregory and hopefully you'll see how that relates to the infinite series for arctan. Note the trick of keeping the sign of the term in a variable as 1 or -1 and then multiplying by this. This is a common trick, easier to write than an if statement and probably faster.

The program above prints:

Iterations Result Error
10 3.0418396189 -0.0997530347
100 3.1315929036 -0.0099997500
1000 3.1405926538 -0.0009999997
10000 3.1414926536 -0.0001000000
100000 3.1415826536 -0.0000100000
1000000 3.1415916536 -0.0000010000
10000000 3.1415925536 -0.0000001000

That looks like our old friend π doesn't it! However you can see that it takes 10 times as many iterations to add another decimal place of accuracy to the result. That isn't going to get us even 10 decimal places in a reasonable length of time... We'll have to get a bit cleverer. (If the program printed out something very different then you need to run it with python 3 not python 2 see the introduction.) You can also see something very odd in the error terms - the result is correct to 10 decimal places, all except for one digit. This is a known oddity and you can read up about it The Leibniz formula for pi .

Those of you paying attention will have noted that we used math.pi from the python standard library in the above program to calculate the difference of the result from π. So python already knows the value of π! However that is only a double precison value (17 digits or so) and we are aiming for much more. We are going to leave the world of double precision floating point behind, and calculate a lot more digits of π, much quicker, in the next exciting episode (Part 2)!

Categories: maths, pymath, python | View Comments
Read and Post Comments

This is going to be a series of articles which are all about having fun with maths and Python. I'm going to show you a bit of maths and then how to apply it with a python program, then work out how to improve it.

I decided to write these after looking in my doodles folder on my computer and finding 100s of programs to calculate this bit of maths, or explore that bit of maths, and I thought it would be interesting for me to share them. Hopefully you'll find it interesting too!

I'm going to use python 3 for all python code. This enables me to future proof these articles and encourage everyone who hasn't tried Python 3 to have a go. The programs will run with minor modifications on python 2 - just watch out for division which has changed from python 2 to 3. In fact if you run any of these programs under python 2 then put from __future__ import division at the top of them.

Python is a really excellent language for investigations into maths. It is easy to read and has excellent long number support with infinite precision integers and the decimal module. I'm going to attempt to demonstrate what I want to demonstrate not using any third party modules, but I may occasionally use the gmpy module.

I'm going to pitch the level of maths below calculus, no more than the maths that everyone should have learnt at school. If I want to use some bit of maths outside that range I'll explain it first. This does means that sometimes I won't be able to show a proof for things, but I'll add a link to an online proof if I can find one.

You are going to see maths written both in traditional style:

\begin{equation*}
x=\frac{-b \pm \sqrt {b^2-4ac}}{2a}
\end{equation*}

And in computer style which every programmer knows anyway. This involves a few more brackets, but it is much easier to translate to computer code. For example:

(-b +/- sqrt(b**2 - 4*a*c)) / (2*a)

In Python computer maths we write:

  • * to mean multiply
  • / to mean divide
  • ** to mean to the power
  • // to mean divide but give the result as an integer
  • % to mean the same as the modulo or mod operator, ie do the division and take the remainder.

I'm going to make free with unicode fonts so you'll see:

  • π - pi
  • x² - x squared
  • √x - square root of x
  • ± - plus or minus

If these aren't appearing properly, then you'll need to try a different browser, or install some more fonts.

As for the programming level, anyone who knows a little bit of python should be able to work out what is happening - I won't be doing anything too fancy!

I'll tag all the articles with pymath so you can find them or subscribe to the rss.

Happy Calculating!

Nick Craig-Wood

nick@craig-wood.com

Categories: pymath | View Comments
Read and Post Comments