Pi - Chudnovsky

In Part 3 we managed to calculate 1,000,000 decimal places of π with Machin's arctan formula. Our stated aim was 100,000,000 places which we are going to achieve now!

Fun with Maths and Python
This is a having fun with maths and python article. See the introduction for important information!

We have still got a long way to go though, and we'll have to improve both our algorithm (formula) for π and our implementation.

The current darling of the π world is the Chudnovsky algorithm which is similar to the arctan formula but it converges much quicker. It is also rather complicated. The formula itself is derived from one by Ramanjuan who's work was extraordinary in the extreme. It isn't trivial to prove, so I won't! Here is Chudnovsky's formula for π as it is usually stated:

$$ \frac{1}{\pi} = 12 \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k + 3/2}} $$

That is quite a complicated formula, we will make more comprehensible in a moment. If you haven't seen the Σ notation before it just like a sum over a for loop in python. For example:

$$ \sum^{10}_{k=1} k^2 $$

Is exactly the same as this python fragment:

sum(k**2 for k in range(1,11))

First let's get rid of that funny fractional power:

$$ \begin{align} \frac{1}{\pi} &= 12 \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k + 3/2}} \\ &= \frac{12}{640320 \sqrt{640320}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}} \\ &= \frac{1}{426880 \sqrt{10005}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}} \\ \end{align} $$

Now let's split it into two independent parts. See how there is a + sign inside the Σ? We can split it into two series which we will call a and b and work out what π is in terms of a and b:

$$ \begin{align} a &= \sum^\infty_{k=0} \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}} \\ &= 1 - \frac{6\cdot5\cdot4}{(1)^3 640320^3} + \frac{12\cdot11\cdot10\cdot9\cdot8\cdot7}{(2\cdot1)^3 640320^6} - \frac{18\cdot17\cdots13}{(3\cdot2\cdot1)^3 640320^{9}} + \cdots \\ b &= \sum^\infty_{k=0} \frac{(-1)^k (6k)!k}{(3k)!(k!)^3 640320^{3k}} \\ \frac{1}{\pi} &= \frac{13591409a + 545140134b}{426880 \sqrt{10005}} \\ \pi &= \frac{426880 \sqrt{10005}}{13591409a + 545140134b} \end{align} $$

Finally note that we can calculate the next a term from the previous one, and the b terms from the a terms which simplifies the calculations rather a lot.

$$ \begin{align} a_k &= \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}} \\ b_k &= k \cdot a_k \\ \frac{a_{k}}{a_{k-1}} &= - \frac{(6k-5)(6k-4)(6k-3)(6k-2)(6k-1)6k}{3k(3k-1)(3k-2)k^3 640320^3} \\ &= - \frac{24(6k-5)(2k-1)(6k-1)}{k^3 640320^3} \\ \end{align} $$

OK that was a lot of maths! But it is now in an easy to compute state, which we can do like this:

def pi_chudnovsky(one=1000000):
    """
    Calculate pi using Chudnovsky's series

    This calculates it in fixed point, using the value for one passed in
    """
    k = 1
    a_k = one
    a_sum = one
    b_sum = 0
    C = 640320
    C3_OVER_24 = C**3 // 24
    while 1:
        a_k *= -(6*k-5)*(2*k-1)*(6*k-1)
        a_k //= k*k*k*C3_OVER_24
        a_sum += a_k
        b_sum += k * a_k
        k += 1
        if a_k == 0:
            break
    total = 13591409*a_sum + 545140134*b_sum
    pi = (426880*sqrt(10005*one, one)*one) // total
    return pi

We need to be able to take the square root of long integers which python doesn't have a built in for. Luckily this is easy to provide. This uses a square root algorithm devised by Newton which doubles the number of significant places in the answer (quadratic convergence) each iteration:

 def sqrt(n, one):
     """
     Return the square root of n as a fixed point number with the one
     passed in.  It uses a second order Newton-Raphson convergence.  This
     doubles the number of significant figures on each iteration.
     """
     # Use floating point arithmetic to make an initial guess
     floating_point_precision = 10**16
     n_float = float((n * floating_point_precision) // one) / floating_point_precision
     x = (int(floating_point_precision * math.sqrt(n_float)) * one) // floating_point_precision
     n_one = n * one
     while 1:
         x_old = x
         x = (x + n_one // x) // 2
         if x == x_old:
             break
     return x

It uses normal floating point arithmetic to make an initial guess then refines it using Newton's method.

See pi_chudnovsky.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 4.696e-05
100 0.0001530
1000 0.002027
10000 0.08685
100000 8.453
1000000 956.3

Which is nearly 3 times quicker than the best result in part 3, and gets us 1,000,000 places of π in about 15 minutes.

Amazingly there are still two major improvements we can make to this. The first is to recast the calculation using something called binary splitting. Binary splitting is a general purpose technique for speeding up this sort of calculation. What it does is convert the sum of the individual fractions into one giant fraction. This means that you only do one divide at the end of the calculation which speeds things up greatly, because division is slow compared to multiplication.

Consider the general infinite series

$$ S(0,\infty) = \frac{a_0p_0}{b_0q_0} + \frac{a_1p_0p_1}{b_1q_0q_1} + \frac{a_2p_0p_1p_2}{b_2q_0q_1q_2} + \frac{a_3p_0p_1p_2p_3}{b_3q_0q_1q_2q_3} + \cdots $$

This could be used as a model for all the infinite series we've looked at so far. Now lets consider the partial sum of that series from terms a to b (including a, not including b).

$$ \begin{align} S(a,b) &= \frac{a_ap_a}{b_aq_a} + \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}} + \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}} + \cdots + \frac{a_{b-1}p_ap_{a+1}p_{a+2}\cdots p_{b-1}}{b_{b-1}q_aq_{a+1}\cdots p_{b-1}} \\ \end{align} $$

This is a part of the infinite series, and if a=0 and b=∞ it becomes the infinite series above.

Now lets define some extra functions:

$$ \begin{align} P(a,b) &= p_ap_{a+1}\cdots p_{b-1} \\ Q(a,b) &= q_aq_{a+1}\cdots q_{b-1} \\ B(a,b) &= b_ab_{a+1}\cdots b_{b-1} \\ T(a,b) &= B(a,b)Q(a,b)S(a,b) \\ \end{align} $$

Let's define m which is a <= m <= b. Making m as near to the middle of a and b will lead to the quickest calculations, but for the proof, it has to be somewhere between them. Lets work out what happens to our variables P, Q, R and T when we split the series in two:

$$ \begin{align} P(a,b) &= P(a,m)P(b,m) \\ Q(a,b) &= Q(a,m)Q(b,m) \\ B(a,b) &= B(a,m)B(b,m) \\ T(a,b) &= B(m,b)Q(m,b)T(a,m) + B(a,m)P(a,m)T(m,b) \\ \end{align} $$

The first three of those statements are obvious from the definitions, but the last deserves proof.

$$ \begin{align} & B(m,b)Q(m,b)T(a,m) + B(a,m)P(a,m)T(m,b) && \left[T(a,b) = B(a,b)Q(a,b)S(a,b)\right] \\ &= B(m,b)Q(m,b)B(a,m)Q(a,m)S(a,m) + B(a,m)P(a,m)B(m,b)Q(m,b)S(m,b) && \left[Q(a,b) = Q(a,m)Q(b,m)\right] \\ & && \left[B(a,b) = B(a,m)B(b,m)\right] \\ &= B(a,b)Q(a,b)S(a,m) + B(a,b)P(a,m)Q(m,b)S(m,b) \\ &= B(a,b)(Q(a,b)S(a,m) + P(a,m)Q(m,b)S(m,b)) && \left[Q(m,b) = \frac{Q(a,b)}{Q(a,m)}\right] \\ &= B(a,b)\left(Q(a,b)S(a,m) + \frac{Q(a,b)P(a,m)}{Q(a,m)}S(m,b)\right) \\ &= B(a,b)Q(a,b)\left(S(a,m) + \frac{P(a,m)}{Q(a,m)}S(m,b)\right) \\ &= B(a,b)Q(a,b)\left( \frac{a_ap_a}{b_aq_a} + \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}} + \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}} + \cdots + \frac{a_{m-1}p_ap_{a+1}p_{a+2}\cdots p_{m-1}}{b_{m-1}q_aq_{a+1}\cdots p_{m-1}} + \frac{p_ap_{a+1}\cdots p_{m-1}}{q_aq_{a+1}\cdots q_{m-1}} + \left(\frac{a_mp_m}{b_mq_m} + \frac{a_{m+1}p_mp_{m+1}}{b_{m+1}q_mq_{m+1}} + \frac{a_{m+2}p_mp_{m+1}p_{m+2}}{b_{m+2}q_mq_{m+1}q_{m+2}} + \cdots + \frac{a_{b-1}p_mp_{m+1}p_{m+2}\cdots p_{b-1}}{b_{b-1}q_mq_{m+1}\cdots p_{b-1}} \right)\right) \\ &= B(a,b)Q(a,b)\left( \frac{a_ap_a}{b_aq_a} + \frac{a_{a+1}p_ap_{a+1}}{b_{a+1}q_aq_{a+1}} + \frac{a_{a+2}p_ap_{a+1}p_{a+2}}{b_{a+2}q_aq_{a+1}q_{a+2}} + \cdots + \frac{a_{m-1}p_ap_{a+1}p_{a+2}\cdots p_{m-1}}{b_{m-1}q_aq_{a+1}\cdots p_{m-1}} + \frac{p_ap_{a+1}\cdots p_{m-1}}{q_aq_{a+1}\cdots q_{m-1}} + \frac{a_mp_ap_{a+1}\cdots p_m}{b_mq_aq_{a+1}\cdots q_m} + \frac{a_{m+1}p_ap_{a+1}\cdots p_{m+1}}{b_{m+1}q_aq_{a+1}\cdots q_{m+1}} + \frac{a_{m+2}p_ap_{a+1}\cdots p_{m+2}}{b_{m+2}q_aq_{a+1}\cdots q_{m+2}} + \cdots + \frac{a_{b-1}p_ap_{a+1}\cdots p_{b-1}}{b_{b-1}q_aq_{a+1}\cdots p_{b-1}} \right) \\ &= B(a,b)Q(a,b)S(a,b) \\ &= T(a,b) \\ \end{align} $$

We can use these relations to expand the series recursively, so if we want S(0,8) then we can work out S(0,4) and S(4,8) and combine them. Likewise to calculate S(0,4) and S(4,8) we work out S(0,2), S(2,4), S(4,6), S(6,8) and combine them, and to work out those we work out S(0,1), S(1,2), S(2,3), S(3,4), S(4,5), S(5,6), S(6,7), S(7,8). Luckily we don't have to split them down any more as we know what P(a,a+1), Q(a,a+1) etc is from the definition above.

$$ \begin{align} P(a,a+1) &= p_a \\ Q(a,a+1) &= q_a \\ B(a,a+1) &= b_a \\ S(a,a+1) &= \frac{a_ap_a}{b_aq_a} \\ T(a,a+1) &= B(a,a+1)Q(a,a+1)S(a,a+1) \\ &= b_aq_a\frac{a_ap_a}{b_aq_a} \\ &= a_ap_a \\ \end{align} $$

And when you've finally worked out P(0,n),Q(0,n),B(0,n),T(0,n) you can work out S(0,n) with

$$ \begin{align} S(0,n) &= \frac{T(0,n)}{B(0,n)Q(0,n)} \\ \end{align} $$

If you want a more detailed and precise treatment of binary splitting then see Bruno Haible and Thomas Papanikolaou's paper.

So back to Chudnovksy's series. We can now set these parameters in the above general formula:

$$ \begin{align} p_0 &= 1 \\ p_a &= (6a-5)(2a-1)(6a-1) \\ q_0 &= 1 \\ q_a &= a^3\cdot640320^3/24 \\ b_a &= 1 \\ a_a &= (13591409 + 545140134a) \\ \end{align} $$

This then makes our Chudnovksy pi function look like this:

def pi_chudnovsky_bs(digits):
    """
    Compute int(pi * 10**digits)

    This is done using Chudnovsky's series with binary splitting
    """
    C = 640320
    C3_OVER_24 = C**3 // 24
    def bs(a, b):
        """
        Computes the terms for binary splitting the Chudnovsky infinite series

        a(a) = +/- (13591409 + 545140134*a)
        p(a) = (6*a-5)*(2*a-1)*(6*a-1)
        b(a) = 1
        q(a) = a*a*a*C3_OVER_24

        returns P(a,b), Q(a,b) and T(a,b)
        """
        if b - a == 1:
            # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1)
            if a == 0:
                Pab = Qab = 1
            else:
                Pab = (6*a-5)*(2*a-1)*(6*a-1)
                Qab = a*a*a*C3_OVER_24
            Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
            if a & 1:
                Tab = -Tab
        else:
            # Recursively compute P(a,b), Q(a,b) and T(a,b)
            # m is the midpoint of a and b
            m = (a + b) // 2
            # Recursively calculate P(a,m), Q(a,m) and T(a,m)
            Pam, Qam, Tam = bs(a, m)
            # Recursively calculate P(m,b), Q(m,b) and T(m,b)
            Pmb, Qmb, Tmb = bs(m, b)
            # Now combine
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
        return Pab, Qab, Tab
    # how many terms to compute
    DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6)
    N = int(digits/DIGITS_PER_TERM + 1)
    # Calclate P(0,N) and Q(0,N)
    P, Q, T = bs(0, N)
    one = 10**digits
    sqrtC = sqrt(10005*one, one)
    return (Q*426880*sqrtC) // T

Hopefully you'll see how the maths above relates to this as I've used the same notation in each. Note that we calculate the number of digits of pi we expect per term of the series (about 14.18) to work out how many terms of the series we need to compute, as the binary splitting algorithm needs to know in advance how many terms to calculate. We also don't bother calculating B(a) as it is always 1. Defining a function inside a function like this makes what is known as a closure. This means that the inner function can access the variables in the outer function which is very convenient in this recursive algorithm as it stops us having to pass the constants to every call of the function.

See pi_chudnovsky_bs.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 1.096e-05
100 3.194e-05
1000 0.0004899
10000 0.03403
100000 3.625
1000000 419.1

This is a bit more than twice as fast as pi_chudnovsky.py giving us our 1,000,000 places in just under 7 minutes. If you profile it you'll discover that almost all the time spent in the square root calculations (86% of the time) whereas only 56 seconds is spent in the binary splitting part. We could spend time improving the square root algorithm, but it is time to bring out the big guns: gmpy.

Gmpy is a python interface to the gmp library which is a C library for arbitrary precision arithmetic. It is very fast, much faster than the built in int type in python for large numbers. Luckily gmpy provides a type (mpz) which works exactly like normal python int types, so we have to make hardly any changes to our code to use it. These are using initialising variables with the mpz type, and using the sqrt method on mpz rather than our own home made sqrt algorithm:

import math
from gmpy2 import mpz
from time import time

def pi_chudnovsky_bs(digits):
    """
    Compute int(pi * 10**digits)

    This is done using Chudnovsky's series with binary splitting
    """
    C = 640320
    C3_OVER_24 = C**3 // 24
    def bs(a, b):
        """
        Computes the terms for binary splitting the Chudnovsky infinite series

        a(a) = +/- (13591409 + 545140134*a)
        p(a) = (6*a-5)*(2*a-1)*(6*a-1)
        b(a) = 1
        q(a) = a*a*a*C3_OVER_24

        returns P(a,b), Q(a,b) and T(a,b)
        """
        if b - a == 1:
            # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1)
            if a == 0:
                Pab = Qab = mpz(1)
            else:
                Pab = mpz((6*a-5)*(2*a-1)*(6*a-1))
                Qab = mpz(a*a*a*C3_OVER_24)
            Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
            if a & 1:
                Tab = -Tab
        else:
            # Recursively compute P(a,b), Q(a,b) and T(a,b)
            # m is the midpoint of a and b
            m = (a + b) // 2
            # Recursively calculate P(a,m), Q(a,m) and T(a,m)
            Pam, Qam, Tam = bs(a, m)
            # Recursively calculate P(m,b), Q(m,b) and T(m,b)
            Pmb, Qmb, Tmb = bs(m, b)
            # Now combine
            Pab = Pam * Pmb
            Qab = Qam * Qmb
            Tab = Qmb * Tam + Pam * Tmb
        return Pab, Qab, Tab
    # how many terms to compute
    DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6)
    N = int(digits/DIGITS_PER_TERM + 1)
    # Calclate P(0,N) and Q(0,N)
    P, Q, T = bs(0, N)
    one_squared = mpz(10)**(2*digits)
    sqrtC = (10005*one_squared).sqrt()
    return (Q*426880*sqrtC) // T

See pi_chudnovsky_bs_gmpy.py for the complete program. When I run it I get this:

Digits Time (seconds)
10 1.597e-05
100 3.409e-05
1000 0.003403
10000 0.003571
100000 0.09120
1000000 1.760
10000000 30.11
100000000 542.2

So we have achieved our goal of calculating 100,000,000 places of π in just under 10 minutes! What is limiting the program now is memory... 100,000,000 places takes about 600MB of memory to run. With 6 GB of free memory it could probably calculate one billion places in a few hours.

What if we wanted to go faster? Well you could use gmp-chudnovsky.c which is a C program which implements the Chudnovsky Algorithm. It is heavily optimised and rather difficult to understand, but if you untangle it you'll see it does exactly the same as above, with one extra twist. The twist is that it factors the fraction in the binary splitting phase as it goes along. If you examine the gmp-chudnovsky.c results page you'll see that the little python program above acquits itself very well - the python program only takes 75% longer than the optimised C program on the same hardware.

One algorithm which has the potential to beat Chudnovsky is the Arithmetic Geometric Mean algorithm which doubles the number of decimal places each iteration. It involves square roots and full precision divisions which makes it tricky to implement well. In theory it should be faster than Chudnovsk but, so far, in practice Chudnovsky is faster.

Here is a chart of all the different π programs we've developed in Part 1, Part 2, Part 3 and Part 4 with their timings on my 2010 Intel® Core™ i7 CPU Q820 @ 1.73GHz running 64 bit Linux:

Chart of timings for calculating π with various methods

Source spreadsheet for the timings.