In Part 3 we managed to calculate 1,000,000 decimal places of π with Machin's `arctan` formula. Our stated aim was 100,000,000 places which we are going to achieve now!

We have still got a long way to go though, and we'll have to improve both our algorithm (formula) for π and our implementation.

The current darling of the π world is the Chudnovsky algorithm which is similar to the `arctan` formula but it converges much quicker. It is also rather complicated. The formula itself is derived from one by Ramanjuan who's work was extraordinary in the extreme. It isn't trivial to prove, so I won't! Here is Chudnovsky's formula for π as it is usually stated:

That is quite a complicated formula, we will make more comprehensible in a moment. If you haven't seen the Σ notation before it just like a `sum` over a `for` loop in python. For example:

Is exactly the same as this python fragment:

sum(k**2 for k in range(1,11))

First let's get rid of that funny fractional power:

Now let's split it into two independent parts. See how there is a `+` sign inside the Σ? We can split it into two series which we will call `a` and `b` and work out what π is in terms of `a` and `b`:

Finally note that we can calculate the next `a` term from the
previous one, and the `b` terms from the `a` terms which
simplifies the calculations rather a lot.

OK that was a lot of maths! But it is now in an easy to compute state, which we can do like this:

def pi_chudnovsky(one=1000000): """ Calculate pi using Chudnovsky's series This calculates it in fixed point, using the value for one passed in """ k = 1 a_k = one a_sum = one b_sum = 0 C = 640320 C3_OVER_24 = C**3 // 24 while 1: a_k *= -(6*k-5)*(2*k-1)*(6*k-1) a_k //= k*k*k*C3_OVER_24 a_sum += a_k b_sum += k * a_k k += 1 if a_k == 0: break total = 13591409*a_sum + 545140134*b_sum pi = (426880*sqrt(10005*one, one)*one) // total return pi

We need to be able to take the square root of long integers which python doesn't have a built in for. Luckily this is easy to provide. This uses a square root algorithm devised by Newton which doubles the number of significant places in the answer (quadratic convergence) each iteration:

def sqrt(n, one): """ Return the square root of n as a fixed point number with the one passed in. It uses a second order Newton-Raphson convergence. This doubles the number of significant figures on each iteration. """ # Use floating point arithmetic to make an initial guess floating_point_precision = 10**16 n_float = float((n * floating_point_precision) // one) / floating_point_precision x = (int(floating_point_precision * math.sqrt(n_float)) * one) // floating_point_precision n_one = n * one while 1: x_old = x x = (x + n_one // x) // 2 if x == x_old: break return x

It uses normal floating point arithmetic to make an initial guess then refines it using Newton's method.

See pi_chudnovsky.py for the complete program. When I run it I get this:

Digits | Time (seconds) |
---|---|

10 | 4.696e-05 |

100 | 0.0001530 |

1000 | 0.002027 |

10000 | 0.08685 |

100000 | 8.453 |

1000000 | 956.3 |

Which is nearly 3 times quicker than the best result in part 3, and gets us 1,000,000 places of π in about 15 minutes.

Amazingly there are still two major improvements we can make to this. The first is to recast the calculation using something called binary splitting. Binary splitting is a general purpose technique for speeding up this sort of calculation. What it does is convert the sum of the individual fractions into one giant fraction. This means that you only do one divide at the end of the calculation which speeds things up greatly, because division is slow compared to multiplication.

Consider the general infinite series

This could be used as a model for all the infinite series we've looked at so far. Now lets consider the partial sum of that series from terms `a` to `b` (including a, not including b).

This is a part of the infinite series, and if a=0 and b=∞ it becomes the infinite series above.

Now lets define some extra functions:

Let's define m which is a <= m <= b. Making m as near to the middle of a and b will lead to the quickest calculations, but for the proof, it has to be somewhere between them. Lets work out what happens to our variables P, Q, R and T when we split the series in two:

The first three of those statements are obvious from the definitions, but the last deserves proof.

We can use these relations to expand the series recursively, so if we want S(0,8) then we can work out S(0,4) and S(4,8) and combine them. Likewise to calculate S(0,4) and S(4,8) we work out S(0,2), S(2,4), S(4,6), S(6,8) and combine them, and to work out those we work out S(0,1), S(1,2), S(2,3), S(3,4), S(4,5), S(5,6), S(6,7), S(7,8). Luckily we don't have to split them down any more as we know what P(a,a+1), Q(a,a+1) etc is from the definition above.

And when you've finally worked out P(0,n),Q(0,n),B(0,n),T(0,n) you can work out S(0,n) with

If you want a more detailed and precise treatment of binary splitting then see Bruno Haible and Thomas Papanikolaou's paper.

So back to Chudnovksy's series. We can now set these parameters in the above general formula:

This then makes our Chudnovksy pi function look like this:

def pi_chudnovsky_bs(digits): """ Compute int(pi * 10**digits) This is done using Chudnovsky's series with binary splitting """ C = 640320 C3_OVER_24 = C**3 // 24 def bs(a, b): """ Computes the terms for binary splitting the Chudnovsky infinite series a(a) = +/- (13591409 + 545140134*a) p(a) = (6*a-5)*(2*a-1)*(6*a-1) b(a) = 1 q(a) = a*a*a*C3_OVER_24 returns P(a,b), Q(a,b) and T(a,b) """ if b - a == 1: # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1) if a == 0: Pab = Qab = 1 else: Pab = (6*a-5)*(2*a-1)*(6*a-1) Qab = a*a*a*C3_OVER_24 Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a) if a & 1: Tab = -Tab else: # Recursively compute P(a,b), Q(a,b) and T(a,b) # m is the midpoint of a and b m = (a + b) // 2 # Recursively calculate P(a,m), Q(a,m) and T(a,m) Pam, Qam, Tam = bs(a, m) # Recursively calculate P(m,b), Q(m,b) and T(m,b) Pmb, Qmb, Tmb = bs(m, b) # Now combine Pab = Pam * Pmb Qab = Qam * Qmb Tab = Qmb * Tam + Pam * Tmb return Pab, Qab, Tab # how many terms to compute DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6) N = int(digits/DIGITS_PER_TERM + 1) # Calclate P(0,N) and Q(0,N) P, Q, T = bs(0, N) one = 10**digits sqrtC = sqrt(10005*one, one) return (Q*426880*sqrtC) // T

Hopefully you'll see how the maths above relates to this as I've used
the same notation in each. Note that we calculate the number of
digits of pi we expect per term of the series (about 14.18) to work
out how many terms of the series we need to compute, as the binary
splitting algorithm needs to know in advance how many terms to
calculate. We also don't bother calculating `B(a)` as it is always
`1`. Defining a function inside a function like this makes what is
known as a closure. This means that the inner function can access the
variables in the outer function which is very convenient in this
recursive algorithm as it stops us having to pass the constants to
every call of the function.

See pi_chudnovsky_bs.py for the complete program. When I run it I get this:

Digits | Time (seconds) |
---|---|

10 | 1.096e-05 |

100 | 3.194e-05 |

1000 | 0.0004899 |

10000 | 0.03403 |

100000 | 3.625 |

1000000 | 419.1 |

This is a bit more than twice as fast as pi_chudnovsky.py giving us our 1,000,000 places in just under 7 minutes. If you profile it you'll discover that almost all the time spent in the square root calculations (86% of the time) whereas only 56 seconds is spent in the binary splitting part. We could spend time improving the square root algorithm, but it is time to bring out the big guns: gmpy.

Gmpy is a python interface to the gmp library which is a C library
for arbitrary precision arithmetic. It is very fast, much faster than
the built in `int` type in python for large numbers. Luckily gmpy
provides a type (`mpz`) which works exactly like normal python
`int` types, so we have to make hardly any changes to our code to
use it. These are using initialising variables with the `mpz` type,
and using the `sqrt` method on `mpz` rather than our own home made
`sqrt` algorithm:

import math from gmpy2 import mpz from time import time def pi_chudnovsky_bs(digits): """ Compute int(pi * 10**digits) This is done using Chudnovsky's series with binary splitting """ C = 640320 C3_OVER_24 = C**3 // 24 def bs(a, b): """ Computes the terms for binary splitting the Chudnovsky infinite series a(a) = +/- (13591409 + 545140134*a) p(a) = (6*a-5)*(2*a-1)*(6*a-1) b(a) = 1 q(a) = a*a*a*C3_OVER_24 returns P(a,b), Q(a,b) and T(a,b) """ if b - a == 1: # Directly compute P(a,a+1), Q(a,a+1) and T(a,a+1) if a == 0: Pab = Qab = mpz(1) else: Pab = mpz((6*a-5)*(2*a-1)*(6*a-1)) Qab = mpz(a*a*a*C3_OVER_24) Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a) if a & 1: Tab = -Tab else: # Recursively compute P(a,b), Q(a,b) and T(a,b) # m is the midpoint of a and b m = (a + b) // 2 # Recursively calculate P(a,m), Q(a,m) and T(a,m) Pam, Qam, Tam = bs(a, m) # Recursively calculate P(m,b), Q(m,b) and T(m,b) Pmb, Qmb, Tmb = bs(m, b) # Now combine Pab = Pam * Pmb Qab = Qam * Qmb Tab = Qmb * Tam + Pam * Tmb return Pab, Qab, Tab # how many terms to compute DIGITS_PER_TERM = math.log10(C3_OVER_24/6/2/6) N = int(digits/DIGITS_PER_TERM + 1) # Calclate P(0,N) and Q(0,N) P, Q, T = bs(0, N) one_squared = mpz(10)**(2*digits) sqrtC = (10005*one_squared).sqrt() return (Q*426880*sqrtC) // T

See pi_chudnovsky_bs_gmpy.py for the complete program. When I run it I get this:

Digits | Time (seconds) |
---|---|

10 | 1.597e-05 |

100 | 3.409e-05 |

1000 | 0.003403 |

10000 | 0.003571 |

100000 | 0.09120 |

1000000 | 1.760 |

10000000 | 30.11 |

100000000 | 542.2 |

So we have achieved our goal of calculating 100,000,000 places of π in just under 10 minutes! What is limiting the program now is memory... 100,000,000 places takes about 600MB of memory to run. With 6 GB of free memory it could probably calculate one billion places in a few hours.

What if we wanted to go faster? Well you could use gmp-chudnovsky.c which is a C program which implements the Chudnovsky Algorithm. It is heavily optimised and rather difficult to understand, but if you untangle it you'll see it does exactly the same as above, with one extra twist. The twist is that it factors the fraction in the binary splitting phase as it goes along. If you examine the gmp-chudnovsky.c results page you'll see that the little python program above acquits itself very well - the python program only takes 75% longer than the optimised C program on the same hardware.

One algorithm which has the potential to beat Chudnovsky is the Arithmetic Geometric Mean algorithm which doubles the number of decimal places each iteration. It involves square roots and full precision divisions which makes it tricky to implement well. In theory it should be faster than Chudnovsk but, so far, in practice Chudnovsky is faster.

Here is a chart of all the different π programs we've developed in Part 1, Part 2, Part 3 and Part 4 with their timings on my 2010 Intel® Core™ i7 CPU Q820 @ 1.73GHz running 64 bit Linux: