This was my entry for the 20^{th} International Obfuscated C Code
Contest, a contest whose aims are

- To write the most Obscure/Obfuscated C program under the rules.
- To show the importance of programming style, in an ironic way.
- To stress C compilers with unusual code.
- To illustrate some of the subtleties of the C language.
- To provide a safe forum for poor C code. :-)

I've been meaning to enter the contest for quite a few years so was very pleased to see it re-opened for 2012.

Alas my entry didn't win, but maybe you'll enjoy the entry and explanation.

Here is my entry:

```
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<inttypes.h>
#define H uint32_t
#define U uint64_t
#define N void
#define T int
#define P calloc(n,sizeof(U))
#define R return
#define I for(
#define M if(
#define E >>32
#define S i=0;i<n;i++)
U K=-(U)(H)-1,*B,*C
,*x,AC,h;T L,n,F,l,p,D,*_;U e(U x,U
y){x=K-x;U r=y-x;M y<x)r+=K;R r;}U AB(U x,H
w,U*c){U s=x+*c;H t=s<x;*c=(s>>w)+(t<<(64-w));R
s&((((U)1)<<w)-1 ) ;} U Q(U x,U y ) {U r
=x-y;M x<y)r+=K;R r;} U AD(U b,U
a){H d=b E,c=b;M a >= K)a-= K;a=
Q(a,c);a=Q(a,d) ;a=e (a
,((U)c)<<32);R a;}U A( U
x,U y){H a=x,b= x E,c=y,d
=y E;x=b*(U)c;y=a*(U)d;U e =a*(
U)c,f=b*(U)d,t;x+=y;M x<y)f +=1LL
<<32;t=x<<32;e+=t;M e<t)f+= 1;t=
x E;f+=t;R AD(f,e);}U J(U a,U b
){U r= 1;T i;I i=63;i>=0; i--){r=A(r,r);M(b >>i)&1)r=A(r
,a);}R r;}U f(U a){R J(a,K -2);}N G (H n,U*x, U*y)
{T i;I S x[i]=A(x[i],y[i ]);}N g( H n,U*x) {T i;I S x[i]
=A(x[i],x[i]);}N AA(){U d = AC; T k ;I k=
F;k>=1;k--){T m=1<<k,c=m>> 1,j, r
;U w=1;I j=0;j<c;j++){I r= 0;r<n ;r
+=m){T a=r+j,b=a+c;U u=x[a] ,v=x[b] ;x[a
]=e(u,v);x[b]=A(Q(u,v),w);}w= A( w,d );}d=A
(d,d);}}N W(){T k;I k=1;k<=F;k ++ ){T m=1<< k, c=m>>
1,j,r;U z=(U)(1<<((H)F-(H)k)), d= J(h,z),w=1;I j=0
;j<c;j++){I r=0;r<n;r+=m){T a =r+j, b=a+ c;U u=x[a], v=A (w
,x[b]);x[a]=e(u,v);x[b]=Q(u,v);}w=A(w ,d);}}}T O(){T o=0,i, w=L
/n;U s=J(7,(K-1)/192/n*5);AC=J(s,192) ;h=f (AC );M 2 *w +F
>=61)R 0;l=w;D=1<<w;p=w+1;x=P;B=P;C=P; _= P; *B=1;*C=f( n);I
i=0;i<=n;i++){U t=(U)L*(U)i,r=t%n,b=t/n ;H a;M r
E)R 0;M(H)r)b++;M b E)R 0;a=b;M i>0){U c=_[ i-1]
=a-o;M c!=w&&c!=w+1)R 0;M i<n){r=n-r;B[i]=J(s,r);C[i]= f(A(B
[i],n));}}o=a;}R 1;}U V(){T i=0,j=0;U r=0;I;i<64&&j<n;i+=_
[j],j++)r|=x[j]<<i;M r)R r;I r=j=0;j<n;j++)r|=x[j];R r;}
N X(H c,T i){while(c){I;i<n;i++){H y=1<<_[i];x[i]+=c;
M x[i]>=y)x[ i]-=y,c=1; else
R;}i=0;} }N Y(H c){T i;
while(c) {I S{H y=1<< _[i] ;x[i ]-=
c;M x[i]>=y)x[i]+=y,c=1 ;else R ;}}}N
Z(U c){T i;while(c){I S{x[ i]=AB(x[i],_[i],
&c);H t=c;M!(c E)&&t <D){M t ) X(t,i+1);R;}}}}N q()
{T i;U c=0;G(n,x, B);AA(F,x);g(n,x);W(F ,x);G(n,x
,C);I S x[i]= AB(x[i],_[i],&c);M c)Z(c);Y (2);}T main(
T w,char** v){T i,k,j;M w<2) {puts("Usage: p [n]");R 1
;}L=atoi (v[1] );j=w>2 ?atoi(v[2]):L-
2;do F++ ,n=1<<F;while(!O ());I k=0;k
<1;k++){*x=4;I i=0;i<j;i++)q();}printf("0x%016"PRIX64"\n",V());R 0;}
```

I'm going to explain exactly what it does and how, but first here are the hints and remarks I provided with the code. Some of the hints assume you've pre-processed the code into something vaguely legible first.

You stare at the source code... Is that a picture of a monk? What does that strange mathematical writing say? You get a subliminal message... Hunt for primes? But how?

You run the program. It comes up with a strange usage message. What
could `p`

be? You try running the program with a prime number as an
argument.

./prog 23209

It returns after a few seconds with the rather cryptic message

0x0000000000000000

What could that mean? Maybe that 2^{23209}-1 (a 6,987 digit number) is
prime after all?

OK If you haven't guessed by now, this program proves the primality of
Mersenne primes of the
form 2^{p}-1. The biggest known primes are of this form because there is
an efficient test for primality - the Lucas Lehmer
test.

If you run the program and it produces `0x0000000000000000`

(0 in hex)
as an answer then you've found a prime of the form 2^{p}-1. If it
produces anything else then 2^{p}-1 isn't prime. The printed result is
the bottom 64 bits of the last iteration of the Lucas Lehmer test.

Have a look at the source code and see if you can work out how it implements the Lucas Lehmer test...

Your first clue is to work out what the magic number `-(U)(H)-1`

is. (My
favourite bit of obfuscation!)

Got it? It is a prime just less than 2^{64}, `0xFFFFFFFF00000001`

=
2^{64} - 2^{32} + 1 (p from now on), with some rather special properties.

The most important of which is that arithmetic modulo p can all be done without using divisions or modulo operations (which are really slow). See if you can spot the add, subtract and multiplication routines. There are also reciprocal and to the power operations in there somewhere.

The next important property is that p-1 has the following factors

```
2^32 * 3 * 5 * 17 * 257 * 65537
```

All those factors of 2 suggest that we can do a Fast Fourier Transform over the Galois Field GF(p) which is arithmetic modulo p. See if you can spot the FFT code!

To make a truly efficient Mersenne primality prover it is necessary to
implement the
IBDWT
a variant of the FFT using an irrational base. This halves the size of
the FFT array by combining the modulo and multiply steps from the Lucas
Lehmer test into one. You might wonder how you can use an irrational
base in a discrete field where every element is an integer, but as it
happens the p chosen has n-th roots of 2 where n is up to 26, which
means that an IBDWT can be defined for FFT sizes up to 2^{26}.

You'll find the code to generate the roots of 1 and 2 necessary for the IBDWT if you search for 7 in the code. 7 is a primitive root of GF(p) so all follows from that!

This all integer IBDWT isn't susceptible to the round off errors that plague the floating point implementation, but does suffer in the speed stakes from the fact that modern CPUs have much faster floating point units than integer units.

For example to check if 2^{18000000}-1 is prime (a 5,418,539 digit
number) Prime95 (the current speed
demon) uses a 2^{20} FFT size and each iteration takes it about 25ms. On
similar hardware it this program takes about 1.2s also using an FFT size
of 2^{20}! Prime95 is written in optimised assembler to use SSE3 whereas
this program was squeezed into 2k of portable C with a completely
unoptimised FFT!

This program can do an FFT of up to 2^{26} entries. Each entry can have
up to 18 bits in (as `2*18+26 <= 63`

), which means that the biggest
exponent that it can test for primality is 18*2^{26}-1 = 1,207,959,551.
This would be number of 363 million decimal digits so could be used to
find a 100 million digit prime and scoop the EFF $150,000
prize! It would take rather a long
time though...

The second argument to the program allows the number of iterations to be put in, and using this it has been validated against the Prime95 test suite. There is a unit test for the program but I thought including it was probably against the spirit of the contest!

Speed of the FFT could be improved greatly, but unfortunately there wasn't space in the margin to do so ;-)

This should compile on any compiler which supports `inttypes.h`

from the
C99 standard (eg gcc).

It runs a lot quicker when compiled as a 64 bit binary!

I have successfully compiled and run this program without changes or compiler warnings for:

- 64 bit Ubuntu Linux 11.10
- 32 bit Debian Linux (Lenny)
- Windows using MINGW
- OS X 10.7 using Apple's gcc based compiler

If you don't have a compiler which supports `inttypes.h`

from C99
(hello it is 2012 Microsoft!) then you'll need to go platform specific:

- comment out the
`#include <inttypes.h>`

line- change the
`#define`

for`uint32_t`

to be an unsigned 32 bit type`unsigned int`

or`unsigned __int32`

- change the
`#define`

for`uint64_t`

to be an unsigned 64 bit type`unsigned long long`

or`unsigned __int64`

- add
`#define PRIX64 "llX"`

or`"lX"`

or`"I64X"`

I wrote most of this code some time ago as a view to publishing a full description of how my ARMprime code works in easy to understand C rather than a twisty turny mess of ARM assembler macros.

However I never got round to finishing it, so I used the IOCCC as a spur to get me to finish it properly!

I've open-sourced the code and the build chain at github.

As you can imagine it isn't possible to develop the code in the form
above from scratch! I created an unobfuscated version
mersenne.c
first and then wrote two programs, one which compresses the code (it
uses a manual version of LZ77 compression with `#define`

statements) and
another which makes it into the pretty picture.

This means that I can change the code and just type make to create the final pretty code for the entry, and to run the unit tests with.

The first challenge was cutting the code down to size. It needed to be less than 2k of non-whitespace characters and less than 4k total. I went through it getting rid of all extraneous functions, compacting definitions, removing optimisations, obfuscating and it was still far too big.

With the help of crush.py and after many hours of work I managed to get it small enough. I then wrote check.py to see whether the code was within the competition rules and unit_test.py to validate the code against Prime95's test suite.

Finally I wrote artify.py which reads the crushed source code and uses it to paint the image. It treats each character (or sometimes a group of characters so the C remains valid) as a pixel and paints the image with code!

Note that none of the python programs are particularly tidy, but that python is a really good language for doing this in - I'd hate to be doing it in C!

Now I've introduced the unobfuscated version, I can show how each part of the code works.

Firstly all good obfuscated programs have lots of global variables to shorten the code down:

```
u64 MOD_P = -(u64)(u32)-1, /* 0xFFFFFFFF00000001 - the magic number */
*digit_weight, /* Array of weights for IBDWT */
*digit_unweight, /* Inverse of the above */
*x, /* pointer to array of n 64 bit integers - the current number */
MOD_W, /* The root of 2 for the fft so that 2**MOD_W == 1 mod p */
MOD_INVW; /* The inverse of the above so that MOD_W * MOD_INVW == 1 mod p */
int exponent, /* The mersenne exponent - we are testing 2^exponent -1 is prime or not */
n, /* The number of 64 bit numbers in the FFT */
log_n, /* n = 2^log_n */
digit_width0, /* Digit width for the IBDWT */
digit_width1,
digit_width_0_max,
*digit_widths; /* Array of number of bits in each element of the FFT */
```

From the top down there is the main loop which contains the Lucas Lehmer algorithm:

```
int main(int w, char ** v)
{
int i,k,j;
if (w < 2)
{
puts("Usage:@p@[n]");
return 1;
}
exponent = atoi(v[1]);
j = w > 2 ? atoi(v[2]) : exponent - 2; /* iterations */
/* initialise, finding correct FFT size */
do log_n++, n = 1 << log_n;
while(!mersenne_initialise());
for (k = 0; k < 1; k++)
{
*x = 4;
for (i = 0; i < j; i++)
mersenne_mul();
}
printf("0x%016" PRIX64 "\n", mersenne_residue());
return 0;
}
```

It reads the exponent and optional number of iterations from the command
line, then initialises the program, starting with a small FFT size `n`

and increasing until it finds one which fits.

It then does the Lucas Lehmer test setting the initial value to 4, iterating it for exponent times and printing the residue.

The core of the Lucas Lehmer test (and the
IBDWT)
is the `mersenne_mul`

function which squares the current number and
subtracts 2, returning it modulo 2^{p} - 1 :

```
void mersenne_mul()
{
int i;
u64 c = 0;
/* weight the input */
mod_vector_mul(n, x, digit_weight);
/* transform */
fft_fastish(log_n, x);
/* point multiply */
mod_sqr_vector(n, x);
/* untransform */
invFft_fastish(log_n, x);
/* unweight and normalise the output */
mod_vector_mul(n, x, digit_unweight);
/* carry propagation */
for (i = 0; i < n; i++)
x[i] = mod_adc(x[i], digit_widths[i], &c);
if (c)
mersenne_add64(c);
/* subtract 2 */
mersenne_sub32(2);
}
```

This the FFT, point multiply and inverse FFT perform a convolution. This
does the bulk of the work of the multiplication. The weight and unweight
and the start and end is the IBDWT which ensures that the result is
modulo 2^{p} - 1. The carries are propagated and 2 is subtracted from
the end.

The FFT is a completely standard FFT with bit reversed output, the only unusual thing being that all the operations are mod p rather than with complex numbers. If you compare the code for the FFT with a standard implementation you will see it is very similar:

```
void fft_fastish()
{
u64 d = MOD_W;
int k;
for (k = log_n; k >= 1; k--)
{
int m = 1 << k,
c = m >> 1,
j,
r;
u64 w = 1;
for (j = 0; j < c; j++)
{
for (r = 0; r < n; r += m)
{
int a = r + j,
b = a + c;
u64 u = x[a],
v = x[b];
x[a] = mod_add(u, v);
x[b] = mod_mul(mod_sub(u, v), w);
}
w = mod_mul(w, d);
}
d = mod_mul(d, d);
}
}
```

`MOD_W`

is the root of unity the FFT is built on. It happens to be an
integer in GF(p) such that `MOD_W`

^{n} == 1 mod p. You can see the FFT
calculating the twiddles (`w`

). Precalculating them would be better
obviously but not in 2k of code! The inner loop shows the butterfly.

Now for the fundamental operations over GF(p). First thing to note is
that in the code p = `MOD_P`

= 2^{64} - 2^{32} + 1. `MOD_P`

was chosen so
that it has quite a few special properties, one of them being that it is
possible to do modulo p operations without doing divisions (which are
really slow!).

The easiest operation to define is subtraction:

```
u64 mod_sub(u64 x, u64 y)
{
u64 r = x - y;
/* if borrow generated - hopefully the compiler will optimise this! */
if (x < y)
r += MOD_P; /* Add back p if borrow */
return r;
}
```

The C code was designed so that the if statement should be optimised into a jump on carry flag if the compiler is doing its job properly. Addition can then be defined as subtracting a negative number:

```
u64 mod_add(u64 x, u64 y)
{
x = MOD_P - x; /* do addition by negating y then subtracting */
u64 r = y - x; /* y - (-x) */
/* if borrow generated - hopefully the compiler will optimise this! */
if (y < x)
r += MOD_P; /* Add back p if borrow */
return r;
}
```

To do multiplication, first it is necessary to work out how to do reduce a 128 bit number mod p. We use these facts:

- 2
^{64}== 2^{32}-1 mod p - 2
^{96}== -1 mod p - 2
^{128}== -2^{32}mod p - 2
^{192}== 1 mod p - 2
^{n}* 2^{(192-n)}= 1 mod p

Thus to reduce a 128 bit number mod p (split into 32 bit chunks x3,x2,x1,x0):

x3 * 2^{96} + x2 * 2^{64} + x1 * 2^{32} + x0 mod p

= x2 * 2^{64} + x1*2^{32} + x0-x3) [using 2^{96} mod p = -1]

= (x1+x2) * 2^{32} + x0 - x3 - x2 [using 2^{64} mod p = 2^{32} -1]

This is explained in more detail in my ARMprime pages. This division free code then looks like this:

```
u64 mod_reduce(u64 b, u64 a)
{
u32 d = b >>32,
c = b;
if (a >= MOD_P) /* (x1, x0) */
a -= MOD_P;
a = mod_sub(a, c);
a = mod_sub(a, d);
a = mod_add(a, ((u64)c)<<32);
return a;
}
```

Given `mod_reduce`

, multiplication is relatively straight forward:

```
u64 mod_mul(u64 x, u64 y)
{
u32 a = x,
b = x >>32,
c = y,
d = y >>32;
/* first synthesise the product using 32*32 -> 64 bit multiplies */
x = b * (u64)c;
y = a * (u64)d;
u64 e = a * (u64)c,
f = b * (u64)d,
t;
x += y; /* b*c + a*d */
/* carry? */
if (x < y)
f += 1LL << 32; /* carry into upper 32 bits - can't overflow */
t = x << 32;
e += t; /* a*c + LSW(b*c + a*d) */
/* carry? */
if (e < t)
f += 1; /* carry into upper 64 bits - can't overflow*/
t = x >>32;
f += t; /* b*d + MSW(b*c + a*d) */
/* can't overflow */
/* now reduce: (b*d + MSW(b*c + a*d), a*c + LSW(b*c + a*d)) */
return mod_reduce(f, e);
}
```

The definitions for the other arithmetic operations are straight forward now. Power is just repeated multiplication and squaring. Inverse is to the power of -1.

All those things come together in the `mersenne_mul`

function above!

If you want to look at the complete (mostly) unobfuscated code then take look through mersenne.c.

The IOCCC was a fun challenge and it got me to finally complete my C conversion of a lot of really difficult ARM assembly code. Hopefully I'll find the time to demonstrate some more optimised FFTs and an FFT with a factor of 3 in too.

Getting the code small enough was really, really difficult, but helped immensely by Python. It started off life as pretty obscure so that process made it more so! I'm afraid even as explained above it probably isn't accessible to very many people - my apologies for that.

Happy Prime Hunting!

Contents

Share and Enjoy

Links