Compile-time Function Execution in D

This post will describe the basics of a very powerful feature of the D programming language: the Compile-time Function Execution (CTFE), which allows complicated functions to be fully evaluated at compile-time, irrespective of the optimization levels.

C optimizations

If you are an average C programmer, you know that simple code can be trusted to be evaluated at compile-time thanks to optimizers. For instance, if you write something like:


void square(int x) { return x*x; }
 
void foo(void)
{
   int k = square(32);
   /* ... */
}

you trust your compiler to evaluate the square at compile-time, when optimizations are on. When things get more hairy:


int factorial(int x)
{
    int result = x;
    while (--x)
         result *= x;
   return result;
}
 
void foo(void)
{
   int k = factorial(8);
   /* ... */
}

C programmers get immediately less confident about what will happen at run-time. For instance, would you say that your compiler is able to expand the above code at compile-time or not? Actually, the answer is “yes” in this particular case (unless you are using a very old compiler), but the point is still valid: this is not C code that one would write if he wants to be sure that the whole calculation be folded at compile-time.

There is also another issue: since the language does not mandate that the value is folded (and in fact, it is not folded when optimizations are disabled), you cannot create a constant out of it, such as by assigning it to a const variable.

When things get hairy

Now, let’s try with a (very naive and simple) solution of problem #1 of Project Euler:


#include <stdio.h>
 
int euler1(int max)
{
   int i,res=0;
   for (i=1; i<max; i++)
   {
       if ((i % 3) == 0 || (i % 5) == 0)
           res += i;
   }
   return res;
}
 
int main()
{
   int r10 = euler1(10);
   int r1000 = euler1(1000);
   printf("%d %d\n", r10, r1000);
   return 0;
}

This program simply calculates the sum of all divisors of 3 or 5 below 1000. But if you look at the generated code with GCC under -O3, you will see that the actual results are not computed at compile-time, but rather calculated at runtime. I believe any average C programmer would agree that we should not expect this code to be folded at compile time.

Now, meet the equivalent D code:


int euler1(int max)
{
   int i,res=0;
   for (i=1; i<max; i++)
   {
       if ((i % 3) == 0 || (i % 5) == 0)
           res += i;
   }
   return res;
}
 
int main()
{
   int r10 = euler1(10);
   int r1000 = euler1(1000);
   printf("%d %d\n", r10, r1000);
   return 0;
}

Deja-vu? Yes, it is exactly the same, barring the initial include statement that is not required (actually, there is no preprocessor in D and modules refer to each other with the import statement, but printf is a builtin). Of course, the above example was hand-crafted to make it both valid C and D code, but being D an evolution of C, the basic syntax is the same.

Meet CTFE

And now the hattrick: in D, we can request the compiler to evaluate euler1 at compile-time by simply using the static keyword at invocation time:


static int r10 = euler1(10);
static int r1000 = euler1(1000);

Great, isn’t it? Now the result of the above function call are evaluated by the compiler, irrespective of the optimization levels. If the function cannot be evaluated at compile-time (usually because it has side-effects, like any kind of I/O), it will trigger a compile-time error.

We can verify that the above constants really do appear in the generated code by compiling with gdc -save-temps euler1.d and then inspecting euler1.s:


D6euler14mainFZi3r10i:
        .long   23
.globl _D6euler14mainFZi5r1000i
        .align 4
        .type   _D6euler14mainFZi5r1000i, @object
        .size   _D6euler14mainFZi5r1000i, 4
_D6euler14mainFZi5r1000i:
        .long   233168
        .section        .rodata
.LC0:
        .string "%d %d\n"
        .text
.globl _Dmain
        .type   _Dmain, @function
_Dmain:
.LFB3:
        pushq   %rbp
.LCFI3:
        movq    %rsp, %rbp
.LCFI4:
        movl    _D6euler14mainFZi5r1000i(%rip), %edx
        movl    _D6euler14mainFZi3r10i(%rip), %esi
        movl    $.LC0, %edi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        ret

Notice how the compiler has calculated the values 23 and 233168 (respectively, results of euler1(10) and euler1(1000)) and put them in the data section of the executable.

If you are curious of what happens when the compiler cannot do the whole evaluation at compile-time, it is sufficient to stick a printf() call somewhere in the euler() function. Since printf() does some I/O, it breaks CFTE, and the compiler will happily tell you about that:


euler1.d:9: Error: cannot evaluate printf("hello, world!\x0a") at compile time
euler1.d:15: Error: cannot evaluate euler1(10) at compile time
euler1.d:16: Error: cannot evaluate euler1(1000) at compile time

Closing words

CTFE is simple yet very powerful. The fact that it is triggered at the call site (rather than being an attribute of the function, like the inline keyword) is a very smart design choice: it makes perfectly sense for the same function to be used at both run-time and compile-time, depending on the inputs.

For the C++ guys reading, C++0x has grown a constexpr keyword that, while looking superficially similar, it is a lot less powerful, since it can only be used on very simple functions (basically, one-liners). In fact, the keyword is meant to be use while declaring a function, and not at the call-site, so it has to apply only on small functions which can be proved to always yield a constant value.

EDIT: This post is being discussed on Reddit