P99 implements many different features through macros and functions, too many to mention explicitly in such an overview. You will find a structured hierarchy of descriptions below the "Modules" tag and the documentation of individual items under "Files" -> "Globals". Here we will introduce some main features:

Default arguments to functions

In section Use of temporary lvalues we saw a way to provide default arguments to functions by overloading them with macros. The general declaration pattern here is as follows

#define NAME(...) P99_CALL_DEFARG(NAME, N, __VA_ARGS__)

Where NAME becomes the name of a macro and where we also suppose that there is already a function of the same name NAME.

The default value for the ::hostname macro above was produced by a macro, namely hostname_defarg_0. The evaluation of the default value is done in the context of the call and not in the context of the declaration. For default arguments that are not constants but expressions that have to be evaluated, this is a major difference from C++. There, default arguments are always evaluated in the context of the declaration.

The convention here is simple:

when called, P99_CALL_DEFARG replaces each argument M (counting starts at 0) that is not provided by the tokens
NAME ## _defarg_ ## M ()

that is, a concatenation of NAME with the token _defarg_ and the decimal number M, followed by an empty pair of parentheses
"not provided" here means either
- leaving an empty place in an argument list
- giving fewer arguments than N
to be valid C code this name must then either
(a) itself be a macro that is then expanded
(b) be a valid function call that can be interpreted by the compiler

As we have seen in the example, (a) is computed in the context of the caller. This lets us simply use a temporary (here a local compound literal) that is thus valid in that context.

To obtain the same behavior as for C++, namely to provide a default argument that is evaluated at the place of declaration and not at the place of the call, we have to use (b), a function call. This will be as efficient as a macro call if we use Inline functions for that purpose.
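The difference between a macro default and a function default can be sketched in plain C without P99. All names below (PICK, times, times_a, times_b, default_factor) are hypothetical and only serve this illustration; the argument-count dispatch is a simplified stand-in and not the way P99_CALL_DEFARG is implemented.

```c
#include <assert.h>

static int default_factor = 2;   /* the object visible at the point of declaration */

static int times(int value, int factor) { return value * factor; }

/* Select a helper macro according to whether 1 or 2 arguments were given. */
#define PICK(_1, _2, NAME, ...) NAME

/* Variant with a macro default: its tokens are resolved at the call site. */
#define times_a_defarg_1() default_factor
#define TIMES_A1(V)    times((V), times_a_defarg_1())
#define TIMES_A2(V, F) times((V), (F))
#define times_a(...)   PICK(__VA_ARGS__, TIMES_A2, TIMES_A1, 0)(__VA_ARGS__)

/* Variant with an inline function default: bound to the file-scope object. */
static inline int times_b_defarg_1(void) { return default_factor; }
#define TIMES_B1(V)    times((V), times_b_defarg_1())
#define TIMES_B2(V, F) times((V), (F))
#define times_b(...)   PICK(__VA_ARGS__, TIMES_B2, TIMES_B1, 0)(__VA_ARGS__)

/* A caller that shadows default_factor with a local variable. */
int call_with_shadow(int use_macro_default) {
  int default_factor = 10;   /* only the macro default sees this one */
  int result = use_macro_default ? times_a(3) : times_b(3);
  assert(result == (use_macro_default ? 30 : 6));
  return result;
}
```

With the macro default the caller's local variable is picked up, with the function default the file-scope variable is used, whatever the caller has in scope.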

To ease the programming of this functional approach, P99 provides some machinery. We need three things as in the following example:

P99_PROTOTYPE(rand48_t *, rand48_t_init, rand48_t*, unsigned short, unsigned short, unsigned short);
#define rand48_t_init(...) P99_CALL_DEFARG(rand48_t_init, 4, __VA_ARGS__)
P99_DECLARE_DEFARG(rand48_t_init,
                   ,
                   useconds(),
                   getpid(),
                   atomic_fetch_add(&rand48_counter)
                   );

Namely

a "prototype" of the underlying function, such that P99 knows the name of the function, the return type and the types of the arguments.
the macro definition as we have already seen
a declaration of the default arguments.

Here in the example, a default argument is provided for positions 1 to 3 but not for position 0. All three defaults have the type unsigned short. The above code leads to the automatic generation of three inline functions that look something like:

inline
unsigned short
rand48_t_init_defarg_1(void) {
  return useconds();
}
inline
unsigned short
rand48_t_init_defarg_2(void) {
  return getpid();
}
inline
unsigned short
rand48_t_init_defarg_3(void) {
  return atomic_fetch_add(&rand48_counter);
}

This declaration and definition is placed in the context of the above declaration and not in the context of the caller. Thus the expression is evaluated in that context, and not in the context of the caller. In particular for the third function, this fixes the variable rand48_counter to the one that is visible at the point of declaration.

Scope-bound resource management with for-statements

Resource management can be tedious in C. E.g., to protect a critical block from simultaneous execution in a threaded environment you'd have to place a lock / unlock pair before and after that block:

mtx_t guard;
mtx_init(&guard, mtx_plain);
 
mtx_lock(&guard);
// critical block comes here
mtx_unlock(&guard);

This is error prone as locking calls must be provided for each critical block. If the block is longer than a few lines it becomes increasingly difficult to ensure the unlocking of the resource, since the lock / unlock calls are spread at the same level as other code.

Within C99 (and equally in C++, BTW) it is possible to extend the language in order to make this more easily visible and to guarantee that lock / unlock calls match. Below, we will give an example of a macro that will help us to write something like

P99_PROTECTED_BLOCK(mtx_lock(&guard),
                    mtx_unlock(&guard)) {
       // critical block comes here
}

To make this even more comfortable we have

P99_MUTUAL_EXCLUDE(&guard) {
       // critical block comes here
}

There is an equivalent block protection that uses an ::atomic_flag as a spin lock. Such a spin lock uses only atomic operations and can be much more efficient than protection through a ::mtx_t, if the code inside the critical section is really small and fast:

P99_SPIN_EXCLUDE(&cat) {
       // critical block comes here
}

For cases where the ::atomic_flag variable would be specific to the block, you don't even have to define it yourself:

P99_CRITICAL {
       // critical block comes here
}

Generally there should be no run-time performance cost for using such a macro. Any decent compiler will detect that the dependent code is executed exactly once, and thus optimize out all the control that has to do with our specific implementation of these blocks.
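The general idea of abusing a for-statement for this can be sketched as follows. This is a minimal illustration under our own assumptions, not P99's definition: MY_PROTECTED_BLOCK, fake_lock and fake_unlock are hypothetical names, and a simple counter stands in for a real lock.

```c
#include <assert.h>

/* Run BEFORE once, then the dependent block once, then AFTER once. */
#define MY_PROTECTED_BLOCK(BEFORE, AFTER) \
  for (int my_once_ = ((BEFORE), 1); my_once_; my_once_ = ((AFTER), 0))

int lock_depth = 0;            /* stand-in for a real lock */
static int shared_total = 0;   /* the data that the block protects */

static void fake_lock(void)   { assert(lock_depth == 0); ++lock_depth; }
static void fake_unlock(void) { assert(lock_depth == 1); --lock_depth; }

int add_protected(int amount) {
  MY_PROTECTED_BLOCK(fake_lock(), fake_unlock()) {
    assert(lock_depth == 1);   /* we are inside the protected region */
    shared_total += amount;
  }
  return shared_total;
}
```

Note that in this simple sketch a break, return or goto out of the dependent block would skip the AFTER expression, so such a macro needs more care before it can be used for real locks.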

Other such block macros that can be implemented with such a technique are:

- pre- and post-conditions
- ensuring that some dynamic initialization of a static variable is performed exactly once
- code instrumentation

An even more sophisticated tool for scope-bound resource management is provided by the macro P99_UNWIND_PROTECT:

double toto(double x) {
 P99_UNWIND_PROTECT {
   // do something
   while (cond0) {
     for (;cond1;) {
        if (cond2) P99_UNWIND(-1);
        // preliminary return
        if (cond3) P99_UNWIND_RETURN 5.7777E-30;
     }
   }
  P99_PROTECT :
   // do some cleanup here
   // if everything went well ::p99_unwind_code has value 0 otherwise it
   // receives a value from P99_UNWIND
 }
 // regular return
 return x * x;
}

In this code fragment the statement P99_UNWIND will ensure that the two levels of loops are broken and that execution continues at the special label P99_PROTECT.

P99_UNWIND_RETURN goes one step further. As for P99_UNWIND, it executes the clause after P99_PROTECT, but when it reaches the end of the P99_UNWIND_PROTECT scope it will return to the caller with a return value as specified after P99_UNWIND_RETURN, here the value 5.7777E-30.
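For comparison, the control flow described here corresponds roughly to the classic hand-written goto-cleanup pattern. The sketch below uses only plain C; the function toto_by_hand, its mode parameter and the bookkeeping variables are hypothetical, and the code that the cleanup observes for an early return is an arbitrary choice of this sketch.

```c
#include <assert.h>

int cleanup_count = 0;      /* how often the cleanup clause ran */
int last_unwind_code = 0;   /* what the cleanup clause observed */

/* mode 0: run to completion, 1: unwind with code -1, 2: unwind and return early */
double toto_by_hand(double x, int mode) {
  int unwind_code = 0;
  int return_early = 0;
  double early_value = 0.0;
  assert(mode >= 0 && mode <= 2);

  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
      /* plays the role of P99_UNWIND(-1): leave both loops at once */
      if (mode == 1 && j == 1) { unwind_code = -1; goto protect; }
      /* plays the role of P99_UNWIND_RETURN: remember the value, clean up first */
      if (mode == 2 && j == 1) {
        unwind_code = 1;
        return_early = 1;
        early_value = 5.7777E-30;
        goto protect;
      }
    }
  }
protect:
  /* cleanup clause, executed on every path */
  ++cleanup_count;
  last_unwind_code = unwind_code;
  if (return_early) return early_value;
  /* regular return */
  return x * x;
}
```

The macro version spares us the explicit label, the flag and the temporary for the return value, which are easy to get wrong when written by hand.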

On certain platforms that implement enough of C11 we even now have try-catch clauses that are entirely implemented within C. P99_TRY and P99_CATCH can be used as follows

double toto(double x) {
 P99_TRY {
   // do something
   while (cond0) {
     for (;cond1;) {
        if (cond2) P99_THROW -1;
        // preliminary return
        if (cond3) P99_UNWIND_RETURN 5.7777E-30;
     }
   }
 } P99_CATCH(int code) {
   // do some cleanup here
   // if everything went well "code" has value 0 otherwise it
   // receives a value from P99_THROW
 }
 // regular return
 return x * x;
}

The advantage of P99_TRY over P99_UNWIND is that P99_THROW will also work from other functions that are called within the try-block.
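How a throw can cross function boundaries in standard C may be sketched with setjmp and longjmp. This is only an illustration of the general idea under our own assumptions, not a description of how P99_TRY is implemented; my_handler, my_throw, helper and try_by_hand are hypothetical names, and the sketch supports just one active try-block and no threads.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf my_handler;   /* one active try-block at a time in this sketch */
static int my_thrown_code;   /* carries the thrown value to the handler */

static void my_throw(int code) {
  assert(code != 0);         /* 0 is reserved for "nothing was thrown" */
  my_thrown_code = code;
  longjmp(my_handler, 1);
}

/* A function called from within the try part; it may throw. */
static void helper(int fail) {
  if (fail) my_throw(-1);
}

int try_by_hand(int fail) {
  my_thrown_code = 0;
  if (setjmp(my_handler) == 0) {
    helper(fail);            /* the "try" part */
  }
  /* the "catch" part, reached both normally and after a throw */
  return my_thrown_code;
}
```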

Multidimensional arrays and parallel loops

We provide some utilities to ease the programming of loop iterations in one or multiple dimensions. The simplest to use is P99_DO, that closely resembles a do loop in Fortran. It fixes the bounds of the iteration once, before entering the iteration itself.

P99_DO(size_t, i, a, n, inc) {
  A[i] *= B[i-1];
}

P99_FORALL allows the generation of nested for-loops over an arbitrary number of dimensions:

size_t const D[3] = { 20, 17, 31 };
P99_FORALL(D, i, j, k) {
     A[i][j][k] *= B[i][j][k];
}

will iterate over all combinations of i, j, k in the bounds specified by D.
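Written out by hand, such an iteration over three dimensions would look roughly as follows. This is a plain C sketch for comparison only; the arrays, the enumeration constants and the function fill_and_scale are hypothetical and not part of P99.

```c
#include <assert.h>
#include <stddef.h>

enum { D0 = 20, D1 = 17, D2 = 31 };
double A[D0][D1][D2];
double B[D0][D1][D2];

/* Fill A with a and B with b, then multiply element-wise.
   Returns the number of elements that the nested loops visited. */
size_t fill_and_scale(double a, double b) {
  size_t const D[3] = { D0, D1, D2 };
  size_t visited = 0;
  for (size_t i = 0; i < D[0]; ++i)
    for (size_t j = 0; j < D[1]; ++j)
      for (size_t k = 0; k < D[2]; ++k) {
        A[i][j][k] = a;
        B[i][j][k] = b;
      }
  /* the hand-written counterpart of the loop nest above */
  for (size_t i = 0; i < D[0]; ++i)
    for (size_t j = 0; j < D[1]; ++j)
      for (size_t k = 0; k < D[2]; ++k) {
        A[i][j][k] *= B[i][j][k];
        ++visited;
      }
  assert(visited == D[0] * D[1] * D[2]);
  return visited;
}
```

Each additional dimension costs another loop header here, whereas with the macro it is just one more name in the argument list.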

P99_PARALLEL_FOR, where available, will provide a parallelized version of a simple for-loop, and P99_PARALLEL_DO and P99_PARALLEL_FORALL implement nested parallel loops with otherwise the same semantics as for P99_DO or P99_FORALL, respectively.

Preprocessor conditionals and loops

P99 provides you with macro features that can come in handy if you have to generate code repetition that might later be subject to changes. As an example, suppose that you'd have to code something like

tata = A[0]; tete = A[1]; titi = A[2]; toto = A[3];

typedef int hui_0; typedef unsigned hui_1; typedef double hui_2;

If, over time, there are many additions and removals to these lists, maintaining such code will not really be a pleasure. In P99 you may write equivalent statements and declarations just as

P99_VASSIGNS(A, tata, tete, titi, toto);

P99_TYPEDEFS(hui, int, unsigned, double);

There are a handful of such predefined macros that you may look up under Produce C99 statements or expression lists. Under the hood they all use a more general macro that you may yourself use to define your own macros: P99_FOR. The use of this will be described in more detail under Macro programming with P99.
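To give an idea of what such a macro has to do, here is a much simplified sketch that supports one to four variables by dispatching on the number of arguments. It does not use P99_FOR; all names (MY_NARG, MY_CAT, MY_ASSIGN_1 and so on, MY_VASSIGNS) are hypothetical, and unlike the P99 macros it does not handle an empty list.

```c
#include <assert.h>

/* Count 1 to 4 arguments. */
#define MY_NARG(...) MY_NARG_(__VA_ARGS__, 4, 3, 2, 1, 0)
#define MY_NARG_(_1, _2, _3, _4, N, ...) N

/* Paste two tokens after expanding them. */
#define MY_CAT(A, B)  MY_CAT_(A, B)
#define MY_CAT_(A, B) A ## B

/* One helper per supported length, each building on the previous one. */
#define MY_ASSIGN_1(ARR, V0)             V0 = (ARR)[0]
#define MY_ASSIGN_2(ARR, V0, V1)         MY_ASSIGN_1(ARR, V0); V1 = (ARR)[1]
#define MY_ASSIGN_3(ARR, V0, V1, V2)     MY_ASSIGN_2(ARR, V0, V1); V2 = (ARR)[2]
#define MY_ASSIGN_4(ARR, V0, V1, V2, V3) MY_ASSIGN_3(ARR, V0, V1, V2); V3 = (ARR)[3]

/* Choose the helper that matches the number of variables. */
#define MY_VASSIGNS(ARR, ...) \
  MY_CAT(MY_ASSIGN_, MY_NARG(__VA_ARGS__))(ARR, __VA_ARGS__)

const int sample[4] = { 1, 2, 3, 4 };

int sum_first_four(const int *values) {
  int tata, tete, titi, toto;
  MY_VASSIGNS(values, tata, tete, titi, toto);
  assert(toto == values[3]);
  return tata + tete + titi + toto;
}
```

Since this sketch expands to several statements, it should not be used as the body of an if or a loop without braces.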

The predefined macros from above are also able to avoid the nasty special case that the variadic part of the argument list is empty. Something like

P99_VASSIGNS(A);

P99_TYPEDEFS(hui);

would at least cause a warning with conforming preprocessors if the macros were implemented directly with something like

#define P99_VASSIGNS(NAME, ...) do_something_here

#define P99_TYPEDEFS(NAME, ...) do_something_else_here

since the variable length part should not be empty, according to the standard. With P99 you don't have this sort of problem: the above should just result in empty statements or declarations that are even capable of swallowing the then superfluous semicolon at the end.

P99 avoids this by testing for the length of the argument list as a whole with P99_NARG and by using a macro conditional controlled by that length. Such conditionals like P99_IF_EMPTY ensure that the preprocessor decides which of two different code variants the compiler will see. The fragment

P99_IF_EMPTY(BLA)(special_version)(general_version)

will expand to either special_version or general_version according to BLA. If BLA expands to an empty token list, the first variant is produced; if there is at least one non-empty token, the second variant results.
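The calling convention with two parenthesized alternatives can be sketched with a simpler conditional that is controlled by the token 0 or 1 instead of by emptiness. The names MY_IF, MY_SKIP, MY_KEEP and USE_SPECIAL are hypothetical; detecting an empty argument, as P99_IF_EMPTY does, needs considerably more machinery than shown here.

```c
#include <assert.h>

/* Paste two tokens after expanding them. */
#define MY_CAT(A, B)  MY_CAT_(A, B)
#define MY_CAT_(A, B) A ## B

/* MY_IF(BIT)(first)(second) keeps first if BIT is 1 and second if BIT is 0. */
#define MY_IF(BIT)    MY_CAT(MY_IF_, BIT)
#define MY_IF_1(...)  __VA_ARGS__ MY_SKIP   /* keep first, swallow second */
#define MY_IF_0(...)  MY_KEEP               /* drop first, keep second */
#define MY_SKIP(...)
#define MY_KEEP(...)  __VA_ARGS__

#define USE_SPECIAL 1   /* hypothetical configuration switch */

int special_version(void) { return 1; }
int general_version(void) { return 2; }

/* The compiler only ever sees a call to one of the two functions. */
int chosen_version(void) {
  int version = MY_IF(USE_SPECIAL)(special_version)(general_version)();
  assert(version == 1 || version == 2);
  return version;
}
```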

P99 also implements logical and arithmetic operations in the preprocessor. Logical operations just evaluate to the tokens 0 or 1. Arithmetic is restricted to small decimal numbers, less than P99_MAX_NUMBER. Some examples

P99_IS_EQ(int, double)    ==> 0
P99_IS_EQ(static, static) ==> 1
P99_ADD(4, 5)         ==> 9

See Preprocessor operations for more about that.

Allocation and initialization facilities

Consistent initialization of variables is an important issue in C. P99 provides some tools to help with that, most importantly a macro P99_NEW. For this we have to rely on some assumptions that are specified in Variable initialization, in particular that there is an ‘init’ function for each type that we want to use with P99_NEW.

For the example type of a circular list element

// Forward declare struct elem and elem
typedef struct elem elem;
.
.
.
struct elem { elem* pred; elem* succ; };

we might want to ensure that the fields pred and succ are always properly initialized. An initializer macro could be as follows:

#define ELEM_INITIALIZER(HERE, PRED, SUCC) { \
 .pred = (PRED) ? (PRED) : (HERE),            \
 .succ = (SUCC) ? (SUCC) : (HERE),            \
}

A static initialization of a 4-element list in file scope can then be done as

extern elem * head;
.
.
static elem L0;
static elem L1;
static elem L2;
static elem L3;
static elem L0 = ELEM_INITIALIZER(&L0, &L3, &L1);
static elem L1 = ELEM_INITIALIZER(&L1, &L0, &L2);
static elem L2 = ELEM_INITIALIZER(&L2, &L1, &L3);
static elem L3 = ELEM_INITIALIZER(&L3, &L2, &L0);
elem * head = &L0;

Dynamic initialization of a 4-element list on the stack in function scope can be done as

elem L[4] = {
  [0] = ELEM_INITIALIZER(&L[0], &L[3], &L[1]),
  [1] = ELEM_INITIALIZER(&L[1], &L[0], &L[2]),
  [2] = ELEM_INITIALIZER(&L[2], &L[1], &L[3]),
  [3] = ELEM_INITIALIZER(&L[3], &L[2], &L[0]),
};

For dynamic initialization we would then define something like this:

elem * elem_init(elem* here, elem* there) {
  if (here) {
    if (there) {
       here->pred = there;
       here->succ = there->succ;
       there->succ = here;
       here->succ->pred = here;
    } else {
       here->pred = here;
       here->succ = here;
    }
  }
  return here;
}

Initializations of this type of heap variables in function scope can now simply look like this

elem * a = P99_NEW(elem, P99_0(elem*));
elem * b = P99_NEW(elem, a);
elem * c = P99_NEW(elem, b);

elem * head = P99_NEW(elem, P99_NEW(elem, P99_NEW(elem, P99_0(elem*))));

These define cyclic lists of 3 elements, well initialized and ready to go.

In fact, the P99_NEW macro takes a list of arguments that may be arbitrarily³ long. It just needs the first, which must be the type of the object that is to be created. The others are then passed as supplementary arguments to the ‘init’ function, here the parameter there.
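The principle can be sketched with a few lines of plain C. MY_NEW below is a hypothetical, simplified stand-in that allocates with malloc and forwards to the ‘init’ function of the type; unlike P99_NEW it needs at least one argument after the type. The function ring_demo is only there to exercise it.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct elem elem;
struct elem { elem* pred; elem* succ; };

/* The 'init' function from above: insert here after there, or form a ring of one. */
elem * elem_init(elem* here, elem* there) {
  if (here) {
    if (there) {
      here->pred = there;
      here->succ = there->succ;
      there->succ = here;
      here->succ->pred = here;
    } else {
      here->pred = here;
      here->succ = here;
    }
  }
  return here;
}

/* Allocate a T and hand it to T_init together with the remaining arguments. */
#define MY_NEW(T, ...) T ## _init(malloc(sizeof(T)), __VA_ARGS__)

/* Build a ring of three elements, count its members, then free them. */
size_t ring_demo(void) {
  elem * a = MY_NEW(elem, NULL);
  elem * b = MY_NEW(elem, a);
  elem * c = MY_NEW(elem, b);
  size_t count = 0;
  if (a && b && c) {
    assert(c->succ == a && a->pred == c);   /* the ring is closed */
    count = 1;
    for (elem * p = a->succ; p != a; p = p->succ) ++count;
  }
  free(c);
  free(b);
  free(a);
  return count;
}
```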

If the ‘init’ function accepts default arguments to some parameters, so will P99_NEW. With Default arguments and types for functions, calls to P99_NEW may then omit the second argument:

#define elem_init(...) P99_CALL_DEFARG(elem_init, 2, __VA_ARGS__)
#define elem_init_defarg_1() P99_0(elem*)
.
.
.
elem * a = P99_NEW(elem);
elem * head = P99_NEW(elem, P99_NEW(elem, P99_NEW(elem)));

Footnotes: ³ The number of arguments might be restricted by your compiler implementation. Also most of the P99 macros are limited to P99_MAX_NUMBER.

Emulating features of C11

C11 (published in December 2011) introduces some new features that are already present in many compilers or OS, but sometimes with different syntax or interfaces. We provide interfaces to some of them with the intention that once compilers that implement C11 come out these interfaces can directly relate to the C11 feature.

With these emulated interfaces you can already program almost as if you had a native C11 compiler (which doesn't yet exist) and take advantage of the improvements that C11 makes to the language, without giving up on portability in the real world of today's compilers.

Type generic macros

C11 provides a feature to "overload" macros and more generally the result of any type of expression, _Generic. It allows you to write template-like expressions with the macro preprocessor. The idea is to generalize the type-generic mathematical functions that were already present in C99:

If you include the "tgmath.h" header you have a macro sin that implements calls to the family of sine functions, e.g.

double complex z0 = sin(1.0); // invokes the function sin

double complex z1 = sin(2.0 + 3*I); // invokes the function csin

At compile time, these type generic macros decide from the type of the argument which function call to emit.

The new concept of _Generic expressions generalizes this concept. Of the freely available compilers at the time of this writing (Apr 2012), only clang already implements this feature. On the other hand, gcc has extensions that can be used to emulate it, and such an emulation is provided through P99_GENERIC.
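With a compiler that supports _Generic natively, a type generic macro for your own functions can be sketched as below. The functions twice_int and twice_double and the macro twice are hypothetical names for this illustration; the sketch uses the C11 keyword directly and not P99_GENERIC.

```c
#include <assert.h>

static int    twice_int(int x)       { return 2 * x; }
static double twice_double(double x) { return 2.0 * x; }

/* Select the function from the type of the argument, then call it. */
#define twice(X) _Generic((X), int: twice_int, double: twice_double)(X)

double twice_demo(void) {
  /* the result type follows the selected function */
  assert(sizeof(twice(1)) == sizeof(int));
  return twice(21) + twice(0.25);
}
```

An argument of a type that is not listed, such as a float, is a compile time error here because the selection has no default association.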

Atomic operations

Atomic operations are an important contribution of the new standard; these operations are implemented on all commodity CPUs nowadays but a direct interface in a higher programming language was missing.

These operations give guarantees on the coherence of data accesses and other primitive operations, even in the presence of races. Such races may occur between different threads (see below) of the same application or when a program execution is interrupted, e.g. for a signal handler or a longjmp call. Since most instructions on modern CPUs are composed of several micro-instructions, in such a context an instruction may succeed only partially and data may end up in an intermediate state.

In this example

static _Atomic(size_t) n = 0;
atomic_fetch_add(&n, 1);
// do something in here
atomic_fetch_sub(&n, 1);

the variable n is always in a clean state: either the addition has taken place or it has not. Multiple threads can execute this code without locking a mutex; the value of n will always be well defined.
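With the C11 header "stdatomic.h" such a counter can be sketched as follows. The function names enter_region, leave_region and inside_now are hypothetical; the sketch only shows the return values of the fetch operations and does not start any threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic(size_t) n = 0;

/* Returns how many callers were already inside when we entered. */
size_t enter_region(void) {
  return atomic_fetch_add(&n, 1);   /* yields the value before the addition */
}

/* Returns how many callers remain inside after we left. */
size_t leave_region(void) {
  size_t before = atomic_fetch_sub(&n, 1);
  assert(before > 0);               /* leaving without entering is a bug */
  return before - 1;
}

size_t inside_now(void) {
  return atomic_load(&n);
}
```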

One of the interesting concepts that come with C11 is ::atomic_flag, that is a simple interface that can implement spinlocks quite efficiently.
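A minimal spin lock on top of ::atomic_flag can be sketched like this, using only the C11 interface. The helper names spin_lock, spin_try_lock, spin_unlock and add_under_spinlock are hypothetical; a real implementation would probably also want to yield or back off while waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag cat = ATOMIC_FLAG_INIT;
static long protected_total = 0;

/* Try once; returns true if we acquired the lock. */
bool spin_try_lock(atomic_flag *flag) {
  assert(flag);
  return !atomic_flag_test_and_set(flag);
}

/* Busy wait until the flag was clear and we were the one to set it. */
void spin_lock(atomic_flag *flag) {
  assert(flag);
  while (atomic_flag_test_and_set(flag)) {
    /* spin */
  }
}

void spin_unlock(atomic_flag *flag) {
  assert(flag);
  atomic_flag_clear(flag);
}

long add_under_spinlock(long amount) {
  spin_lock(&cat);
  protected_total += amount;
  long result = protected_total;
  spin_unlock(&cat);
  return result;
}
```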

Threads

Atomic operations have their major advantage in the presence of threads, that is, multiple entities that compute concurrently inside the same application and use a common address space. C11 provides an optional interface for threads and the principal data structures that are needed for them (::mtx_t and ::cnd_t). This thread interface is a bit simpler than POSIX threads, but implements the main features.

P99 provides a shallow wrapper on top of POSIX threads that provides all the interfaces that are required by C11.