eĿlipsis
a language independent preprocessor
 
All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
Loading...
Searching...
No Matches
Extensions to the preprocessor

There is a wide range of extensions to the preprocessor that eĿlipsis provides. Available for all languages are

There are also some that are specific to C that instrument some punctuators, namely local unique identifiers with __LOC_NEW and __LOC, and the defer feature.

Directives

A directive is what in C would be the commands to the preprocessor that are introduced with a # character at the beginning of a line, such as in

#if all_is_well
# define WELLNESS(X) X*much
#endif

EĿlipsis allows to extend this concept to other programming or text-processing languages by providing a syntax that replaces the # character that is used in C and that is then more adapted to the respective context. Nevertheless, in the following we mostly use C syntax for our examples and talk of directives with their C names, e.g #define.

An overview of the directives that are implemented by eĿlipsis is aggregated in a pseudo source file directives.c. Among the extensions are directives that

  • expand the line, first, and then use as directive #expand
  • restrict the scope of macro definitions #bind
  • list processing #gather and #scatter

These new directives are probably best observed in action, see below.

See also
Recursive inclusion

Standard C directives and their extensions

Include directories

C compilers and preprocessors generally have comand line options to add directories to their include search paths. EL' lipsis has an extension that directly works with the #include directive, by using file names that end with a slash.

#include "my_home/dir/"
#include ⟨my_home/dear/⟩

With this feature configuration of the preprocessor may be simply kept in a header file for the respective language.

Include syntax

C's syntax using < and > for includes is rather weird and does not fit well into the rest of the language concerning tokenization. After all, these are operators, and in particular it is not easy to comprehend if and when arguments to the #include directive are expanded. Also, since other languages such as html even use these punctuators themselves as brackets, we cannot generalize their use.

Therefor we add syntax to avoid part of the problem: we have and (MATHEMATICAL LEFT and RIGHT ANGLE BRACKET) to take that role for all languages.

See also
Special include paths for all languages

Empty include files

In contexts where include file names are expanded or where other complex preprocessor features are used it can be convenient to have empty strings ("" and ⟨ ⟩ (or <> for c)). They have the same effect as including an empty file.

Prefix, suffix and length attributes for #include

These attributes are borrowed from C23's #embed directive. Prefix and suffix are such that they execute their contents as directives before or after the inclusion of the source file. This is particularly interesting for the #bind directive

#include "code.h" prefix(bind foo 37)
__directive__ bind
A local equivalent to #define.
Definition directives.c:162

this defines foo for the duration of the inclusion to the indicated value and then undefines it at the end.

Warning
The #include directive will ingeneral evaluate its arguments. If in the above bind or foo exe already defined as macros weird things might happen. Use the #include_source directive, instead, where possible.

In a similar approach we also provide the length attribute. it accounts for offsets in lines, and not in bytes as for #embed.

Offset attributes for #embed and #include

For timing reasons this #embed attribute had not been included into C23. We also introduce it to #include where, as for length, it accounts for offsets in lines, and not in bytes.

Builtins

The virtual include fine ellipsis-builtins.dirs has the description of builtins that are provided as macro interfaces. In general, they have a behavior that is a bit different from "normal" macros, for example those that are function-like do not expand their argument list but treat that list directly, or they refer to external state such as times or a counter.

Custom macros

The include file ellipsis-predefined.dirs contains several macro definitions that provide features that can be implemented directly as macros in eĿlipsis, but which you probably don't want to implement for each project from scratch.

Language independent includes

The directory The include directory contains several headers that provide more sophisticated features, in particular list processing with ellipsis-foreach.dirs and iteration with ellipsis-do.dirs. All of these are independent of the language that is processed.

Punctuators and operators

Internally, eĿlipsis encodes most punctuators that are used for operators with a single code point. That concerns those punctuators that C has as digraphs with the appropriate Unicode code point, e.g. the ugly <= becomes . As an extension, eĿlipsis accepts these operators in all languages and in particular you may use them in preprocessor expressions.

See also
Punctuators

Exemption from macro expansion

When the input tokens are filtered by a C-like preprocessor nominals (called identifiers in C) that correspond to the name of a macro are expanded. C only has two exceptions from this:

  • If the current filtering is the expansion of a macro named ID, other occurences of that same name ID are not expanded. So preprocessing has no normal recursion. (But see Tail recursion, below.)
  • If a macro name ID of a functional macro (that is a macro with a parameter list) occurs and that name is not followed by an opening parenthesis (possibly preceded by whitespace) then this token is not expanded, either.

The lack of the possibility to protect other nominals can result in a lot of irritation in contexts where names of identifiers are glued together by means of the token join operator (or ## in C). You'd see things such as the following quite commonly in preprocessor oriented C headers:

#define MYFUNC(F) hurlipurtz ## F

The user then expects the following

void* MYFUNC(getone)();

to expand to

void* hurlipurtzgetone();

But in fact it wouldn't always. If this code is located in a header and another header is included that would define hurlipurtz to expand to schnurtz, the above function declaration would read.

void* schnurtzgetone();

not at all what was intended. Obviously for the given example this is highly unlikely but the smaller the name components that are used get, the higher the probability of collision. For example a lot of code would use the token _ to glue together two name components, so whenever some smart colleague thinks they should overload that token with something funny, you are in trouble.

To protect against this effect, as an extension, eĿlipsis adds a third method for identifier protection, the keep and peek brackets. Rewritten with these, the above definition would look

#define MYFUNC(F) ⸤hurlipurtz⸥ ## F

That is, here the funny brackets (BOTTOM LEFT HALF BRACKET and BOTTOM RIGHT HALF BRACKET) protect the nominal hurlipurtz to be expanded when MYFUNC is used. If you are programming in C and are allergic to Unicode characters you could also use the digraphs <| and |> that replace them.

These brackets have the following properties

  • They don't interact with tokenization. That is, the input stream is split up into tokens, first, and possible keep and peek brackets are identified during that process as tokens of their own. They only take their special role in the next phase, filtering, when macros are expanded and directives are interpreted.
  • During filtering they apply to arbitrary long sequences of tokens where they protect all nominals that appear in the sequence.
  • They can be nested as long keep and peek brackets nest properly.
  • If a construct appears between these brackets that is normally ended by an EOL character, the construct extends to the end of the outermost pair of keep and peek brackets.

The latter can be in particular be useful for long macro definition that go over multiple lines

<|
#define SWAP(NAME, T)
⸤void⸥ NAME ## ⸤_swap⸥(T* ⸤a⸥, T* ⸤b⸥) {
T* ⸤tmp⸥ = ⸤a⸥;
⸤a⸥ = ⸤b⸥;
⸤b⸥ = ⸤tmp⸥;
}
|>

or equivalently

<|
#define SWAP(NAME, T)
⸤void NAME ## _swap(T* a, T* b) {⸥
⸤T* tmp = a;⸥
⸤a = b;⸥
⸤b = tmp;⸥
⸤}⸥
|>

or even

<|
#define SWAP(NAME, T)
<|void NAME ## _swap(T* a, T* b) {
T* tmp = a;
a = b;
b = tmp;
}|>
|>

Such a definition ensures that users text editor basically sees a function, and thus code indentation and highlighting should work as expected. The nested used of keep and peek brackets then also guarantees that the local identifiers a, b and tmp are not accidentally overwritten during macro expansion. So using this macro as SWAP(fine, well) would expand to

void fine_swap(well* a, well* b) {
well* tmp = a;
a = b;
b = tmp;
}

Side effects

For the original C and C++ preprocessors the only way to change the internal state are directives:

  • include to temporarily switch to another source file
  • line to change the line number and source file information
  • define to add a new macro to the state
  • undef to remove a macro from the state

As already seen above, eĿlipsis adds several directives that are variants of the above, and that also change the internal state.

With the addition of the __COUNTER__ macro, gcc has changed the game. Indeed, this macro increments a counter at each invocation, and thus changes the preprocessor state at each time.

Our builtins __INCREMENT__, __DECREMENT__, __INSTANT__ and __CLEAR__ move this concept on a higher level, namely

  • They allow you to identify as much changeable state with side effects as we want. Any macro that holds an integer value can be used as a counter.
  • You may revert to a previous state, for example by decrementing a counter that previously has been incremented, or by clearing it to 0.
  • Such variables don't have to pre-exist as macros. All four builtins initialize their argument macro to 0 if it has not been set before and then perform their specific action. For example, you'd use __INSTANT__ to query the current value or set it to 0 if it had not been set before. In contrast to that __CLEAR__ will erase a current value, if it exist, and set and return 0 unconditionally.

These features are used in several header files, for general use or specialized for use in C. Most work by using counters to construct hierarchic unique identifiers for which the generation pattern may follow syntactic properties of the target language, see ellipsis-unique.dirs, ellipsis-defer.h or ellipsis-trigger.h.

Instrumenting punctuators in C

We instrument the following punctuators for C because they have key roles in the grammatical structure of the language:

punctuator roles nesting level
{ start of compound statement __BRACE_LEVEL__
start of struc or union
start of initializer
} end of compound statement
end of struc or union
end of initializer
[ start of index __BRACKET_LEVEL__
start of designator
start of array length
] end of index
end of designator
end of array length
= start of initializer
assignment
; end of statement
end of declaration
separation for for loops
[[ or start of attribute __ATTR_LEVEL__, __BRACKET_LEVEL__
]] or end of attribute
:: attribute separator
name composition

This instrumentation is achieved by making these tokens "nominals" that can then be defined as macros. The default macro for each of these resolves to the punctuator itself and also maintains counters for the nesting level where indicated.

Note that __ATTR_LEVEL__ should only be 0 or 1. Also, attributes increment and decrement the level for brackets, such that we know if a ]] actually closes two array subscripts or if it closes an attribute.

The result of a :: token will be different according to context: if __ATTR_LEVEL__ is 0 it serves for name composition (generally resulting in a replacement by a ) and for [[prefix::suffix]] vendor attributes otherwise.

You cannot undef these builtin macros, don't even try. But you could, in principle, add features to them:

#bind TRIGGER printf("say hello");
#gather { TRIGGER

This would add the printf call after each occurrence of a { in the source, even for struct definitions or initializers. So this is probably not what you want.

We currently use this for two C extension that otherwise would not easily be implementable in the preprocessor:

  • The macros __LOC_NEW and __LOC from ellipsis-loc.h provide unique identifiers within compound statements.
  • The defer feature uses this to ensure that defer clauses are not hidden inside nested scopes that would not be subject to unrolling.

Generating unique identifiers

C has not really an identifier hierarchy and so it is easy to shut yourself by defining macros that interact badly with some code that you don't control. The header ellipsis-loc.h provides tools to circumvent this problem by creating unique identifiers for the whole translation unit that can be referred to locally.

As an example take a function that needs to name a parameter because it is used as an array length in other parameter declarations. With __LOC_NEW and __LOC this becomes as simple as in

void copyem(size_t __LOC_NEW, double [static restrict __LOC()], double const [static restrict __LOC()]);
#define __LOC_NEW
create a new identifier in the current scope of {}
Definition ellipsis-loc.h:3
#define __LOC(POS0, POS1)
refer to an identifier created with __LOC_NEW
Definition ellipsis-loc.h:2

Here the preprocessor then chooses a unique name for __LOC_NEW that then is used two times by __LOC(). Which particular identifier is chosen is not our concern, we just have the guaranty that it doesn't interact with other identifiers in the same unit.

Defer: a lexical extension of the C grammar

There is a proposal for an extension of the C language called defer, that is prospected to either be published in a technical specification or the next revision, C2Y, of the C standard:

Our implementation follows the simplest version that is described here, only some additional restrictions apply, see below.

This feature is motivated by code patterns as the following, which use goto jumps to ensure that certain cleanup code snippets are executed in the correct order and under the right circumstances:

void fun() {
void * const p = malloc(25);
if (!p) goto DEFER0;
void*const q = malloc(25);
if (!q) goto DEFER1;
if (mtx_lock(&mut)==thrd_error) goto DEFER2;
// all resources acquired
// ... use p and q under protection of mut ... then
mtx_unlock(&mut);
DEFER2:
free(q);
DEFER1:
free(p);
DEFER0:;
}

Such spaghetti patterns are quite commonly used in C because they currently provide the only systematic tool to do such cleanup. The pattern has several disadvantages

  • cleanup code appears far from the place where its need is detected
  • we have to invent a naming scheme for labels that has to be unique within the same function
  • a jump over some cleanup code is not detected
  • adding resources and cleanup code easily introduces bugs

The defer feature avoids all of these. Using this inside a compound statement indicates that that code is to be executed at the end of that compound statement:

#include_directives <ellipsis-defer.h>
void fun() {
void * const p = malloc(25);
if (!p) return;
defer { free(p); };
void * const q = malloc(25);
if (!q) defer_skip;
defer { free(q); };
if (mtx_lock(&mut)==thrd_error) defer_skip;
defer { mtx_unlock(&mut); };
// all resources acquired
// ... use p and q under protection of mut ... then
}
#define defer
Mark the depending compound statement as being deferred until the current compound statement (the anc...
Definition ellipsis-defer.h:524
#define defer_skip
terminate the closest surrounding compound statement with defer statements
Definition ellipsis-defer.h:517

By using hierarchic identifiers, we manage to implement the static aspects of this feature, namely to suspend the execution of specific code snippets until the end of the surrounding block.

The semantics of the example are as if we had written something similar to the following.

void fun() {
void * const p = malloc(25);
if (!p) goto DEFER0;
if (false) {
DEFER1:
free(p);
goto DEFER0;
}
void * const q = malloc(25);
if (!q) goto DEFER1;
if (false) {
DEFER2:
free(q);
goto DEFER1;
}
if (mtx_lock(&mut)==thrd_error) goto DEFER2;
if (false) {
DEFER3:
mtx_unlock(&mut);
goto DEFER2;
}
// all resources acquired
// ... use p and q under protection of mut ... then
goto DEFER3;
DEFER0:;
}

Note that the goto DEFER3 statement in the before last line of the compound statement starts a chain that walks backwards to the code snippets that had been introduced by the defer clauses. These clauses are protected from being executed when we pass through the block by hiding them inside if (false) statements. By such a trick the defer code is only executed when jumped to directly.

Evidently, the code that is really produced when you use this feature is a bit more complicated, but I hope you get the idea.

The example above does not allow for us to jump into or out of the compound statement; only a normal return statement would guarantee to execute all deferred statements.

#include_directives <ellipsis-defer.h>
int main(void) {
DEFER_TYPE(int);
void * const p = malloc(25);
if (!p) return EXIT_FAILURE;
defer { free(p); };
void * const q = malloc(25);
if (!q) return EXIT_FAILURE;
defer { free(q); };
if (mtx_lock(&mut)==thrd_error) return EXIT_FAILURE;
defer { mtx_unlock(&mut); };
// all resources acquired
// ... use p and q under protection of mut ... then
return EXIT_SUCCESS;
}
#define DEFER_TYPE(...)
Provide the return type to a function context that uses the defer feature.
int main(void)
Definition predefined_c_attribute_generate.c:3

The other features that allow to finish a particular construct early are break and continue. With this implementation in the preprocessor we are only able to count nested compound statements reliably; other nested statements such as towers of if, switch and for are not so easy to detect. Overall we impose the following restrictions.

Warning
When used with this defer implementation, all loop and switch bodies must be compound statements. The implementation tries to enforce this. You'll get a lot of warnings for code that doesn't stick to this rule.
To implement return correctly in macros we have to ensure that the return value is collected in some local variable before we unwind all the deferred blocks. To do so, our macros need the return type of the function which unfortunately is not available directly in C. The purpose of the invocation of DEFER_TYPE(int) is to provide that information to the macros. The return macro is programmed in such a way that it only changes the semantic to the defer-semantic when there are indeed defer statements; if not, a plain old return remains in the replacement code after preprocessing. If you want to emphasize that a void function uses defer, the macro DEFER_TYPE may be used with an empty argument list.
Jumps over a defer into the corresponding anchor by means of jump statements are prohibited. The detection uses the compiler's support for variably modified types to avoid such jumps.
In general, it is not possible to leave a defer statement other than by reaching. Jump statements break, continue and return are captured at compile time. Others, goto, longjmp or exit and similar are in general not possible without jeopardizing the cleanup mechanism.

As an exception of the latter, the very first defer in a function: a non-regular exit or jump from there cannot jump over another deferred statement. For example the following is valid

int main(int argc, char* argv[argc+1]) {
DEFER_TYPE(int);
thrd_exit(defer_return_value);
}
return EXIT_SUCCESS;
}
#define defer_return_value
access the return value of a function from within a defer statement
Definition ellipsis-defer.h:515

With this code a termination of main by a return will not necessarily terminate the execution but only the thread that corresponds to main. Here the macro defer_return_value provides a read-only access to the return value that had been collected at the return statement. Similarly to the example above that first defer may contain calls to exit, quick_exit, long_jmp or other functions that transfer control out of the function.

Preprocessor state and identifier composition

EĿlipsis provides access to some of the state of preprocessing, in particular it traces the include hierarchy for nested includes. There is one primitive, __INCLUDE_DEPTH__, that allows to track this, and several other builtins that are derived from it.

The top-level source file inherits a "project name" from the environment variable __PROJECT__, that, if set, should be an identifier (such as ellipsis) or a list of identifiers separated by a token (such as ellipsis∷token). Other files that are then subsequently included can use this name to define a __UNIT__ name (most of the time abbreviated as ¤, CURRENCY SIGN, U00A4), for example by appending a new component. If the project is ellipsis∷token the #unit directive

#unit ¤¤∷array

would place the code inside the ellipsis∷token∷array unit. The punctuator (PROPORTION, U2237) is reserved by eĿlipsis for this purpose. According to the parsed language, it might map to another punctuator.

The current unit name is then available as __UNIT__ and the currency character ¤ is available as a shortcut. For example ¤∷grow would then refer to the function ellipsis∷token∷array∷grow.

If the unit feature is used as above to define the unit name (derived with ¤¤∷something, the __PARENT_UNIT__ and ¤¤ still refer to the unit from which the current unit is deduced. For example in the ellipsis∷token∷array unit the PARENT_UNIT macro refers to ellipsis∷token and ¤¤∷cat would be ellipsis∷token∷cat.

An example for this mechanism is the builtin header file ellipsis-move.h. It sets the unit name to the one of its includer with the function name move appended:

#unit ¤¤∷move
#include_directives <ellipsis-move-macro.h>
/*^
** @brief Move a `¤¤` pointed to by the second parameter to the one pointed to by the first.
**
** If `target` is not null before, the old pointed-to object is
** deleted.
**
** @memberof ¤¤
^*/
inline void ¤(¤¤* __LOC_NEW[restrict static 1], ¤¤**restrict __LOC_NEW) {
ELLIPSIS_MOVE(__LOC(1, 1), __LOC(0, 1), ¤¤∷delete)
}
#define ELLIPSIS_MOVE(TARGET, SOURCE, FREE)
Free the contents of TARGET and move the contents of SOURCE to TARGET.
Definition ellipsis-move-macro.h:8
#define ¤
shortcut for __UNIT__
Definition ellipsis-predefined.dirs:114
#define ¤¤
shortcut for __PARENT_UNIT__
Definition ellipsis-predefined.dirs:115

Then the inline function definition uses the macro ELLIPSIS_MOVE to move the contents of the second pointer to the first. If the first already was pointing to some object that object is deleted by using the function ¤¤∷delete which is assumed to exist.

Seen in the context of the ellipsis‿str8 type, expanded function interface looks like ellipsis‿str8::ellipsis‿str8‿move

inline void ellipsis‿str8‿move(ellipsis‿str8* __LOC_ID_0_3[restrict static 1], ellipsis‿str8**restrict __LOC_ID_0_4) {
if (*__LOC_ID_0_3)
ellipsis‿str8‿delete((void*)*__LOC_ID_0_3);
if (__LOC_ID_0_4) {
*__LOC_ID_0_3= *__LOC_ID_0_4;
*__LOC_ID_0_4= nullptr;
} else {
void ellipsis‿str8‿delete(ellipsis‿str8 const *s)
Definition ellipsis-str8.c:624
A structure with a flexible array member of base type ellipsis‿str8‿base.
Definition ellipsis-str8.h:150
*__LOC_ID_0_3= nullptr;
}
}

Another way of achieving the same can be seen here:

#unit ¤¤
#include_directives <ellipsis-move-macro.h>
/*^
** @brief Move a `¤ const` pointed to by the second parameter to the one pointed to by the first.
**
** If `target` is not null before, the old pointed-to object is
** deleted.
**
** @memberof ¤
^*/
inline void ¤∷cmove(¤ const* __LOC_NEW[restrict static 1], ¤ const**restrict __LOC_NEW) {
ELLIPSIS_MOVE(__LOC(1, 1), __LOC(0, 1), ¤∷delete)
}
void ¤∷cmove(¤ const *__LOC_NEW[restrict static 1], ¤ const **restrict __LOC_NEW)
Definition ellipsis-cmove.h:12

Instead of setting the current unit to the new name, we just inherit the parent name; during the include of this header ¤ is set to ¤¤. Indeed, the result is very similar

inline void ellipsis‿str8‿cmove(ellipsis‿str8 const* __LOC_ID_0_5[restrict static 1], ellipsis‿str8 const**restrict __LOC_ID_0_6) {
if (*__LOC_ID_0_5)
ellipsis‿str8‿delete((void*)*__LOC_ID_0_5);
if (__LOC_ID_0_6) {
*__LOC_ID_0_5= *__LOC_ID_0_6;
*__LOC_ID_0_6= nullptr;
} else {
*__LOC_ID_0_5= nullptr;
}
}

Name composition for C

In C, the token :: has an ambiguous role, because it represents name composition most naturally in a similar way that someone coming from C++ would expect. So, in most contexts the token :: is equivalent to , such that you may choose to input the feature the way you prefer. The only context where it is not is within attributes (in the form of [[something]] or ⟦something⟧). Here we have to be more careful: we can't mangle names such as [[gnu::something]] because we have to transfer them as such to the real C compiler at the end.

In the context where it is understood as name composer, the punctuator token is simply replaced by a (UNDERTIE, U203F) character; the resulting combination of identifers with such characters are then valid identifiers for C23 and can be used by compilers as such. For example the above function ellipsis∷token∷array∷grow (or ellipsis::token::array::grow) appears as ellipsis‿token‿array‿grow in intermediate output and for the linker. This is possible because the undertie character is allowed for identifiers, but fortunately yet rarely used by applications.

Tail recursion

There is some support for ordinary macro recursion in eĿlipsis, provided by the header ellipsis-recursive.dirs. This only works for direct recursion (a macro calling itself directly) and is also limited to a small recursion depth. This is so, because of the expansion rules of preprocessing (in particular for the __VA_ARGS__ pseudo-parameter) such a recursive macro quickly blows up to have quadratic complexity, which is not the right thing to do for text processing.

Instead we provide a mechanism to program tail recursion directly, without having to name or expand __VA_ARGS__ at any point. This works with a mechanism that is similar to C23's __VA_OPT__, a pseudo parameter that is only active if the macro has been called with a non-empty variable argument list. The syntax is

__VA_TAIL__(IDENTIFIER)

where IDENTIFIER is the name of a macro to be invoked. This name can be omitted, and then the call refers to the current macro, instead.

Examples

Simple tails

#define MAKE_SEMI(X, ...) X; __VA_TAIL__(MAKE_SEMI)
MAKE_SEMI(A)
MAKE_SEMI(A, B)
MAKE_SEMI(A, B, C)

results in

A;
A; B;
A; B; C;

Here the first part of the definition, X;, appends a semicolon to the first argument. The construct __VA_TAIL__(MAKE_SEMI) then calls the same macro again with the remaining argument list, only that for doing so, that argument list is not expanded again, but just passed through.

To simplify writing such a macro and to possibly accelerate its expansion a bit, the name of the macro can be omitted. The construct then refers to the macro where it is found. So an equivalent definition of the above could be:

#define MAKE_SEMI(X, ...) X; __VA_TAIL__()

where the redundant information of the name is omitted from the tail call.

Nested tails

#define MAKE_PAREN(X, ...) (X __VA_OPT__(+__VA_TAIL__()))
MAKE_PAREN(A);
MAKE_PAREN(A, B);
MAKE_PAREN(A, B, C);

results in the three lines

(A);
(A+(B));
(A+(B+(C)));

This is a bit more complicated. First, if this macro is called with just one argument, the whole part within the __VA_OPT__ construct is not used; the definition is then as if it were just "(X)" and so the first invocation results in "(A)".

Now, if there is an additional argument, this is as if it had been defined (X+__VA_TAIL__(MAKE_PAREN)). So there is prefix "(X+" before the tail call and a suffix ")" after it. The result is then nicely appending the prefixes to the left (three times with arguments A, B and C) and stacking the suffixes to the right, here resulting in a sequence of closing parenthesis.

Reversion

#define REVERT(X, ...) __VA_TAIL__()__VA_OPT__(,) X
REVERT(A);
REVERT(A, B);
REVERT(A, B, C);

results in the three lines

A;
B, A;
C, B, A;

Ping pong

The identifier that appears in the __VA_TAIL__ construct needs not to be the same as the name of the macro that is defined. For example here are two macros LEFT and RIGHT that call each other.

#define LEFT(X, ...) X __VA_OPT__( + (__VA_TAIL__(RIGHT)))
#define RIGHT(X, ...) __VA_OPT__((__VA_TAIL__(LEFT)) + ) X
RIGHT(0, 1, 2, 3, 4, 5);

The result is a ping pong pattern where the odd numbers are pushed to the left and the even numbers to the right.

(1+ ((3+ ((5) + 4)) + 2)) + 0;

Invocator macros

The __VA_TAIL__ construct is handled as the before last step of macro replacement, namely after all arguments have been expanded, but before the final expansion of the overall result.

Thus the following where the parameter F of PARENS is used as the argument to __VA_TAIL__ is valid and results to an invocation of whatever name is found after parameters have been expanded.

#define PARENS(F, ...) (__VA_TAIL__(F))
PARENS(REVERT, A, B, C);
__EXPAND__(REVERT PARENS(REVERT, 1, 2, 3));
#define __EXPAND__(…)
expand the arguments
Definition ellipsis-predefined.dirs:18

This leads to:

(C, B, A);
1, 2, 3;

Note that the effect is to apply REVERT to "1, 2, 3" twice, so the result is the same tokens in the initial order.

How this works

A macro that contains this construct has to fulfill the following requirements:

  • The macro must be a variadic functional macro.
  • There must be at least one named parameter.
  • After parameter expansion and insertion the __VA_TAIL__ construct must only appear once.

Also it is recommended that

  • The __VA_TAIL__ construct and the use of __VA_ARGS__ are exclusive.

This is because this would lead to an expansion of __VA_ARGS__ at every recursion level, thus quadratic complexity.

The easiest way to imagine what is going on with this feature is to imagine it as follows.

  • Normal argument replacement (plus stringification and join operations) takes place for all named parameters.
  • If the __VA_TAIL__ nominal is encountered during that expansion, it is passed through.
  • When all replacement is done, the tokens __VA_TAIL__(INDENTIFIER) are replaced as if IDENTIFIER(__VA_ARGS__) is invoked, only that the __VA_ARGS__ and the result are not expanded.
  • The result is spliced into the position where __VA_TAIL__(IDENTIFIER) had been found.

So if there are several nested tail calls the result is a list of prefixes for each argument, followed by a reverted list of suffixes, also possibly depending on the arguments.

Recursive inclusion

This is not an extension as such, in principle even in C as of today, an include file may include itself. Only that it is quite tedious to limit the recursion depth; see this blog post for a proof of concept.

EĿlipsis changes this because it allows to keep track of state and also to do arithmetic that changes its state. For example the generic includes ellipsis-foreach.dirs and ellipsis-do.dirs use this to provide relatively efficient iteration features.

Consider the Fibonacci.h example:

1
28# ifndef FIBONACCI
29/*
30 * This the initial execution that sets up all the macros. They are
31 * created with @ref bind so they will cease to exist, once this upper
32 * level of recursion is terminated.
33 */
34
35# bind FIB0() 0U
36# bind FIB1() 1U
37# bind FIB2()
38
39/*
40 * This macro collects all the values by means of @ref gather,
41 * later. There are special comments, that help to indent (the ones
42 * with a >) and to skip to the next line (the one with the ^).
43 */
44# bind FIBONACCI() /*>*/0U, /*^*//*>*/1U
45
46/*
47 * And then the top level recursive call. We only want to expand the
48 * directives, so we use @ref include_directives, but we also want to
49 * expand the name of the file to this same file so we prefix this
50 * directive with @ref expand.
51 */
52# expand include_directives __FILE__
53
54/*
55 * @brief An array with the first __NARGS__(FIBONACCI()) Fibonacci numbers
56 *
57 * It calls @ref FIBONACCI twice, once for the number of elements
58 * (counted with @ref __NARGS__) and then for the initializer. Most
59 * probably you will see that the integers here are 64 bit wide and
60 * that there are 94 Fibonacci numbers that fit into 64 bit.
61 *
62 * But what do we know about your platform? So we compute the value of
63 * the biggest unsigned number and ask C23's `typeof` to give us the
64 * correct type where all these Fibonacci numbers fit.
65 */
66constexpr typeof(__EXPAND_DEC_U__(-1U)) Fibonacci[__NARGS__(FIBONACCI())] = {
67FIBONACCI()
68};
69/*
70 * Check for possible overflow. We check if the sum of the two current
71 * values would be greater than the biggest unsigned value that the
72 * preprocessor can handle.
73 */
74# elif FIB1() <= ((-1U)-FIB0())
75/*
76 * Set FIB2() to that new value which is the sum of the two current
77 * values. This sum is evaluated and expanded before being assigned to
78 * FIB2().
79 */
80# undef FIB2
81# expand bind FIB2() __EXPAND_DEC_U__(FIB0() + FIB1())
82/* collect the contents of the expansion of FIB2() to the end of
83 FIBONACCI. Here the macro @ref ELLIPSIS_MARK_LINE is either empty
84 (normal operation of `ellipsis`) or adds the line number operation
85 (when using ellipsis-gnuc.sh) */
86# expand gather FIBONACCI ,/*^*/ELLIPSIS_MARK_LINE/*>*/FIB2()
87/* shift of FIB2 → FIB1 → FIB0. Here these macros are not expanded, it
88 is their definitions that are shifted around. */
89# move FIB0 FIB1
90# move FIB1 FIB2
91/*
92 * And then next recursive call. The ideas are the same as for the
93 * first.
94 */
95# expand include_directives __FILE__
96/*
97 * There is no else clause, so recursion stops as soon as the @ref
98 * elif condition is not fulfilled any more.
99 */
100# endif
#define __NARGS__(…)
expand all the arguments and count the number of items in the resulting list
Definition ellipsis-predefined.dirs:86
#define __EXPAND_DEC_U__(…)
expand all the arguments, evaluate the resulting integer expression and produce a decimal literal
Definition ellipsis-predefined.dirs:51

You should be able to process this with something

bin/ellipsis examples/Fibonacci.h

and in a blink of an eye this should print out a code snippet with the firs 94 Fibonacci numbers, similar to the following.

constexpr uint64_t Fibonacci[94] = {
0U,
1U,
1U,
2U,
3U,
5U,
8U,
13U,
21U,
34U,
55U,
89U,
144U,
… a long list of numbers …
4660046610375530309U,
7540113804746346429U,
12200160415121876738U
};

This header uses two #include_directives to include itself recursively, namely by indicating __FILE__ as the include file.

Bug:
The __FILE__ special macro does not yet always work correctly.
  • The first use of #include_directives is in an environment that sets up the macros that are needed to keep track of the process: FIB0, FIB1, FIB2 and FIBONACCI. After that first inclusion, four lines of C that invoke the macro FIBONACCI() produce the output that we have seen above.
  • The second use of #include_directives is inside an #elif block that ensures the termination condition for the recursive inclusion.

The setup phase of the recursive inclusion

During the setup phase we define the macros that are needed during the subsequent recursive calls. All four macros use

  • #bind such that their scope is limited to the surrounding #ifndef clause.
  • a functional macro with no parameters, such that it is later easier to use. The macro will only be expanded if it is followed by (); otherwise the identifier is left as is in the token sequence.

The macros are set to the defining two values of the Fibonacci sequence, namely 0U and 1U for FIB0() and FIB1(), respectively. FIBONACCI() will (essentially) hold the currently known sequence, so the initial value is 0U, 1U, plus some weird comments to beautify the output a bit (see operators above).

Then the source calls itself recursively. Therefore include_directives is prefixed with #expand to make sure that the __FILE__ macro is expanded to the corresponding string.

Note that when returning from that first include_directives directive, FIBONACCI() is used twice. First, it is used within __NARGS__ to compute the number of elements in the list (for me this results in the literal 94) and then to insert the whole list into the output.

When terminating the #ifndef clause, all macros that had been bound with #bind are automatically undefined, so the macro space of the source that includes this recursive one is not polluted.

The recursive part

First, a termination of the recursion is ensured by testing

FIB1() < ((-1U)-FIB0())

this checks if the next value FIB1() + FIB0() would overflow. Since we don't actually want the computation to overflow, ever, we subtract the smaller value FIB0() from the maximal value that eĿlipsis can handle. Note that 1U is an unsigned 1 and -1U, the negative of that, seen as an unsigned, is a very big number. (The preprocessor only has one type of signed and unsigned numbers each, corresponding to the types intmax_t and uintmax_t in C.)

If FIB1() is still less than the bound that eĿlipsis can handle, we do the processing; if it is not, nothing is done and the inclusion of the source ends.

The important computational part of the recursion #bind's FIB2() to a new value, namely FIB0() + FIB1(). To ensure that the value is really computed and stored this uses

__EXPAND_DEC_U__(FIB0() + FIB1())

to expand the sum to a decimal unsigned integer literal. Again the #expand prefix to #bind ensures that all of this is expanded in place and only assigned to FIB2() thereafter. As written

expand bind FIB2() …
__directive__ expand
Expand the remaining tokens on the line and re-interpret the result as a directive.
Definition directives.c:211

would already expand FIB2 due to the following parenthesis; therefore FIB2 is #undef on the preceding line, such that the #bind directive (after expansion) only sees the FIB2 identifier and not a macro.

Then, the expanded #gather directive appends a comma, some comments and the contents of FIB2() that we just computed to the FIBONACCI macro. Follows the update of the FIB0() and FIB1() macros to the new values and down we go into the rabbit hole.

Effinciency

Note that this is a method to compute the Fibonacci numbers with linear cost. In contrast to that, the usual recursion that is often taken as an example in introductory courses uses double recursion by calling itself recursively twice, once for the LHS of the plus and a second time for the RHS.

Here, we only have a single trailing recursion. Because the overhead on each recursion level is constant, the overall cost is linear in the number of Fibonacci numbers that is produced.

So the disadvantage that we still have with eĿlipsis is that such a loop needs recursion (here a depth of 92 on my 64 bit machine). On the other hand each recursion level is relatively cheap, so a reasonable number of iterations (up to some hundred or even thousand) can be easily handled.

Note
The recursion here is much deeper than necessary. The include file ⟨ellipsis-loop.dirs⟩ provides a technique that avoids deep recursion.