Modern compilers and tool chains are sometimes confronted with very large projects that may require substantial compilation times. Compared to the overall processing time, preprocessing should have a negligible share. Nevertheless, preprocessing, as much as any other computing effort, should not waste resources.
To stay within reasonable bounds, before even discussing implementation aspects, we have to look at the inherent complexity of preprocessing: if we don't take care of complexity, any effort to keep preprocessing scalable is doomed. A good measure for complexity here is IO cost, namely the amount of data that is read *In* (basically the combined size of the source files) and written *Out* (the size of the produced token stream). Preprocessing should not take more than O(In + Out) resources of any kind. In particular, it should not take more than linear time (total CPU or wall clock), memory or power.
Standard C macros are not well-equipped for linear processing. This is due to a combination of two features that dominate macro processing of unbounded argument lists:
With C23, C gains a conditional macro operator, `__VA_OPT__(XYZ)`. It only expands to the tokens that are given in the parentheses, here `XYZ`, if the variable argument list of the invoking macro is not empty. It has been proposed to use this mechanism in a future version of C (and C++) for recursion. EĿlipsis has tools to emulate such a feature by defining stacks of macros, see `ellipsis-recursive.dirs`, so you may experiment whether this would be a suitable tool for you.
With the proposed syntax, a typical recursive C macro could look similar to the following.
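A minimal sketch, assuming the proposed feature would allow the self-reference directly inside `__VA_OPT__` (the exact spelling of the proposal may differ):

```c
/* Sketch only: the recursive reference is guarded by __VA_OPT__, so it is
   kept as long as the variable argument list is not empty. */
#define RECURSIVE(X, ...)   #X __VA_OPT__(, RECURSIVE(__VA_ARGS__))
```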
When invoked, this has the following effects:

- `X` is taken out of the argument list and stringified.
- If the remaining variable argument list is not empty, `RECURSIVE` is invoked again on that remainder.

In the example, this would take a comma-separated list of words (say, the constants of an enumeration type) and produce a list of strings that contain the names. This could, for example, be convenient for a print function for the enumeration type.
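As a hedged illustration of that use case (the enumeration, the array and their names are made up):

```c
enum color { red, green, blue };

/* With the proposed recursion feature (or its eĿlipsis emulation), the macro
   would generate the string literals "red", "green", "blue", which a print
   function for enum color could then index by the constant's value. */
static char const *const color_name[] = { RECURSIVE(red, green, blue) };
```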
But contrary to what the compact notation suggests, this call mechanism has a high complexity: for each such call the whole argument list is expanded. This is because argument expansion is performed before any other token of the replacement list is interpreted. In particular, the closing parenthesis that marks the end of the invocation of `RECURSIVE` is only seen once all tokens in `__VA_ARGS__` have been expanded.
Indeed, if this macro receives a list of N elements, it alone (without accounting for the recursive call) will touch 2N−1 tokens, namely the list elements themselves and the commas between them. So, in total, this recursive approach will touch (2N−1) + … + 5 + 3 + 1 = N² tokens. For example, a three-element list leads to successive calls that touch 5, 3 and 1 tokens, that is 9 = 3² in total.
Let's have a look at what could be typical lists that we might want to handle:
| list | elements | touched tokens | verdict |
|---|---|---|---|
| small | 10 | 100 | ok |
| Fibonacci numbers < 2⁶⁴ | 100 | 10,000 | ~ |
| XML entities | 2,200 | 4,840,000 | limit |
| Unicode character values | 150,000 | 22,500,000,000 | intractable |
For a small enumeration, this is certainly a suitable approach. For a list such as the Fibonacci numbers it is still tractable, but recursion already shows a substantial overhead. Treating all XML entities would already be challenging for the recursion itself (a depth of 2200!) and the complexity of touching more than 4 million tokens is clearly disproportionate. Handling lists of Unicode characters becomes prohibitive for the recursion depth alone and completely intractable from a complexity point of view.
With the current expansion rule in place, C macro recursion is not an adequate tool to process long lists.
So unless we invent a tool that avoids list expansion on all levels, recursion is not the right tool to process even moderately long lists.
Besides macro recursion, include recursion is another approach to perform iterative tasks. EĿlipsis has several extensions that ensure linearity for these and that keep the recursion bounded to some reasonable depth, say less than 10.
The first of these tools is for iteration with arithmetic. A predefined header file `ellipsis-do.dirs` and predefined operators such as `__EXPAND_DEC__` provide domain iteration within linear time.
Processing argument lists (such as all constants of an enumeration) would have the same quadratic complexity as recursion, simply because each recursion level again would have linear cost in the size of the lists that are processed. To face this difficulty, eĿlipsis proposes directives and builtins as extensions:
- `__ANY__(ID)` checks if identifier `ID` is either not a macro, or if it would expand to a non-empty token list if it were expanded.

With these extension directives, recursion can be replaced by iteration that has a cost of O(1) per element, and thus an overall cost that is linear. For example, when using a tool such as `ellipsis-loop.dirs`, the above could be coded by putting a handful of such directives in a file named by the macro `LOOP_DIRECTIVES`. Since none of these directives expands `LIST`, processing even a long list is still linear.
The new C23 feature `__VA_OPT__` has the advantage that it allows querying the `__VA_ARGS__` list without expanding it. So it is a first tool that allows us to stay within linear complexity, as long as `__VA_ARGS__` is not passed on into other macros.
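A minimal sketch of that property in plain C23 (the macro name `HAS_ARGS` is made up for illustration):

```c
/* __VA_OPT__ reacts to whether the variable argument list is empty; the
   arguments themselves never appear in the replacement, so they are not
   expanded. */
#define HAS_ARGS(...)  (0 __VA_OPT__(+ 1))

int empty    = HAS_ARGS();      /* expands to (0)     */
int nonempty = HAS_ARGS(1, 2);  /* expands to (0 + 1) */
```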
We provide an extension that builds upon that idea, `__VA_TAIL__`. A call of the form `__VA_TAIL__(ID)` (or `__VA_TAIL__()`) makes a tail call into the macro `ID` and passes the `__VA_ARGS__` list to it without further expansion. A macro that uses this must respect the following constraints.
A `__VA_TAIL__` construct is only inspected at the very end of processing of the macro call. Then, it is expected to have been resolved to one of the valid forms: `__VA_TAIL__(ID)`, where `ID` is the name of a function-like macro, or `__VA_TAIL__()`.

This ensures that when such a tail recursion goes deeper, the variable argument list gets shorter and shorter as the initial arguments are assigned to the named parameters along with the calls. Once the variable argument list is empty, the bottom of the recursion is reached and that last `__VA_TAIL__` construct is ignored.
The `__VA_TAIL__` construct guarantees linear complexity for macro calls (including recursion) in the size of the input list.
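A hedged sketch of how the stringification example from above might look with this extension; the name `STRINGS` is made up and the exact spelling of an eĿlipsis solution may differ:

```c
/* Each call stringifies its first argument and tail-calls itself on the
   remaining __VA_ARGS__, which are handed over without expansion.  When the
   variable argument list is empty, the final __VA_TAIL__ is ignored and the
   recursion stops, so every list element is touched only O(1) times. */
#define STRINGS(X, ...)  #X __VA_OPT__(,) __VA_TAIL__(STRINGS)

/* STRINGS(red, green, blue) would produce "red", "green", "blue" */
```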
Besides ensuring the overall linear complexity of preprocessing, our implementation of eĿlipsis tries to be efficient where possible, such that constant factors in the cost of processing are reduced. Preliminary tests have shown that eĿlipsis is about 5 to 10 times slower than, for example, the preprocessing phase of `gcc`. In view of what is said above (preprocessing contributing only a small fraction to the overall cost of compilation) we think this is adequate for such a young project.
If and when eĿlipsis becomes a bottleneck in some development process, we'd certainly have to look into the implementation for possible improvements. Among the places to look at are the calls to `malloc` and `realloc` for the allocation of tokens (see `ellipsis‿token`) and strings (mostly `ellipsis‿str32`). If this turns out to be a limiting factor, a better application-level recycling of such items could be considered.