One of the goals is to show that modern C can be used to efficiently implement a tool such as a preprocessor without compromising much on safety. Efficient is meant here in a relatively broad sense:
Moving to a parallel build improves this to 7 and 1 – 2 seconds, respectively.
EĿlipsis uses Unicode as its base and provides means to use real Unicode more widely even in the source code that it processes.
Based on that, we allow a use of Unicode that is more comfortable and aligned with use in languages in general. For example, we use normal characters for features that are represented in Unicode such as ≤, ∨, ¬, → or …. Once you get used to these, you'll probably ask how you were able to tolerate the crude digraph replacements <=, ||, !, -> or ... for so long.
If you are still living in the last millennium and don't know how to configure your keyboard to work comfortably with such characters (an argument that I hear more often than you think) you may still use the old digraphs. Nobody will be forced.
Also in our own source we use characters such as π or φ to name features that are usually noted with these characters in math.
When producing intermediate C sources that are to be compiled by traditional compilers, we translate the punctuators as mentioned above back to the digraphs. We don't do a similar translation of code points that appear in identifiers, though; we suppose that compilers that support C23 are able to deal with these in one way or another. It would probably not be difficult to add an option to eĿlipsis that would translate these to the ugly \u or \U forms (such as \u03c0 for π), though. If this tempts you as a side project, be my guest.
C23 offers a new feature where you can fix the base type of an enumeration type. For example we have
meaning that constants, variables or members of ellipsis‿category only take up one byte, and that enumeration constants of type ellipsis‿fibfac always behave like a size_t.
Additionally we have an include feature ellipsis-enum-xcode.h that can be used to wrap declarations of enumerations in a wrapper that adds some basic functionality.
Before C23, named constants could only be defined for integer types (not wider than int) by abusing the enum feature. With the new constexpr feature we are able to define compile time constants of any type. For example
defines a compile time constant of type long double const for the golden ratio φ. Such constants are not only compile time constants, but also objects, so taking the address such as in &φ is possible. Regardless of the context in which the above declaration is found, the static storage-class specifier ensures that the storage duration is static.
C23 has a new feature set called attributes. There are not yet many standard attributes defined in C23 and we only use one of these repeatedly, [[deprecated]].
As the name indicates, this attribute marks the declaration to which it is applied as deprecated, and the compiler issues a warning if you use that declaration anyhow. We use this to mark structure members that are considered internal and which a user should not access.
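The following sketch shows how a structure member could be marked as internal in this way. The structure `token_table` and its functions are hypothetical. Silencing the warning inside the implementation with `#pragma GCC diagnostic` is an assumption that works with gcc and clang; it is not necessarily how eĿlipsis handles its own accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure with one public and one internal member. */
struct token_table {
    size_t length;                 /* public: number of stored tokens */
    [[deprecated("internal member, use token_table_capacity()")]]
    size_t capacity;               /* internal bookkeeping */
};

/* The implementation itself has to touch the internal member, so the
   warning is switched off for this region only (gcc/clang specific). */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
struct token_table token_table_make(size_t capacity) {
    struct token_table t = { .length = 0, .capacity = capacity };
    return t;
}

size_t token_table_capacity(struct token_table const* t) {
    return t->capacity;
}
#pragma GCC diagnostic pop
```

User code that writes `t.capacity` directly would then receive a deprecation warning, while `t.length` remains usable without any diagnostic.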
This is for example used for the dictionary structure ellipsis‿token‿dictionary, see the source of ellipsis-tdict.h and ellipsis-tdict.c. This has the following properties
There are new keywords for central language features:
- nullptr avoids the use of bizarrely typed features such as NULL, or even more weirdly 0, as initializers for pointers.
- false, true and the type bool finally give a satisfactory interface to a Boolean type.
- thread_local provides variables with thread storage duration.
Programming with variadic macros becomes much easier with the __VA_OPT__ feature that provides a simple in-macro conditional.
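To illustrate the in-macro conditional, here is a small sketch with two hypothetical macros, `HAS_ARGS` and `FORMAT`. Neither is taken from eĿlipsis.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macro: evaluates to 1 if any variadic argument is present,
   to 0 otherwise. The arguments themselves are never expanded into code. */
#define HAS_ARGS(...) (0 __VA_OPT__(+ 1))

/* Hypothetical macro: the comma after FMT is only emitted when further
   arguments follow, so the call stays well-formed in both cases. */
#define FORMAT(BUF, FMT, ...) \
    snprintf((BUF), sizeof (BUF), FMT __VA_OPT__(,) __VA_ARGS__)

/* Formats "n=<n>" into a local buffer and returns the number of characters. */
int format_number(int n) {
    char buf[32];
    return FORMAT(buf, "n=%d", n);
}

/* Same macro without any variadic argument. */
int format_plain(void) {
    char buf[32];
    return FORMAT(buf, "plain");
}
```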
There are three new features in C23, typeof, typeof_unqual and auto, that can be used to infer a type from an expression and ensure that two features automatically have consistent types, regardless of type changes that may occur at some distant code location. This is primarily used for the generic code that is found in the sources/generate directory.
For several revisions now, C has had the possibility to use [static] to describe function parameters that point to an array object with a minimal number of elements. Such parameters may then be assumed to be non-null. Only recently have the static analyzers that are built into compilers become capable of taking this information into account and of giving useful feedback to the programmer.
We use this feature throughout and check for all warnings that, for example, gcc produces. This has found a lot of problematic places where the API design had to be clarified, namely where it had to be decided if a function interface might receive a null pointer or not. So now all pointer parameters that are not supposed to receive a null pointer use [static]. Those that may, use * notation. In all cases where the analyzer is not able to establish that an argument is non-null, a null-pointer check is inserted that terminates execution if it triggers.
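The convention can be sketched as follows. All three functions are hypothetical, and `abort` merely stands in for whatever termination mechanism eĿlipsis really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* s must point to at least one char, so a null pointer is not allowed. */
size_t count_spaces(char const s[static 1]) {
    size_t n = 0;
    for (; *s; ++s)
        if (*s == ' ') ++n;
    return n;
}

/* s may be null, which is why the parameter uses * notation. */
size_t count_spaces_or_zero(char const* s) {
    return s ? count_spaces(s) : 0;
}

/* Hypothetical check for call sites where non-nullness cannot be
   established statically: terminates execution on a null pointer. */
char const* non_null(char const* p) {
    if (!p) abort();
    return p;
}
```

A call such as `count_spaces(non_null(p))` then gives the analyzer a path on which the argument is known to be non-null.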
Flexible array members are not new to C23, but are probably not exploited to their full potential. We provide a generic interface for such arrays, see also Generic programming with XFiles, below, that is efficient and hopefully easy to use.
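For readers unfamiliar with the feature, here is a hand-written sketch of a structure with a flexible array member. The type `fa_double` and its allocation function are hypothetical; the generic interface in eĿlipsis is generated and may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure: a length followed directly by its elements. */
struct fa_double {
    size_t length;
    double array[];      /* flexible array member, must be the last member */
};

/* Allocates header and elements in one zero-initialized block.
   Returns a null pointer on overflow or allocation failure. */
struct fa_double* fa_double_alloc(size_t length) {
    if (length > (SIZE_MAX - sizeof(struct fa_double)) / sizeof(double))
        return NULL;
    struct fa_double* fa =
        calloc(1, sizeof *fa + length * sizeof fa->array[0]);
    if (fa) fa->length = length;
    return fa;
}
```

Because header and data live in one allocation, a single `free` releases the whole object.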
A Fibonacci hash function uses an approximation of the golden ratio φ to spread consecutive hash keys (such as "variable123" and "variable124") uniformly over a hash array. This function has well understood properties and can be implemented quite efficiently. The only real constraint that has to be ensured is that the integer approximation φ₀ of the golden ratio φ is co-prime to n, the size of the hash array. We ensure that property for a new hash array by testing values successively until this condition holds.
Such a hash array is then the basis for our dictionary type ellipsis‿token‿dictionary. Here an appropriate Fibonacci hash factor is recomputed each time that the dictionary is resized.
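The search for a co-prime factor can be sketched as below. This is an illustration under assumptions: starting near n/φ, the upward search direction and the slot formula `(key * factor) % n` are choices made for this sketch, and the function names are hypothetical. The sketch also assumes n > 0 and n small enough that the product does not overflow `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
static size_t gcd(size_t a, size_t b) {
    while (b) {
        size_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Starts near n/phi and tests successive values until one is co-prime
   to n. The loop terminates because n + 1 is always co-prime to n. */
size_t fib_factor(size_t n) {
    size_t f = (size_t)((long double)n / 1.6180339887498948482L);
    if (f < 1) f = 1;
    while (gcd(f, n) != 1) ++f;
    return f;
}

/* Maps a key to a slot in [0, n). Since factor is co-prime to n,
   n consecutive keys land in n different slots. */
size_t fib_slot(size_t key, size_t factor, size_t n) {
    return (key % n) * factor % n;
}
```

For n = 16 the factor is 9, and the consecutive keys 1, 2, 3 land in slots 9, 2 and 11.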
eĿlipsis has a relatively complicated chain of dependencies between different translation units. Namely the support for different languages and different parts of the lexer is initialized dynamically. According to the chosen language more and more features are added to global arrays that hold the strings that are recognized as punctuators or specials.
Since this is dynamic and eĿlipsis is mildly multi-threaded, we have to ensure that the initialization happens only once (see ONCE_DEFINE) and that there are no memory leaks. The tools for such a consistent initialization are provided by ellipsis-once.h.
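The "only once" guarantee can be illustrated with the standard C11 `call_once` interface. This is not the implementation behind ONCE_DEFINE, whose interface is not shown here. All names are hypothetical, and the sketch covers neither dependencies between initializers nor the cleanup needed to avoid leaks.

```c
#include <assert.h>
#include <threads.h>

static once_flag punctuators_once = ONCE_FLAG_INIT;
static int punctuators_runs;   /* counts how often the initializer ran */

/* Hypothetical initializer; a real one would fill the global arrays. */
static void punctuators_init(void) {
    ++punctuators_runs;
}

/* May be called from any thread, any number of times; the initializer
   runs exactly once. Returns the number of initializer runs so far. */
int punctuators_ensure(void) {
    call_once(&punctuators_once, punctuators_init);
    return punctuators_runs;
}
```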
As a result eĿlipsis has a relatively complicated dependency between translation units for initialization. This is because different units initialize global data such as keywords, token names or punctuators dynamically at startup. These dependencies are handled with a dependency mechanism as described above.
Here rectangular boxes correspond to identified initialization features. These are colored red if they use ONCE_DEFINE_STRONG, black otherwise.
The implementation of eĿlipsis reuses an old technique of generic programming in C, best named "XFiles". It consists of including a specific include file that is parameterized with some macros. For example, an include file "my_fa_struct_xfile.c" to define a structure with a flexible array member (FA) could contain code that is parameterized by a type BASE_TYPE and a name FA_TYPE. It would then be included such as in
In fact, eĿlipsis itself proposes several extensions that help programming with XFiles. For example the above would typically be coded with eĿlipsis as
Here, using the prefix attribute with bind (instead of define) ensures that the macro definitions are only active during the inclusion. The undef directives from above are no longer necessary. include_source (instead of include) inhibits the expansion of the line; thereby arguments to the prefix attributes are not expanded and are used for the binding as is.
To be able to bootstrap the compilation of eĿlipsis, the sources are organized in two levels. The "normal" 1st-order C sources are already partially expanded, such that you may compile eĿlipsis with any modern C compiler. But these C sources are themselves produced by eĿlipsis from 2nd-order sources that contain special directives for eĿlipsis. Once eĿlipsis is operational on a new machine, processing these 2nd-order sources should produce exactly the same 1st-order sources; git status should not show any differences.