shnell
– source-to-source compiler enhancement
Jens Gustedt
1 Overview
1.0.1 Takeaway
We provide a tool to easily develop and prototype compiler and
language enhancements that can be expressed by source-to-source
transformation of C code. It uses #pragma directives to
- mark parts or all of a code,
- cut out that code,
- pipe it through a transformation and
- splice it back into the same place.
1.0.2 What we get
Many small but convenient directives are already available, such as
1.0.3 How it is done
It is based on two major tools of POSIX systems:
- shell programming (sh)
- regular expression streaming (sed)
1.0.4 How it is used
- The existing features can be used in daily programming without knowledge of these tools.
- Filter programs that implement the directives can be written in any other programming language that suits the task: perl, python, java, C itself …
2 Introduction
2.1 a simple example
2.1.1 Example: code unrolling …
shnell performs source-to-source transformations
- identify code ranges with the help of directives.
- Example:
  - a declaration of an array A

    double A[] = {
    #pragma CMOD amend foreach ANIMAL:N = goose dog cat
      [${ANIMAL}] = 2*${N},
    #pragma CMOD done
    };
2.1.2 … and its replacement
- the #pragma ensures that the inner line is copied three times
- "meta-variable" ${ANIMAL} iterates over "goose", "dog" and "cat"
- ${N} holds the number of the current copy, starting at 0

    double A[] = {
      [goose] = 2*0,
      [dog] = 2*1,
      [cat] = 2*2,
    };
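The replication performed by foreach can be imitated in a few lines. The following Python sketch is not how shnell implements the directive (that is done with sh and sed); the function name expand_foreach and its parameters are invented for illustration, and only the plain ${NAME} form is handled.

```python
def expand_foreach(body, name, index, values):
    """Repeat body once per value, resolving ${name} and ${index}."""
    copies = []
    for i, value in enumerate(values):
        copy = body.replace("${" + name + "}", value)   # e.g. ${ANIMAL} -> goose
        copy = copy.replace("${" + index + "}", str(i))  # e.g. ${N} -> 0
        copies.append(copy)
    return "".join(copies)


# The inner line of the example above, without the surrounding declaration.
BODY = "  [${ANIMAL}] = 2*${N},\n"
print(expand_foreach(BODY, "ANIMAL", "N", ["goose", "dog", "cat"]), end="")
```

The printed lines are the three initializers of the replaced declaration; the counter starts at 0 because enumerate does.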
2.1.3 stringification
- A similar code can use the "stringified" parameters
    char const* names[] = {
    #pragma CMOD amend foreach ANIMAL = goose dog cat
      [${ANIMAL}] = #${ANIMAL},
    #pragma CMOD done
    };

- results in

    char const* names[] = {
      [goose] = "goose",
      [dog] = "dog",
      [cat] = "cat",
    };
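A rough Python model of how one meta-variable could be resolved, covering the # form used here and the ## merging that the section on meta-variables mentions. The function name resolve is hypothetical, the spelling of ## is assumed to follow the C preprocessor, and escaping of quotes inside stringified values is ignored; "regVar" has the actual rules.

```python
import re


def resolve(text, name, value):
    """Replace ${name} in text, honouring # and ## in front of or behind it."""
    var = re.escape("${" + name + "}")
    # '#${NAME}' (but not '##${NAME}') becomes a string literal.
    text = re.sub(r"(?<!#)#\s*" + var, lambda m: '"' + value + '"', text)
    # '##' on either side glues the value to the neighbouring token.
    text = re.sub(r"\s*##\s*" + var, lambda m: value, text)
    text = re.sub(var + r"\s*##\s*", lambda m: value, text)
    # Plain occurrences are replaced by the value itself.
    return re.sub(var, lambda m: value, text)


print(resolve("[${ANIMAL}] = #${ANIMAL},", "ANIMAL", "goose"))
print(resolve("get_##${ANIMAL}()", "ANIMAL", "dog"))
```

Stringification is handled first so that the single # is not mistaken for half of a ##.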
2.2 general approach
In the general case such a directive is identified
2.3 identifier im- and export
2.3.1 Example: identifier export …

    #pragma CMOD amend export string : funny string

    /* secret has no linkage, not exported */
    enum secret {
      /* but size and length become external names */
      string::size = 32,
      string::length = size-1,
    };

    struct string {
      char s[length];
    };

    /* external linkage! */
    _Thread_local string myown;

    /* INIT becomes an external name */
    #define string::INIT(S) { .s = S, }
    #define string::EMPTY ((string)INIT(""))
2.3.2 … its replacement as a .c …

    /* secret has no linkage, not exported */
    enum secret {
      /* but size and length become external names */
      funny_string_size = 32,
      funny_string_length = funny_string_size-1,
    };
    typedef enum secret secret; /* convenience */
    typedef struct funny_string funny_string; /* convenience */
    struct funny_string {
      char s[funny_string_length];
    };
    /* external linkage! */
    _Thread_local funny_string funny_string_myown;
    #define funny_string_INIT(S) { .s = S, }
    #define funny_string_EMPTY ((funny_string)funny_string_INIT(""))
2.3.3 … and the view by others (.h) …

    /* secret has no linkage, not exported */
    enum some_unguessable_name_for_secret {
      /* but size and length become external names */
      funny_string_size = 32,
      funny_string_length = funny_string_size-1,
    };
    typedef struct funny_string funny_string; /* convenience */
    struct funny_string {
      char s[funny_string_length];
    };
    /* external linkage! */
    extern _Thread_local funny_string funny_string_myown;
    #define funny_string_INIT(S) { .s = S, }
    #define funny_string_EMPTY ((funny_string)funny_string_INIT(""))
2.3.4 Example: implicit import …

    #pragma CMOD amend implicit : funny string

    int main(int argc, char* argv[argc+1]) {
      string* myownp = &string::myown;
      string little = string::INIT("little");
      stdc::printf("my string is %s\n", little.s);
    }
2.3.5 … and replaced.
    #include "stdc.h"
    #include "funny-string.h"

    int main(int argc, char* argv[argc+1]) {
      funny_string* myownp = &funny_string_myown;
      funny_string little = funny_string_INIT("little");
      printf("my string is %s\n", little.s);
    }
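The naming pattern visible in these examples (string::myown becomes funny_string_myown, the header is funny-string.h) can be mimicked as follows. This is only a sketch under stated assumptions: mangle and header_name are made-up names, the code works on raw text rather than on tokens (so it would also rewrite the word inside string literals and comments), and it handles neither unqualified local names such as size nor the special treatment of stdc.

```python
import re


def mangle(source, short, components):
    """Rewrite short::name and bare short using the long, composed name."""
    long_name = "_".join(components)
    # Qualified names first: string::myown -> funny_string_myown
    qualified = re.compile(r"\b" + re.escape(short) + r"::(\w+)")
    source = qualified.sub(lambda m: long_name + "_" + m.group(1), source)
    # Then the remaining bare occurrences: string -> funny_string
    bare = re.compile(r"\b" + re.escape(short) + r"\b")
    return bare.sub(long_name, source)


def header_name(components):
    """Header file name as it appears in the replaced import example."""
    return "-".join(components) + ".h"


print(mangle("string* myownp = &string::myown;", "string", ["funny", "string"]))
print(header_name(["funny", "string"]))
```

Replacing the qualified form before the bare one matters: the underscore in the generated names keeps the second pass from touching them again.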
3 Command line tools
3.0.1 shnell
- The central tool is shnell:
  - reads a C file
  - processes it
  - dumps the result to stdout.
- If you want to keep track of the intermediate code, this would be your tool of choice.
- Easy to integrate into a compilation chain by providing make rules
3.0.2 executable dialects
To avoid having
- to keep track of the modified sources and
- to apply the same set of directives to each source file,

there are shnl files that group directives together and create something like dialects of the C language, see load.
3.0.3 example: using trade
- Example:
  - apply the TRADE policy to a source file toto.c during compilation

    trade gcc -Wall -c -O3 -march=native toto.c

- we prefix the compiler command line by the command "trade".
  - This filters the file name and task from the command line.
  - It performs the source-to-source rewriting.
  - It compiles the result to an object file toto.o.
3.0.4 example: using trade
- Similarly, without -c

    trade gcc -Wall -O3 -march=native toto.c mind.o mund.o

- Takes the first source (toto.c)
- Does all of the above.
- Links all the objects into an executable "toto" if possible.
3.0.5 example: using trade
- If there are only .o files:

    trade gcc -Wall -O3 -march=native toto.o mind.o mund.o

- Only the linker phase is performed.
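The three cases can be summarized by a small classifier for the command line. This Python sketch only models the behaviour described in these examples; it is not taken from the trade script, the name split_command is hypothetical, and it does not treat option arguments specially (an output file named with a .o suffix after -o would be counted as an object).

```python
def split_command(argv):
    """Return (task, first source or None, object files) for a compiler call."""
    sources = [arg for arg in argv if arg.endswith(".c")]
    objects = [arg for arg in argv if arg.endswith(".o")]
    if not sources:
        task = "link"            # only .o files: linker phase only
    elif "-c" in argv:
        task = "compile"         # rewrite, then compile to an object file
    else:
        task = "compile+link"    # rewrite, compile, then link everything
    return task, (sources[0] if sources else None), objects


for line in ("gcc -Wall -c -O3 -march=native toto.c",
             "gcc -Wall -O3 -march=native toto.c mind.o mund.o",
             "gcc -Wall -O3 -march=native toto.o mind.o mund.o"):
    print(split_command(line.split()))
```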
4 Directives, how the C programmer sees them
4.1 amend, insert and load directives
4.1.1 scans
- Directives are found by one or several scans that shnell performs on a C source.
- Directives come in three different flavors that can induce several such scans
4.1.2 amend
- The scope is up to
  - the next (nesting) "done" directive or
  - the end of the source file.
- The code is piped into the command.
- The result is inserted in place.
- The command also receives the argument list of the directive over some side channel.
- amend directives typically modify the code
- Example: foreach as above
  - the code is repeated several times
  - each copy is modified by resolving the meta-variables
4.1.3 insert
- Has no scope.
- The command does not receive input.
- It only receives the arguments.
- The result is inserted in place.
- Scan of the source file then continues directly after.
- The inserted code has no influence on shnell for this scan.
- insert directives typically just put some declarations or definitions in place.
- Example: enum directive
  - Defines an enumeration type and some dependent functions.

    #pragma CMOD insert enum animal = goose dog cat
4.1.4 load
- Inserts a set of directives that are found in shnl files.
- shnl files condense complicated patterns
- Scanning continues from the top of the inserted lines.
- They may contain several amend directives that are not terminated.
- Possibly multiple scans of the whole source file.
- Example: CONSTEXPR directive
  - Allows several nested "evaluations" of variables.
  - The results are expressions that are evaluated at compile time.
4.1.5 recursion
- Nested occurrence of amend directives leads to finite recursion.
- Example:
  - Two nested do directives:

    double A[3][2] = {
    #pragma CMOD amend do I = 3
      [${I}] = {
    #pragma CMOD amend do J = 2
        [${J}] = 2*${I} + ${J},
    #pragma CMOD done
      },
    #pragma CMOD done
    };
1. start collecting the code SI immediately after the first do
2. when collecting SI, the second do directive is encountered
3. collection of the code SJ starting after that second do is started, until the first done is encountered. SJ now has:

    [${J}] = 2*${I} + ${J},

4. SJ is fed into the do directive for variable J and value 2.
5. The do directive replicates SJ twice and replaces all occurrences of ${J} by 0 and 1, respectively, to obtain a code TJ:

    [0] = 2*${I} + 0,
    [1] = 2*${I} + 1,

6. TJ is inserted into SI instead of the directive, resulting in a replaced code RI.
7. The scan for the first directive is continued until the second done is encountered, resulting in a code QI:

    [${I}] = {
      [0] = 2*${I} + 0,
      [1] = 2*${I} + 1,
    },

8. QI is fed into the do directive for variable I and value 3.
9. The do directive replicates QI three times and replaces all occurrences of ${I} by 0, 1, and 2, respectively, to obtain a code TI:

    [0] = {
      [0] = 2*0 + 0,
      [1] = 2*0 + 1,
    },
    [1] = {
      [0] = 2*1 + 0,
      [1] = 2*1 + 1,
    },
    [2] = {
      [0] = 2*2 + 0,
      [1] = 2*2 + 1,
    },

10. TI is then inserted in place of the whole #pragma construct.
So after completion of the inner directive, after step 5, the code is as if we had written:

    double A[3][2] = {
    #pragma CMOD amend do I = 3
      [${I}] = {
        [0] = 2*${I} + 0,
        [1] = 2*${I} + 1,
      },
    #pragma CMOD done
    };

Only then is the outer directive applied, and the overall result after step 10 is

    double A[3][2] = {
      [0] = {
        [0] = 2*0 + 0,
        [1] = 2*0 + 1,
      },
      [1] = {
        [0] = 2*1 + 0,
        [1] = 2*1 + 1,
      },
      [2] = {
        [0] = 2*2 + 0,
        [1] = 2*2 + 1,
      },
    };
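The inner-first order of this walk-through maps naturally onto a recursive function. The Python sketch below is an illustration only, not shnell's scanner: it is line based, recognizes just the "do VAR = N" form, and does not diagnose an unbalanced done. The names expand, AMEND_DO and DONE are invented here.

```python
import re

AMEND_DO = re.compile(r"\s*#pragma\s+CMOD\s+amend\s+do\s+(\w+)\s*=\s*(\d+)\s*$")
DONE = re.compile(r"\s*#pragma\s+CMOD\s+done\s*$")


def expand(lines, pos=0):
    """Expand do directives from lines[pos] on.

    Returns the expanded lines and the position after the closing done;
    the end of the input closes a scope as well."""
    out = []
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        if DONE.match(line):
            break                       # hand the collected code back
        m = AMEND_DO.match(line)
        if m is None:
            out.append(line)            # ordinary code is collected as is
            continue
        var, count = m.group(1), int(m.group(2))
        body, pos = expand(lines, pos)  # inner directives are resolved first
        for i in range(count):          # then the body is replicated
            out.extend(l.replace("${" + var + "}", str(i)) for l in body)
    return out, pos


SOURCE = """double A[3][2] = {
#pragma CMOD amend do I = 3
  [${I}] = {
#pragma CMOD amend do J = 2
    [${J}] = 2*${I} + ${J},
#pragma CMOD done
  },
#pragma CMOD done
};"""

expanded, _ = expand(SOURCE.splitlines())
print("\n".join(expanded))
```

When the recursive call for J returns, its body still contains ${I}; that variable is only resolved when the outer call replicates its own body.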
4.2 arguments to directives
4.2.1 meta-variables
- Several constructs use meta-variables of the form ${NAME}.
- These are replaced in the processed source with their values.
- The replacement can be modified with # and ## operators, similar to what happens in the C preprocessor.
  - # is "stringification" and
  - ## merges tokens to the left or to the right.
- For a more detailed discussion have a look into "regVar".
4.3 amend.cfg
4.3.1 amend.cfg
a list of approved directives
5 Directives, how the implementor sees them
5.0.1 sh(n)ell programming
- A directive
  - receives the code that it has to treat on stdin and
  - sends the modified code to stdout.
- The surrounding tasks of
  - cutting the code out of context and
  - reinserting the result in place

  are done by shnell.
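Since a directive is just a filter from stdin to stdout, a minimal one in Python could look as follows. The transformation (stripping trailing white space) is a placeholder chosen for illustration, run_filter is a hypothetical name, and the side channel for the directive's arguments is not modelled because it is not specified here.

```python
import io
import sys


def run_filter(instream, outstream):
    """Copy code from instream to outstream, dropping trailing white space."""
    for line in instream:
        outstream.write(line.rstrip() + "\n")


# Installed as a script, the body would simply be:
#     run_filter(sys.stdin, sys.stdout)
# Here the filter is exercised on an in-memory stream instead.
demo = io.StringIO()
run_filter(io.StringIO("int x;   \nint y;\t\n"), demo)
print(demo.getvalue(), end="")
```

Keeping the transformation in a function that takes two streams makes the filter easy to test without shnell.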
5.0.2 tokenization
When using the shnell executable on a given C code prog.c,
- a tokenizer splits the program into tokens as defined by the C language.
- All intermediate tools see isolated tokens
- identifiers
- keywords
- numbers
- punctuators
- strings
- comments
- … and a lot of control characters and white space
- no further lexical analysis required
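To give an idea of what such a token stream looks like, here is a simplified tokenizer in Python. It is a sketch and not shnell's tokenizer: the names TOKEN, KEYWORDS and tokenize are made up, the keyword list is abbreviated, only ASCII identifiers are recognized, and preprocessor specifics such as header names are ignored.

```python
import re

# Abbreviated keyword list, for illustration only.
KEYWORDS = {"char", "const", "double", "else", "enum", "extern", "for", "if",
            "int", "return", "struct", "typedef", "void", "while"}

TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<number>\.?\d(?:[eEpP][+-]|[\w.])*)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<punctuator>\.\.\.|<<=|>>=|->|\+\+|--|<<|>>|[<>=!+\-*/%&|^]=|&&|\|\||\#\#|[^\w\s])
""", re.VERBOSE | re.DOTALL)


def tokenize(code):
    """Split code into (kind, text) pairs; white space is kept as a token."""
    tokens = []
    for m in TOKEN.finditer(code):
        kind, text = m.lastgroup, m.group()
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("int x = 32; /* c */"):
    if kind != "space":
        print(kind, text)
```

Because white space and comments are tokens too, joining the texts of all tokens gives back the ASCII input unchanged, which is what makes a later detokenization possible.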
5.0.3 detokenization
At the end, after all transformations have been applied
- the tokenization is reverted
- the original line and spacing structure reappears.
- reader friendly:
  - keep or inspect intermediate steps of a transformation.
- compiler friendly:
  - code is annotated with #line directives
  - trace errors back to the source file.
5.0.4 Shell modules
The whole toolbox is by itself constructed from reusable pieces:
- regexp matching (match),
- temporary files with garbage collection,
- split and join,
- hash tables and
- … many more.
5.0.5 import.sh
This has to be explicitly sourced with a magic line:

    SRC="$_" . "${0%%/${0##*/}}/import.sh"

And then other shell modules can be imported like this:

    import arguments
    import tmpd
    import tokenize
    import match
5.0.6 … with documentation
Terminate such an import and documentation section by a line

    endPreamble $*

If the module is then run by itself as an executable script with the option --html, it extracts the documentation for the module.
6 Install and usage
6.0.1 Compilation
- shnell is almost entirely implemented in script languages
  - only one tiny bit still needs compilation, bin/isatty.so
- If you also want the optional complete Unicode support you should also compile the tools-c/ directory.
- To additionally test shnell compile the code in complements/

All these steps can be launched by

    make -j N

where N is the number of cores of your system or less.
6.0.2 Installation
- shnell can be used from anywhere where it is installed
  - The scripts locate the directory in which they reside and look for other components (shnl/, legacy/ and tools-c/) relative to that.
- For example, to operate from /usr/local, copy the corresponding contents to
  - /usr/local/bin
  - /usr/local/shnl
  - /usr/local/legacy
  - /usr/local/tools-c (optional, binaries only)
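The relative lookup can be pictured as follows. The real scripts do this in shell; this Python analogue only illustrates the directory arithmetic for the /usr/local layout given above, and component_dirs is a hypothetical name.

```python
from pathlib import PurePosixPath


def component_dirs(script_path):
    """Map component names to directories next to the script's bin/ directory."""
    bin_dir = PurePosixPath(script_path).parent   # e.g. /usr/local/bin
    root = bin_dir.parent                         # e.g. /usr/local
    return {name: str(root / name) for name in ("bin", "shnl", "legacy", "tools-c")}


for name, path in component_dirs("/usr/local/bin/shnell").items():
    print(name, path)
```

Because everything is derived from the location of the script itself, the same tree works unchanged in a private copy or in a system-wide place.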
6.0.3 Usage
Any of the following should work
- Use an absolute path name to refer to the tool that you are using.
- Add the bin/ directory to your PATH environment variable.
environment variable. - Use a system-wide place to install the binaries and other directories as indicated above.
6.0.4 Development of directives
- Development of directives should take place in your private copy
- To add a new directive
  - write your filter
  - install your filter in bin/
  - add your filter to amend.cfg
- To add a new executable dialect
  - test your idea in an example
  - move the #pragma directives you need to the top of the example
  - write an .shnl file (e.g. TOTO.shnl) that comprises these #pragma directives
  - copy that .shnl file to shnl/
  - in bin/, establish a softlink toto → shneller
- Share your developments with others if you may!
6.0.5 Copyright, license and distribution
- Copyright © 2015-2020 Jens Gustedt
- The shnell project is licensed under a standard MIT license
----------------------------------------------------------------------
Copyright © 2015-2020 Jens Gustedt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
----------------------------------------------------------------------
- This work is distributed at https://gustedt.gitlabpages.inria.fr/shnell/
- The sources can be found at