part of shnell – a source to source compiler enhancement tool
© Jens Gustedt, 2019
A tool to introduce prefix naming conventions.
Usage:
with the following definitions:
LPREFIX is the local name prefix. All identifiers that are prefixed with LPREFIX:: in their declarations (or that are just this LPREFIX) are considered to have external linkage, unless they are static objects or functions or they are explicitly made private, see below. (Example: string::length for the combination of string with :: and length.)
PREFIXx are identifiers that are used to compose the external name prefix.
All of these have suitable defaults and this directive is best used indirectly together with the implicit directive through the TRADE dialect or the trade compiler prefix.
Defaults
None of the identifiers used above may be strictly reserved, that is, none of LPREFIX PREFIX0 PREFIX1 ... shall start with an underscore that is followed by a capital letter or a second underscore.
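The restriction on reserved identifiers can be checked mechanically. The following is a minimal sketch in POSIX sh; the function name is_strictly_reserved is hypothetical and not part of shnell. The capital letters are spelled out so that the pattern does not depend on the locale.

```sh
# Hypothetical helper, not part of shnell: succeeds if the identifier starts
# with an underscore followed by a capital letter or a second underscore.
is_strictly_reserved () {
    case "$1" in
        _[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*|__*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which candidate prefixes would be acceptable.
for id in string _Private __hidden _local; do
    if is_strictly_reserved "$id"; then
        echo "$id: strictly reserved, not usable as a prefix"
    else
        echo "$id: acceptable"
    fi
done
```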
Either of the two parts above may be empty:
If expanded in a scan prior to this one, the alias directive can be used to set LPREFIX.
If there is no PREFIXx, the filename is split at - characters.
If the filename is not known, LPREFIX is used.
If LPREFIX is empty, the last component of PREFIX0 PREFIX1 ... is used.
So if neither LPREFIX nor PREFIXx is given (directly or via alias), the filename must be known and is used to determine all naming conventions.
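The filename-based defaults can be pictured with a small sketch. It assumes that the directory and the extension are dropped before the name is split at - characters; the function names default_prefixes and default_lprefix are hypothetical and only illustrate the rules, they are not shnell's implementation.

```sh
# Hypothetical helper: derive the list PREFIX0 PREFIX1 ... from a file name
# by dropping directory and extension (an assumption) and splitting at '-'.
default_prefixes () {
    base="${1##*/}"       # strip leading directories
    base="${base%.*}"     # strip the extension
    printf '%s\n' "$base" | sed 's/-/ /g'
}

# Hypothetical helper: with an empty LPREFIX, the last component is used.
default_lprefix () {
    base="${1##*/}"
    base="${base%.*}"
    printf '%s\n' "${base##*-}"
}

default_prefixes src/strong-type.c
default_lprefix src/strong-type.c
```

For a file named strong-type.c this yields the prefix list strong type and the local prefix type.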
Linkage of identifiers
This tool is not concerned with objects, functions and types that are just used (including tag names) but not declared or defined. In particular, this holds for types that are just forward declared (such as in struct toto;). For them we suppose that naming issues are taken care of by whichever TU defines them.
There are several categories of local identifiers, that is identifiers that are defined in the current TU:
0 Some globally exposed identifiers without linkage are unchanged by this tool. Per default these are struct or union members, function parameters and struct, union or enum tags that are only declared but not defined. If you want to bless them with linkage, you’d have to use one of the methods above to force it.
1 Identifiers with internal linkage. Per default these are all global static objects or functions. Identifiers can be added to that by using the private directive.
2 Identifiers with external linkage. Per default these are all declared global objects and functions that are not static. To that are added all other identifiers for which short or long identifier variants (see below) are used.
3 If not made external by one of the rules above, typedef names, macros and enumeration constants have internal linkage. The same holds for struct, union and enum tags for types that are defined within this TU.
If an identifier could have internal or external linkage by (1) or (2), internal linkage prevails. This is so that you may make an identifier private that otherwise would have external linkage. To simplify the use of such an internal identifier you are still able to use the short form; it is rewritten to the internal form.
All identifiers that have internal linkage by these rules are “private”, those that have external linkage are “public”. Those with no linkage in (0) are in a gray zone, but since we force privacy on all macros, there should be no bad interaction between foreign macros and such local identifiers.
Composite identifiers
There are several types of identifiers that are dealt with by this tool:
Local identifiers are usual global C identifiers that are defined within this TU. They have no linkage if none of the rules above provides some.
Only members or function parameters can be without linkage. Variables, functions, macros, types and enumeration constants always have linkage.
Short identifiers are composed of two parts, LPREFIX and a local identifier ID, that are joined by ::. Declaring a local identifier that is not static or explicitly private with such a form automatically elevates it to have external linkage in the whole TU. If you prefix, e.g., an enumeration constant in its definition with the local prefix, it becomes globally visible and usable by others without creating naming conflicts. For the whole TU, the local identifier ID can be used instead of the short identifier LPREFIX::ID, and is replaced by it accordingly. Then, for the output of this tool, all short identifiers are replaced by the corresponding long identifiers.
Long identifiers come in several flavors, for external linkage and for internal linkage, and may differ between how they are used inside the code and how they are visible to the outside. Seen from inside the TU, for external linkage an ID is prefixed with the list PREFIX0, PREFIX1 … and the parts are joined with ::. As for short identifiers, this form in the declaration of the identifier makes ID an identifier with external linkage, and the short and local forms can be used interchangeably. To make an external symbol from a long identifier, :: is replaced by SHNELL_SEPARATOR and, unless SHNELL_SEPARATOR is “::”, this is how long identifiers are seen by the outside. E.g. if SHNELL_SEPARATOR is _, test::string::length is presented to the outside as test_string_length. If SHNELL_SEPARATOR is itself “::”, the name is mangled with a strategy similar to C++, see below.
There are also long identifiers to present internal linkage to the outside. The rules are similar to those for external linkage, only that things are added to the list of prefixes, such that the long identifier becomes obfuscated. You may use these identifiers only with their short or local form.
A unique choice of prefixes for the TU within the same project guarantees that such an identifier can never clash with another one in the same project.
As mentioned, the separator that is used for joining external name components is SHNELL_SEPARATOR. In the case of the special token :: the output names are also mangled. Mangling is done according to the mangle shnell-module. In particular, the prefix manglePrefix is used from there. There are special conventions:
::  C++ mangling
_   snail_case_identifiers
C   camelCaseIdentifiers
P   PascalCaseIdentifiers
Evidently you’d have to be careful that your identifiers fit with the convention you are using. For C and P only components that start with a lower case letter and do not contain an underscore are transformed. In particular, components that start with an underscore are left alone.
SHNELL_SEPARATOR defaults to “::”, but can be set by the corresponding environment variable.
export SHNELL_SEPARATOR="${SHNELL_SEPARATOR:-::}"
main is special
As for traditional C, the entry point main can be treated specially. If you name a function stdc::main (which you will probably do if you also use implicit) the following strategy is applied:
Your function is still compiled as main (without prefix) such that the compiler may apply the special treatment that is reserved for that. E.g. not returning a value from the function is not an error.
The function symbol with the local name main is made “weak” such that it does not clash with similar functions from other TUs that may be linked together.
A public symbol with the long name for main is aliased to your function.
The header that is produced for the TU has an entry for that long name.
All of this allows you to have one entry point per TU without creating conflicts. This is particularly useful to implement a test program for your TU in place.
Header files
To determine the identifiers that are defined in the program, a header file is produced and stored in a directory named by the SHNELL_INCLUDE environment variable, if any:
Objects or functions that are declared or defined static are suppressed, unless they are static inline functions.
Function bodies of functions that are not inline are removed and replaced by a ; that terminates the declaration.
Function bodies of functions that are inline are kept, so they appear in the header and in the output. In the output the inline keyword is removed such that when compiling the output, the function is instantiated.
Initializers for objects are removed, such that only the declaration remains.
All function and object declarations that are not static are prefixed with extern. Whereas this would be the default for functions anyhow, for objects this would otherwise create “tentative definitions” that could provoke linkage problems.
All defined identifiers (functions, objects, macros, type tags, typenames, enumeration constants) are renamed to their long form in the whole header.
If the code defines a stdc::main function, the declaration of it is removed and replaced by an equivalent one using the long name for main.
The header is protected by an include guard.
The header file is only replaced if its contents have changed.
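Replacing a header only when it has changed is commonly done by comparing a freshly generated candidate with the installed file. The sketch below shows one way to do this with cmp; the function name install_if_changed and the file names in the usage comment are hypothetical, and this is not necessarily how shnell does it.

```sh
# Hypothetical helper: move the candidate $1 over the target $2 only if the
# contents differ. Otherwise drop the candidate and keep the target, and
# with it the target's modification time.
install_if_changed () {
    new="$1"; target="$2"
    if [ -f "$target" ] && cmp -s "$new" "$target"; then
        rm -f "$new"
    else
        mv -f "$new" "$target"
    fi
}

# Usage (hypothetical file names):
#   install_if_changed "${SHNELL_INCLUDE}/string.h.tmp" "${SHNELL_INCLUDE}/string.h"
```

Keeping an unchanged header untouched has the practical effect that build tools that rely on timestamps do not recompile its dependents.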
Examples:
Without mangling and a simple universal prefix
SHNELL_SEPARATOR equal to “_” and
This defines a TU where the naming convention is independent of the filename. All global, non-static variable and function names are made externally visible with a string_ prefix, if they don’t have one yet. If, in addition, we have three identifiers (EMPTY, INIT, and GET) that are forced to be public (e.g. by using string::EMPTY, string::INIT, and string::GET in their definition), they have external names (string_EMPTY, string_INIT, and string_GET), but within this TU the local names (EMPTY, INIT, and GET) or short names may be used just the same.
In addition to all statically defined identifiers there is the identifier b that is forced to be internal. Again, within this TU it can be accessed as string::b or b, but externally it has a name that is something weird, hidden by name obfuscation.
One additional identifier is special, string itself. With the setting as described here, it is left alone. This identifier should generally be reserved for the principal feature of this TU, such as the central data type or function that is defined in the TU. It is always external.
Without mangling and a short and long prefix
SHNELL_SEPARATOR equal to “_” and
Again, this defines a TU where the naming convention is independent of the filename. All global, non-static variable and function names are augmented to internally have a strong::type:: prefix and externally to have a strong_type_ prefix, if they don’t have one yet. The three identifiers (EMPTY, INIT, and GET) as above are forced to have external names, so strong_type_EMPTY, strong_type_INIT, and strong_type_GET, but within this TU the long names (strong::type::EMPTY, strong::type::INIT, and strong::type::GET), short names (string::EMPTY, string::INIT, and string::GET) and local names (EMPTY, INIT, and GET) may be used just the same.
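The equivalence of the three forms can be illustrated by a small sketch that maps each of them to the same external name. It hard-codes the settings of this example (local prefix string, prefix list strong type, separator _) in hypothetical variables, and the function name external_name is also hypothetical; it is not shnell's own code.

```sh
# Settings of the example above, held in hypothetical variables.
LPREFIX="string"
LONG="strong::type"
SEP="_"

# Hypothetical sketch: reduce a long or short form to the local identifier,
# then compose the long form and replace :: by the separator.
external_name () {
    id="$1"
    case "$id" in
        "$LONG"::*)    id="${id#"$LONG"::}" ;;
        "$LPREFIX"::*) id="${id#"$LPREFIX"::}" ;;
    esac
    printf '%s::%s\n' "$LONG" "$id" | sed "s/::/$SEP/g"
}

# All three forms lead to the same external name.
for form in EMPTY string::EMPTY strong::type::EMPTY; do
    external_name "$form"
done
```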
As above, in addition to all statically defined identifiers there is the identifier b that is forced to be internal. Again, within this TU it can be accessed as string::b or b, but externally it has a name that is something weird, but distinguishable from the other weird form that b would have in the previous setting.
Again, string itself is special. Within the TU, it is left alone, but to the outside it is visible as strong::type, and that name can also be used internally just as string.
This naming scheme can also be made dependent on the source filename. If that were strong-type.c, the strong type part above could be omitted.
With mangling
If SHNELL_SEPARATOR is equal to “::” the internal names are exactly the same as above. The outside visible forms are mangled by using the PREFIXx components. For the strong type example this would result in something like _ZN2_C6strong4type5EMPTYE, _ZN2_C6strong4type4INITE, and _ZN2_C6strong4type3GETE, but you should not care much. Within this TU the short, long and local names (EMPTY, INIT, and GET) may be used just the same as above.
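The mangled names shown above follow the nested-name pattern known from C++ mangling: _ZN, then each component preceded by its length, then a closing E. The sketch below reproduces the three names under the assumption that the mangle prefix is the _C component visible in them; the function name mangle_nested is hypothetical, and the real mangle shnell-module may handle more cases.

```sh
# Hypothetical sketch: length-prefix each component and wrap the result in
# _ZN ... E, as in the names shown above.
mangle_nested () {
    out="_ZN"
    for comp in "$@"; do
        out="$out${#comp}$comp"
    done
    printf '%sE\n' "$out"
}

# The first argument plays the role of the mangle prefix (assumed to be _C).
mangle_nested _C strong type EMPTY
mangle_nested _C strong type INIT
mangle_nested _C strong type GET
```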
Implementation considerations
We use sorting to have unique lists of symbols. Therefore we must ensure that the collating sequence of the locale is ignored.
export LC_COLLATE=C
SORT="${SORT:-sort}"
SORTUNIQ="${SORTUNIQ:--u}"
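A minimal sketch of how these settings might be combined into a filter that produces a unique symbol list. The function name unique_symbols is hypothetical. Note that a set LC_ALL overrides LC_COLLATE, so the sketch assumes that LC_ALL is unset or set to C.

```sh
export LC_COLLATE=C
SORT="${SORT:-sort}"
SORTUNIQ="${SORTUNIQ:--u}"

# Hypothetical helper: read symbols on stdin, one per line, and write them
# sorted in byte order with duplicates removed.
unique_symbols () {
    "${SORT}" ${SORTUNIQ}
}

printf '%s\n' b a B a | unique_symbols
```

With the C collating sequence, capital letters sort before lower case letters, independently of the user's locale.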
Coding and configuration
The following code is needed to enable the sh-module framework.
SRC="$_" . "${0%%/${0##*/}}/import.sh"
Imports
The following sh-modules are imported:
Details