part of shnell – a source to source compiler enhancement tool
© Jens Gustedt, 2019
A tool to extract a header from a C file
Usage: header [IDS] [EXT] [STAT]
The C source file is read from stdin and the header is dumped to stdout. The intent is to have the same line structure in the header as in the source, such that error indications of compilers may be tracked to the original source.
IDS, EXT and STAT are optional file names to which lists of the declared identifiers are written:
– IDS are all identifiers without linkage: typedef names, tag names of declared struct, union and enum types, enumeration constants, and macro names
– EXT are all identifiers with external linkage, that is, global objects and functions that have not been declared static
– STAT are all identifiers with internal linkage, that is, global objects and functions that have been declared static
Procedure
We will be treating declarations (including definitions) as we go by collecting data about them in category, storage and id.
The category is only used when we have met a definition with {}. It will be struct, union or enum when defining such a construct, and “function” if we are defining a function. For struct, union or enum the definition will be emitted; for functions (unless they are inline, see below) the function body is suppressed.
If the category is enum, the definition is additionally scanned for enumeration constants, which are then classified as having no predefined linkage.
category=
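The scan for enumeration constants can be pictured with a minimal sketch. It is not the tool's actual code: the function name enum_constants is hypothetical, and it assumes that the body of the enum has already been split into tokens, passed one per argument. The first token of the body and the first token after each comma outside of parentheses are taken as constant names.

```sh
# Hypothetical helper: print the enumeration constants found in the
# token list of an enum body, one constant per line.
enum_constants () {
    nest=0      # current depth of parentheses
    fresh=1     # 1 while the next plain token names a new constant
    for tok in "$@"; do
        case "$tok" in
            "(") nest=$((nest + 1)) ;;
            ")") nest=$((nest - 1)) ;;
            ",") if [ "$nest" -eq 0 ]; then fresh=1; fi ;;
            *)   if [ "$fresh" -eq 1 ]; then echo "$tok"; fresh=0; fi ;;
        esac
    done
}

# Tokens of: enum { red, green = MAX(1, 2), blue }
enum_constants red , green = MAX "(" 1 , 2 ")" , blue
```

The comma inside the parentheses of the (assumed) macro call MAX does not start a new constant, so only red, green and blue are reported.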
storage will hold the storage specifiers (in a broad sense) that we encountered. If it contains static, we assume that this defines an identifier with internal linkage. If it contains typedef, it has no predefined linkage. If it contains inline, the function body is emitted.
Note that in C, parameter lists may also contain the keyword static, but this use is detected by the nesting computation, see below.
storage=
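How the content of storage could select one of the three lists is sketched below. This is only an illustration under assumptions: the function name classify_storage is hypothetical, storage is taken to be a space-separated list of specifiers, and everything that is neither typedef nor static is assumed to end up in EXT.

```sh
# Hypothetical helper: map a space-separated list of storage specifiers
# to the name of the list (IDS, EXT or STAT) that receives the identifier.
classify_storage () {
    case " $1 " in
        *" typedef "*) echo IDS ;;    # typedef names have no linkage
        *" static "*)  echo STAT ;;   # internal linkage
        *)             echo EXT ;;    # assumed default: external linkage
    esac
}

classify_storage "static inline"
classify_storage "typedef"
classify_storage "extern"
classify_storage ""
```

Padding the argument with spaces lets each pattern match whole words only, so a specifier is never matched as part of a longer word.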
id holds all the magic of C declarations. It is used to accumulate information about the type in its first word. For basic integer or floating types this first word will accumulate a comma-separated list of keywords such as long,int or _Complex,double (no spaces!). For all other types it will hold the typedef or tag name, e.g. size_t, or string if the declaration had struct string. When collecting this information, all qualifiers are simply ignored.
For continuation declarations, such as variables that are defined together with a struct or union definition, or comma-separated declarations, the first word will just be continuation.
The second word of id is then the next identifier in the declaration that is found after this. In C this is always the declared identifier, regardless of the punctuation structure surrounding it.
id=
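The construction of the first word of id can be illustrated by a small sketch. It is not the tool's actual code: the function name type_word is hypothetical, and it assumes that it only receives the type specifiers and qualifiers of a declaration, one token per argument, with storage specifiers already removed.

```sh
# Hypothetical helper: build the first word of "id" from the tokens of
# the declaration specifiers.
type_word () {
    word=
    for tok in "$@"; do
        case "$tok" in
            const|volatile|restrict|_Atomic) ;;   # qualifiers are ignored
            struct|union|enum) ;;                 # only the tag name that follows is kept
            char|short|int|long|signed|unsigned|float|double|_Bool|_Complex)
                word="${word:+$word,}$tok" ;;     # join keywords with commas, no spaces
            *)  word="$tok" ;;                    # typedef or tag name
        esac
    done
    echo "$word"
}

type_word unsigned long int
type_word const _Complex double
type_word struct string
type_word size_t
```

The expansion ${word:+$word,} only inserts a comma when word is already non-empty, which keeps the list free of a leading separator.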
We have to distinguish two different kinds of commas in a declaration: commas that separate function parameters and commas that introduce continuation declarations. The former are always found inside parentheses, so we count the nesting level to distinguish these two situations.
This also helps to decide whether the keyword static is a storage specifier or decorates an array parameter of a function.
nest=0
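The nesting computation can be sketched as follows. The function name top_level_commas is hypothetical, and the sketch assumes that the declaration has already been split into tokens, passed one per argument. It only counts the commas that are found at nesting level 0, that is, those that introduce a continuation declaration.

```sh
# Hypothetical helper: count the commas of a declaration that are not
# enclosed in parentheses.
top_level_commas () {
    nest=0
    count=0
    for tok in "$@"; do
        case "$tok" in
            "(") nest=$((nest + 1)) ;;
            ")") nest=$((nest - 1)) ;;
            ",") if [ "$nest" -eq 0 ]; then count=$((count + 1)); fi ;;
        esac
    done
    echo "$count"
}

# Tokens of: int f(int a, int b), g;
top_level_commas int f "(" int a , int b ")" , g ";"
```

The same counter could serve for static: a static token seen while nest is greater than 0 would belong to a parameter list and not to the storage specifiers.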
SRC="$_" . "${0%%/${0##*/}}/import.sh"
import echo
import tokenize
import tmpd
import ballanced
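The expansion ${0%%/${0##*/}} that locates import.sh may look cryptic. The sketch below replays it on an arbitrary path instead of $0. The function name script_dir and the sample paths are hypothetical and serve only as an illustration.

```sh
# Hypothetical helper: apply the same two-step expansion to a given path.
script_dir () {
    path=$1
    base=${path##*/}       # strip everything up to the last "/": the file name
    dir=${path%%/$base}    # strip the trailing "/<file name>": the directory
    echo "$dir"
}

script_dir /usr/local/bin/header
script_dir ./header
```

The inner expansion yields the file name of the script. The outer one removes that name together with the preceding slash, which leaves the directory in which import.sh is then looked up.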
Declared functions
processInit: Ignore initializers for the header
This dumps all the source to /dev/null until we find a terminating semicolon or comma.