header

part of shnell – a source to source compiler enhancement tool

© Jens Gustedt, 2019

A tool to extract a header from a C file

Usage: header [IDS] [EXT] [STAT]

The C source file is read from stdin and the header is dumped to stdout. The intent is to have the same line structure in the header as in the source, so that compiler error indications can be traced back to the original source.

IDS, EXT and STAT are optional file names to which lists of the declared identifiers are written:

IDS are all identifiers without linkage.

EXT are all identifiers with external linkage, that is, global objects and functions that have not been declared static.

STAT are all identifiers with internal linkage, that is, global objects and functions that have been declared static.

Procedure

We will be treating declarations (including definitions) as we go by collecting data about them in category, storage and id.

The category is only used when we have met a definition with {}. It will be struct, union or enum when defining such a construct, and “function” if we are defining a function. For struct, union or enum the definition will be emitted; for functions (unless they are inline, see below) the function body is suppressed.

If the category is enum, the definition is additionally scanned for enumeration constants, which are then classified as having no predefined linkage.

category=
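The scan for enumeration constants described above can be pictured with a small sketch. It is not the tool's own code: the function name enum_constants is hypothetical, and it assumes the enum body arrives already split into one token per argument. It reports every identifier that directly follows "{" or ",", so values containing commas (for example macro calls with several arguments) would confuse it.

```shell
# Hypothetical helper, not part of the tool: print the enumeration
# constants found in a tokenized enum body, one per line.
enum_constants () {
    expect=0                      # 1 if the next identifier is a constant
    for tok in "$@"; do
        case "$tok" in
            "{"|",")    expect=1 ;;
            [A-Za-z_]*) [ "$expect" -eq 1 ] && echo "$tok"
                        expect=0 ;;
            *)          expect=0 ;;
        esac
    done
}

# Body of: enum color { RED , GREEN = 2 , BLUE } ;
enum_constants "{" RED "," GREEN "=" 2 "," BLUE "}"
```

Each name printed this way would end up in the IDS list, since enumeration constants have no linkage.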

storage will hold the storage specifiers (in a broad sense) that we encountered. If it contains static we assume that this defines an identifier with internal linkage. If it has typedef it has no predefined linkage. If it contains inline, the function body is emitted.

Note that in C, parameter lists may also contain the keyword static, but this use is detected by the nesting computation, see below.

storage=
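As an illustration of how the collected storage specifiers determine the linkage class, here is a minimal sketch. The function linkage_class and its output words are hypothetical and only mirror the rules stated above; it also assumes that the declaration is at file scope, so that anything that is neither static nor typedef has external linkage.

```shell
# Hypothetical helper, not part of the tool: map a space-separated
# list of storage specifiers to a linkage class.
linkage_class () {
    case " $1 " in
        *" typedef "*) echo "none" ;;       # typedef: no predefined linkage
        *" static "*)  echo "internal" ;;   # static: internal linkage
        *)             echo "external" ;;   # everything else at file scope
    esac
}

linkage_class "static inline"
linkage_class "typedef"
linkage_class "extern"
```

Padding the argument with blanks lets the patterns match whole words only, so a hypothetical specifier such as "statically" would not be mistaken for static.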

id holds all the magic of C declarations. Its first word is used to accumulate information about the type. For basic integer or floating types this first word will accumulate a comma-separated list of keywords such as long,int or _Complex,double (no spaces!). For all other types it will hold the typedef or tag name, e.g. size_t, or string if the declaration had struct string. When collecting this information, all qualifiers are simply ignored.

For continuation declarations, such as variables that are defined together with a struct or union definition, or comma-separated declarations, the first word will just be continuation.

The second word of id is then the next identifier in the declaration that is found after this. In C this is always the declared identifier, regardless of the punctuation structure surrounding it.

id=
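The way id is assembled can be sketched as follows. This is only an approximation of the description above, not the tool's implementation: make_id is a hypothetical name, the declaration is assumed to be pre-split into one token per argument, and continuation declarations are not handled.

```shell
# Hypothetical helper, not part of the tool: build the two-word id
# ("type name") from the tokens of a simple declaration.
make_id () {
    type=""
    name=""
    for tok in "$@"; do
        case "$tok" in
            const|volatile|restrict|_Atomic)
                ;;                                  # qualifiers are ignored
            struct|union|enum)
                ;;                                  # the tag that follows is kept
            void|char|short|int|long|signed|unsigned|float|double|_Bool|_Complex)
                type="${type:+$type,}$tok" ;;       # comma-separated, no spaces
            [A-Za-z_]*)
                if [ -z "$type" ]; then
                    type="$tok"                     # typedef or tag name
                elif [ -z "$name" ]; then
                    name="$tok"                     # the declared identifier
                fi ;;
        esac
    done
    echo "$type $name"
}

make_id const unsigned long int '*' counter
make_id struct string s
```

Punctuation such as * matches none of the patterns and is skipped, which reflects the remark that the declared identifier is found regardless of the punctuation around it.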

We have to distinguish two different kinds of commas in a declaration: commas that separate function arguments and commas that introduce continuation declarations. The former are always found inside parentheses, so we count the nesting level to distinguish these two situations.

This also helps to decide if the keyword static is a storage specifier or decorates an array parameter of a function.

nest=0
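A minimal sketch of this nesting count, under the assumption that the declaration has already been split into tokens (count_declarators is a hypothetical name, not a function of the tool):

```shell
# Hypothetical helper, not part of the tool: count the declarators in
# a declaration by looking only at commas found at nesting level 0.
count_declarators () {
    nest=0
    count=1
    for tok in "$@"; do
        case "$tok" in
            "(") nest=$((nest + 1)) ;;
            ")") nest=$((nest - 1)) ;;
            ",") [ "$nest" -eq 0 ] && count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# int f ( int a , int b ) , g ( void ) ;
count_declarators int f "(" int a "," int b ")" "," g "(" void ")" ";"
```

Only the comma between the two function declarators is counted; the one between the parameters a and b is seen at level 1 and ignored. The same level test would tell a storage specifier static (level 0) from a static inside a parameter list (level 1 or deeper).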

SRC="$_" . "${0%%/${0##*/}}/import.sh"

import echo
import tokenize
import tmpd
import ballanced

Declared functions

processInit: Ignore initializers for the header

This dumps all the source to /dev/null until we find a terminating semicolon or comma.
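The idea of skipping an initializer can be sketched as below. This is an assumption-laden illustration rather than the actual processInit: the name skip_initializer is hypothetical, tokens are assumed to arrive one per line on stdin, and the sketch tracks bracket depth so that commas inside a braced initializer are not taken for the terminator.

```shell
# Hypothetical helper, not part of the tool: discard tokens read from
# stdin up to the "," or ";" that terminates the initializer.
skip_initializer () {
    depth=0
    while IFS= read -r tok; do
        case "$tok" in
            "{"|"("|"[") depth=$((depth + 1)) ;;
            "}"|")"|"]") depth=$((depth - 1)) ;;
            ","|";")
                if [ "$depth" -eq 0 ]; then
                    echo "$tok"     # report the terminator and stop
                    return 0
                fi ;;
        esac
    done
    return 1                        # input ended before a terminator
}

# Initializer of: int a[2] = { 1 , 2 } , b ;
printf '%s\n' "{" 1 "," 2 "}" "," b ";" | skip_initializer
```

Reporting which terminator was found lets a caller decide whether a continuation declaration follows (comma) or the declaration is complete (semicolon).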

getId: Classify an identifier, if any, into one of the three linkage classes. $1 is the storage, $2, $3, $4 are the three file names, $5 is the type, and $6 is the identifier, if any.
