regVar

part of shnell – a source to source compiler enhancement tool

Generate sed expression to replace a meta-variable

This produces a regular expression that handles the different forms and surroundings in which a meta-variable, written ${NAME}, can appear:

#${NAME} is stringification of the content

## ${NAME} joins the content to the left

## ${NAME} ## joins the content to the left and to the right

${NAME} ## joins the content to the right

${NAME} without any surrounding # or ## operators is simple replacement with the contents
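
The forms above can be approximated with two sed substitutions. The following is only a minimal sketch and not the expression that regVar actually generates: the helper name expand_var is hypothetical, it assumes a sed that supports -E, it assumes the value contains no /, & or \ characters, and it ignores the ### case, precedence between variables, and any spacing that shnell may add around a plain replacement.

```shell
#!/bin/sh
# expand_var NAME VALUE  (hypothetical helper, not part of shnell)
# Reads text on stdin and replaces the meta-variable ${NAME} by VALUE.
expand_var() {
    name=$1
    value=$2
    sp='[[:space:]]*'
    sed -E \
        -e "s/(^|[^#])#\\\$\\{${name}\\}/\\1\"${value}\"/g" \
        -e "s/(${sp}##${sp})?\\\$\\{${name}\\}(${sp}##${sp})?/${value}/g"
    # 1st expression: a single # directly in front stringifies the content.
    # 2nd expression: an optional ## on either side is swallowed together
    # with its surrounding white space (join); otherwise plain replacement.
}

printf '%s\n' 'char s[] = #${NAME};' | expand_var NAME top
printf '%s\n' 'var ## ${A}'          | expand_var A 5
printf '%s\n' 'long a = ${A}##L;'    | expand_var A 5
```

Several variables can be handled by chaining calls, for example `expand_var NAME top | expand_var EXT U`, under the same assumptions.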

String literal and character literal prefixes

Sometimes we might have ${NAME} representing a string or character prefix (such as L) as in ${NAME} ## "something" or ${NAME} ## 'z'. The prefix is then joined to the string literal to form a single string token L"something" or L'z'.

Otherwise, remember that C string literals don’t need explicit concatenation: two string literals that directly follow each other are joined by the compiler in a later translation phase.

Precedence

In theory, ## binds to the meta-variable that was defined first. E.g., in ${HEI} ## ${HOI} the ## binds to the variable HOI if HOI was defined in a scope outside that of HEI (or earlier in the same bind directive). Under normal circumstances you should not notice this: the contents of the two variables are simply glued together.

There is an ambiguity for ${HEI}###${HOI}. This stringifies ${HOI} regardless of the order of definition and then glues ${HEI} to it as a string prefix.

There should never be more than three # in a row.

Tokens and white space

Generally, shnell pragmas are broken into tokens by white space. Sometimes it is convenient to include spaces or tabs in a word, e.g. if we want to assign a whole list of tokens to a meta-variable, or if some data naturally contains spaces. This can be achieved by placing a backslash character \ in front of a space, such as in L\ u\ U\ u8 for a list of string prefixes, or as in Elster,\ Frau for a combination of data that belongs to a single item.

Escaped spaces behave differently depending on how the meta-variable is expanded:

If expanded directly, without stringification, the result is a list of tokens that are split at the escaped spaces.
When stringified, the resulting string is still one single token.
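
The two behaviors can be illustrated with a small shell sketch. The helper names expand_direct, expand_stringified and count_words are hypothetical and not part of shnell, and counting white-space separated words is only a stand-in for counting C tokens.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the two expansions of a value
# containing escaped spaces; they are not the shnell implementation.

# Direct expansion: each escaped space becomes a plain separator.
expand_direct() {
    printf '%s\n' "$1" | sed 's/\\ / /g'
}

# Stringification: the same text, wrapped in quotes as one string literal.
expand_stringified() {
    printf '"%s"\n' "$(expand_direct "$1")"
}

# Count white-space separated words (globbing disabled while splitting).
count_words() {
    set -f
    set -- $1
    set +f
    echo "$#"
}

expand_direct      'Elster\ ,\ Frau'
expand_stringified 'Elster\ ,\ Frau'
count_words "$(expand_direct 'L\ u\ U\ u8')"
```

Here the direct expansion of the prefix list splits into four words, while a stringified value stays a single quoted string.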

Examples

With ${A} containing 5, ${NAME} containing top, ${EXT} containing U, and ${FN} containing Elster\ ,\ Frau, we obtain:

source	replacement	remark
`int a = ${A};`	`int a = 5;`	simple token
`long a = ${A}L;`	`long a = 5L;`	variable and alnum, 1 token
`long a = ${A}##L;`	`long a = 5L;`	same with `##`, 1 token
`unsigned u = ${A}${EXT};`	`unsigned u = 5 U;`	invalid C, 2 tokens
`unsigned u = ${A}##${EXT};`	`unsigned u = 5U;`	`##` necessary, 1 token
`var${A}`	`var5`	alnum and variable, 1 token
`var ## ${A}`	`var5`	same with `##`, 1 token
`var ${A}`	`var 5`	invalid C, 2 tokens
`${NAME}${A}`	`top 5`	invalid C, 2 tokens
`${NAME} ${A}`	`top 5`	invalid C, 2 tokens
`${NAME} ## ${A}`	`top5`	`##` necessary, 1 token
`char s[] = #${NAME};`	`char s[] = "top";`	stringification
`wchar_t s[] = L#${NAME};`	`wchar_t s[] = L"top";`	string prefix, 1 token
`char32_t su[] = ${EXT}## #${NAME};`	`char32_t su[] = U"top";`	`##` necessary, 1 token
`char s[] = #${FN};`	`char s[] = "Elster , Frau";`	stringification, 1 token
`enum { ${FN} };`	`enum { Elster , Frau };`	3 tokens, including comma

Coding and configuration

The following code is needed to enable the sh-module framework.

SRC="$_" . "${0%%/${0##*/}}/import.sh"

Imports

The following sh-modules are imported: