punctuators

part of shnell – a source to source compiler enhancement tool

© Jens Gustedt, 2019

Produce a regexp for C punctuator tokens

The result of this code snippet is a variable punctuators that holds a sed regular expression to detect C’s punctuators according to the maximal munch rule. That is, the longest possible operator is always detected.
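As a rough illustration of the longest-match rule described above, the following sketch applies a much shorter, made-up operator list to a made-up input line. It assumes a sed that accepts `\|` for alternation in basic regular expressions (GNU sed does); the variable names `regexp` and `munched` are hypothetical and not part of the module.

```shell
# Alternatives are listed longest first, mirroring the layout of this module.
# The operator list and the input line are made up for illustration.
regexp='<<=\|<<\|<=\|[<=]'

# Surround every detected operator with blanks.
munched=$(printf '%s\n' 'a<<=b' | sed "s/${regexp}/ & /g")

printf '%s\n' "${munched}"    # prints: a <<= b
```

The three characters `<<=` come out as one token rather than as `<<` followed by `=`.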

Deviations from C’s token rules

The only operators that are not detected by this expression are ., + and -, because these may be part of a preprocessing number token.
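To see why these three characters are left alone, consider what a naive split would do to a floating-point constant. This is only a sketch with a made-up input line; the variable name `broken` is hypothetical.

```shell
# Naively splitting at '.', '+' and '-' tears the single preprocessing
# number 1.5e+3 apart. The input line is made up for illustration.
broken=$(printf '%s\n' 'x = 1.5e+3;' | sed 's/[.+-]/ & /g')

printf '%s\n' "${broken}"    # prints: x = 1 . 5e + 3;
```

In C, `1.5e+3` is one preprocessing number, so a tokenizer that separates at these characters would change the meaning of the source.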

Additionally detected are

[[ and ]] that start and terminate C2x’ new attribute feature. Officially, these would not be tokens of their own, but any sane code should treat them as such. For ]] this could cause some problems, because such a pair can also arise from two consecutive closing array brackets, as in a[b[0]].

:: is the new shiny C2x separator for attribute names.

!! is my favorite pseudo operator; every language should have this. In C it forces Boolean evaluation. Isn’t it nice to have an operator for this?
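The caveat about ]] can be reproduced with a small sketch. It uses only the two attribute delimiters from the list and a made-up line with nested array subscripts, and again assumes a sed that accepts `\|` in basic regular expressions. The names `attr_re` and `clash` are hypothetical.

```shell
# Only the two attribute delimiters, escaped as in this module.
attr_re='\[\[\|\]\]'

# Nested subscripts end in two closing brackets, which are then
# detected as a single ]] token. The input line is made up.
clash=$(printf '%s\n' 'x = a[b[0]];' | sed "s/${attr_re}/ & /g")

printf '%s\n' "${clash}"    # prints: x = a[b[0 ]] ;
```

Code that consumes the token stream may therefore have to split such a ]] again when it closes array subscripts rather than an attribute.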

Coding and configuration

The following code is needed to enable the sh-module framework.

SRC="$_" . "${0%%/${0##*/}}/import.sh"
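The nested parameter expansion in that line computes the directory in which the script resides, so that import.sh is looked up next to it. The following sketch spells this out step by step for a hypothetical path; `script`, `name`, `dir`, `nested` and `import` are illustration-only names.

```shell
# Hypothetical location of a script, standing in for "$0".
script='/opt/shnell/bin/punctuators.sh'

# ${script##*/} strips the longest prefix ending in '/', leaving the file name.
name="${script##*/}"

# ${script%%/${name}} strips the suffix "/<file name>", leaving the directory.
dir="${script%%/${name}}"

# The same computation written in one step, as in the line above.
nested="${script%%/${script##*/}}"

import="${dir}/import.sh"
printf '%s\n' "${import}"    # prints: /opt/shnell/bin/import.sh
```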

Imports

The following sh-modules are imported:

Details

Separate punctuators according to the maximal munch rule.

This is protected by an “include” guard.

if [ -z "${punctuators}" ] ; then

Single operator consisting of 4 characters.

punct4="%:%:"

Three with 3 characters.

punct3="
<<=
>>=
\.\.\.
"

A lot with 2.

punct2="
\!\!
\!=
%:
%=
%>
\&\&
\&=
++
+=
--
-=
->
\/=
::
:>
<%
<:
<<
<=
==
>=
>>
\#\#
\*=
\[\[
\]\]
|=
||
∪=
∩=
×=
÷=
⌫=
⌦=
"

A whole class of one-character punctuators, with notable exceptions for the ., + and - characters. These can be part of preprocessing numbers, so we should not blindly tokenize at them.

punct1="[[:punct:]]"

join '\|' ${punct4} ${punct3} ${punct2} ${punct1}

export punctuators="${joinRet}"

fi
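How the joined expression might be used can be sketched as follows. The function `join_sketch` is a hypothetical stand-in written for this example, not the join sh-module itself: it assumes, from the call above, that the first argument is the separator, the remaining arguments are the items, and the result is left in joinRet. The operator list and the input line are made up, and a sed that accepts `\|` in basic regular expressions is assumed.

```shell
# Hypothetical stand-in for the join sh-module (an assumption based on
# how it is called above): join all items with a separator and leave
# the result in the variable joinRet.
join_sketch () {
    sep="$1"
    shift
    joinRet=""
    for item in "$@" ; do
        # Prepend the separator only once joinRet is non-empty.
        joinRet="${joinRet:+${joinRet}${sep}}${item}"
    done
}

# A much shorter operator list than the real one, longest first.
join_sketch '\|' '<<=' '>>=' '<<' '>>' '[<>=]'
printf '%s\n' "${joinRet}"      # prints: <<=\|>>=\|<<\|>>\|[<>=]

# Use the joined expression to surround each operator with blanks.
separated=$(printf '%s\n' 'a>>=b<<c' | sed "s/${joinRet}/ & /g")
printf '%s\n' "${separated}"    # prints: a >>= b << c
```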