What good secure string expansion on Unix should look like

July 23, 2011

In yesterday's entry, I covered some options
for how to make string expansion and tokenization of command lines aware
of each other. Before I pick what I think is the best approach, let's
take a step back and talk about what results we want.

Assuming that %s expands to a single argument, the straightforward
reading of what we want to happen is for /opt/avscanner to be
invoked with four arguments if $heloname is set and with only two if
$heloname is unset. The alternate interpretations and results are all
absurd in one way or another.

I think that the simple way to achieve this is to perform string
expansion before tokenization but to mark the result of variable
expansions as being all in a single token. You don't quite want variable
expansion to force token boundaries (otherwise '-h$somevar' would
wind up actually meaning '-h $somevar', and that's absurd in its own
way), but you don't want the tokenizer to split things inside variable
expansions. Fortunately getting this right is only a small matter of
programming.

(Possibly you want to expose an explicit operator to group several
expansions together as a single non-breakable entity. You could call it
'${arg ...}'.)
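As a concrete sketch of the expand-first approach, here is one way it
might look in Python. This is purely illustrative (the function names,
the '$name' variable syntax, and the sample variables are all my
inventions, not any real mailer's configuration language):

```python
import re

def expand_and_tokenize(s, variables):
    """Expand $name variables, then tokenize on whitespace, without
    ever splitting inside text that came from an expansion."""
    # First pass: expansion. Build (text, from_expansion) chunks so
    # that the tokenizer knows which spans are protected.
    chunks, pos = [], 0
    for m in re.finditer(r'\$(\w+)', s):
        chunks.append((s[pos:m.start()], False))
        chunks.append((variables.get(m.group(1), ''), True))
        pos = m.end()
    chunks.append((s[pos:], False))

    # Second pass: tokenization. Whitespace in literal chunks ends the
    # current token; expansion chunks are glued on whole, so they never
    # force a token boundary and never get split internally.
    tokens, cur = [], ''
    for text, protected in chunks:
        if protected:
            cur += text
            continue
        pieces = re.split(r'\s+', text)
        cur += pieces[0]
        for piece in pieces[1:]:
            if cur:
                tokens.append(cur)
            cur = piece
    if cur:
        tokens.append(cur)
    return tokens
```

With this, '-h$somevar' stays a single token, a variable whose value
contains spaces still comes out as one argument, and an unset (empty)
variable standing on its own contributes no argument at all.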

If you want to tokenize before expansion, clearly the tokenizer needs to
be language aware. Roughly speaking, I think what you wind up wanting to
do is parse the string into an AST that is composed partly of tokenized
literal text, partly of language operators, and partly of variable
expansions. Then you evaluate the AST to generate a stream of tokenized
text, where a straightforward variable expansion like $heloname or
$recipients always gives you a single token regardless of what the
contents are.

(I have ripped this idea off from my understanding of the general
approach that web frameworks usually take to parsing and evaluating
their page templates.)
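A minimal sketch of that AST shape, with invented node names (a real
parser would also have nodes for whatever language operators exist):

```python
import re

class Lit:
    """Pre-tokenized literal text."""
    def __init__(self, tokens):
        self.tokens = tokens

class Var:
    """A variable expansion such as $heloname."""
    def __init__(self, name):
        self.name = name

def parse(s):
    # Parse '... $name ...' into a flat list of Lit and Var nodes.
    ast, pos = [], 0
    for m in re.finditer(r'\$(\w+)', s):
        words = s[pos:m.start()].split()
        if words:
            ast.append(Lit(words))
        ast.append(Var(m.group(1)))
        pos = m.end()
    words = s[pos:].split()
    if words:
        ast.append(Lit(words))
    return ast

def evaluate(ast, variables):
    # Evaluate the AST into a stream of tokens. A variable expansion
    # always yields exactly one token (or none if it is unset),
    # regardless of what its value contains.
    tokens = []
    for node in ast:
        if isinstance(node, Lit):
            tokens.extend(node.tokens)
        elif node.name in variables:
            tokens.append(variables[node.name])
    return tokens
```

Note that this flat version quietly loses the adjacency in '-h$somevar',
because Lit nodes only record whitespace-separated words; the
boundary-marker variant in the sidebar handles that case.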

Sidebar: an alternate tokenization approach

An alternate tokenization approach is to say that the AST should include
explicit token boundary markers instead of pre-tokenized text (and
whitespace normally turns into such a boundary marker). Then the AST
evaluation produces a stream that is a mixture of token boundary markers
and text chunks; you take the stream and fuse all text between two
boundary markers together into a single argument. This naturally handles
cases like '-h$somevar' and '$var1$var2'; in both cases there is
no token boundary marker in the middle, so although the AST has two
separate nodes the end result fuses the text from both nodes together
into a single argument.
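A sketch of this variant (again with made-up names), where evaluation
produces the mixed stream directly and a final pass fuses it into
arguments:

```python
import re

BOUNDARY = object()  # explicit token boundary marker

def eval_to_stream(s, variables):
    # Evaluate a '$name' template into a mixed stream of text chunks
    # and boundary markers; whitespace turns into a boundary marker.
    stream = []
    for m in re.finditer(r'\$(\w+)|(\s+)|([^$\s]+)', s):
        name, ws, lit = m.groups()
        if ws:
            stream.append(BOUNDARY)
        elif name:
            if name in variables:
                stream.append(variables[name])
        else:
            stream.append(lit)
    return stream

def fuse(stream):
    # Fuse all text between two boundary markers into one argument.
    # An unset variable leaves nothing behind, so no empty argument
    # appears where it stood.
    args, cur, seen = [], '', False
    for item in stream:
        if item is BOUNDARY:
            if seen:
                args.append(cur)
            cur, seen = '', False
        else:
            cur += item
            seen = True
    if seen:
        args.append(cur)
    return args
```

As written, a variable that is set to the empty string still produces an
empty argument while an unset one vanishes entirely; which of those you
want is a design choice for the language.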