Abstract

A new option shall be added to the command exec, that allows the user to
specify that input redirected from immediate data (using <<) and/or the
data received from the external command shall not undergo any translation
within Tcl.

Motivation

External programs may expect binary data, or write out binary data, or neither
or even both. Whether a program reads/writes binary data or platform-encoded
data is generally specific to the particular program and known by the
programmer who intends to exec it from a Tcl script.

Deficiencies of Current State of exec

Problem 1: For passing string-data to external programs, exec now
behaves arguably incorrect, because it does not pass through \0-bytes.

Problem 2: For returning the result of external programs, exec
applies translations based on system-default encoding, which is OK in most
cases, except, of course, for programs that output binary data.

Problem 1 is actually a bug, but for compatibility reasons some internal
function cannot be changed, because it is believed to be used by some
extensions (although it is not officially exported), thus blocking the fixing
of the bug. This TIP goes beyond that, by not having it fixed to some
particular consistent behaviour, but to let the script-developer decide on
what is the right behaviour for his needs.

Problem 2 prevents the output of binary-outputting programs from being
correctly retrieved.

Proposal for a New Option

The exec command already has an interface for options, and currently
supports one option -keepnewline and the end-of-options marker --.
This means that adding a new option will not adversely affect any existing
scripts.

This TIP proposes a new option, -binaryarg. arg can be either a
single boolean value:

if a boolean true then both input (if a << redirection is present)
and return value are passed verbatim between Tcl and the external
program.

if a boolean false (which is the default) then behaviour would be like it
is now, except for input being \0-safe and system-translation taking
place as appropriate (e.g. line-endings).

arg can also be one of the keywords in, out, both (which
is equivalent to 1) or an empty string (equivalent to 0) for more readable
code. The directions are to be seen from external programs perspective.

If arg is any true boolean value, both or out, then the option
-keepnewline is implied.

If some usage of exec does not use << string-redirection, then the
in-bit has no visible effect.

For now, no binary flag is defined for stderr. This might be subject of a
future TIP or left out due to lack of need.

Alternatives

Benjamin Riefenstahl suggested to not make a binary (in the sense of yes/no) decision for each of input and output, but to directly specify the encodings to use (of which "binary" would also be a valid one).

This has some subtle disadvantage for usage.

As proposed, there is *one* option with one argument of effectively 4 different values. From each of these values it is evident, on which channels conversion takes place.

To specify arbitrary encodings independently for two channels, there would need to be a list of encodings. For consistency with other commands, the "stdin"-encoding would have to be first, though it is
used rarelier than output-encoding. So most times one would have to pass -encoding {{} binary} (yuck!) to specify encoding for output only.

Unlike with general channels (as used for open |... or sockets) the data going through exec is always "limited": before actually calling exec it exists completely in memory as argument to exec, and afterwards output is returned as one single returnvalue. Each of these chunks can be handled as binary for exec, and explicitly converted through "encoding convertto" (for input) or encoding convertfrom (for output).

Encoding per system-default is achieved by not adding any -binary option to exec at all, which covers probably >95% of all usages, anyway. (the remaining 5% being the ones for whom this TIP has been written)

Implementation

No implementation exists right now, although it is possible that this will
change in near future.

An implementation would have to change these functions: Those functions that
are not modifyable due to them being used elsewhere, need to be replaced by an
extended version, and the old function can be changed to call the new one with
appropriate extra arguments, and eventually be phased out.

Tcl_ExecObjCmd: Handle the new option and set new bits for the flags argument
of Tcl_OpenCommandChannel or add a new bitset argument.

Tcl_OpenCommandChannel: Deal with the new bits, or with new argument and pass
them/it on.

TclCreatePipeline: needs new arguments flags (for solving bug #768678
along the way) and pass new arguments to a new variant of
TclpCreateTempFile.

TclpCreateTempFile: both Unix and Win version currently get a dumb C-style
char-pointer with no length-information. They need two extra arguments, a
length and the appropriate binary-bit. Since TclpCreateTempFile is in
the stubs-table, we definitely need a new Version of it, e.g.
TclpCreateTempFileEx, which will take and use these new arguments.