Okay, based on the recent discussions, here are concrete proposals for
the last few adjustments to the 3.0 FE/BE protocol.

Let's use int16 (2-byte integers) as format selector codes; this seems a
reasonable compromise between bandwidth and flexibility. As of 7.4 the
only supported values will be 0 = text and 1 = binary, but future versions
can add more codes.

In client-sent messages, format codes appear primarily in the Bind
message. Bind needs to be able to specify two sets of formats: one for
the parameters it is supplying, and one for the query result columns if
any. I propose representing each set as a count N followed by N format
codes. If the count is zero, then all the columns have the default format
(which will always be 0 = text in 7.4, though we might later allow it to
be set to something else). If the count is one, then the single format
code is applied to all columns. Otherwise the count must match the number
of parameters or output columns. (Note that this moves the output format
request from Execute to Bind, so that formats can't be changed from one
row to the next in a portal's result. This allows more server-side
optimization of formatting routine setup.)
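
To make the count rule concrete, here is a minimal sketch of how a receiver might expand a format set into one code per column. The function names are made up for illustration, and the wire layout in encode_format_set (an int16 count in network byte order ahead of the int16 codes) is an assumption; the text above fixes only the width of the codes themselves.

```python
import struct

TEXT, BINARY = 0, 1
DEFAULT_FORMAT = TEXT  # always text in 7.4, per the proposal


def encode_format_set(codes):
    """Pack a format set as a count N followed by N int16 format codes.

    Assumption: the count is itself an int16 and everything is sent in
    network byte order; only the int16 width of the codes is given above.
    """
    return struct.pack("!%dh" % (len(codes) + 1), len(codes), *codes)


def resolve_formats(codes, ncolumns):
    """Expand a format set into exactly one format code per column."""
    if len(codes) == 0:
        # Count zero: every column gets the default format.
        return [DEFAULT_FORMAT] * ncolumns
    if len(codes) == 1:
        # Count one: the single code applies to all columns.
        return [codes[0]] * ncolumns
    if len(codes) != ncolumns:
        raise ValueError(
            "format count %d does not match %d columns" % (len(codes), ncolumns)
        )
    return list(codes)
```

Bind would carry two such sets, one resolved against the number of parameters and one against the number of result columns.
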
FunctionCall likewise needs to specify the format codes for the data it is
supplying and the result to be returned.

In server-sent messages, format codes will be added to RowDescription
messages, one per column. (A RowDescription sent in response to statement
Describe will show the default zero format code for all columns. A
RowDescription sent in response to portal Describe or simple Query will
show the actual format codes in use for the result.) The CopyInResponse
and CopyOutResponse messages will be changed to include a column count and
per-column format codes. (For now, the per-column codes will all be the
same: all zero for plain COPY and all one for binary COPY. But someday we
might extend COPY to do something different.)

We will move to a single uniform representation of data items at the
protocol level: an int4 (4-byte integer) byte count, not including
itself, followed by that
many data bytes. NULL is represented by byte count -1 (and no data bytes,
of course). The interpretation of the data bytes depends on the format
code. This will be used in DataRow output, Bind parameters, FunctionCall,
and FunctionResultResponse messages (the separate representation of
FunctionVoidResponse goes away). This will also become the data
representation in COPY BINARY files. I will change the header signature
for COPY BINARY so that the files can't be mistaken for old-style
server-internal-representation binary files.
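
As an illustration of the uniform data item representation, here is a small sketch of an encoder and decoder. The function names are hypothetical, and network byte order for the int4 count is an assumption on my part; the length-prefix and the -1 marker for NULL come from the description above.

```python
import struct


def encode_field(data):
    """Encode one data item: int4 byte count, then that many data bytes.

    None stands for SQL NULL and is sent as count -1 with no data bytes.
    Assumption: the count is in network byte order.
    """
    if data is None:
        return struct.pack("!i", -1)
    return struct.pack("!i", len(data)) + data


def decode_field(buf, offset=0):
    """Decode one data item at offset; return (value, next_offset)."""
    (length,) = struct.unpack_from("!i", buf, offset)
    offset += 4
    if length == -1:
        return None, offset  # NULL: nothing follows the count
    if length < -1 or offset + length > len(buf):
        raise ValueError("bad field length %d" % length)
    end = offset + length
    return bytes(buf[offset:end]), end
```

Note that an empty value (count 0) and NULL (count -1) stay distinguishable, and that the decoder never has to look at the format code; only the interpretation of the returned bytes depends on it.
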
The BinaryRow message type goes away; DataRow will serve for all format
codes. The content of DataRow will be a field count N followed by N
fields in the above representation. Note that the null bitmap goes away.
This representation is a little bulkier than the old one for rows
containing many NULLs, but the same or smaller when there are no NULLs.
It has a major advantage over the old representation in that the field
contents can be extracted without any external knowledge --- in the old
layout, if you didn't know the number of fields in advance, you were
completely lost. libpq, for example, cannot support receiving Execute
results without a preceding Describe result unless it can parse DataRow
without knowing the number of columns in advance.
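
To show why this layout is self-describing, here is a rough sketch of a DataRow parser that needs no prior RowDescription. parse_data_row is a hypothetical helper that takes only the message body (after any message type and length header); the int16 width of the field count and the use of network byte order are assumptions, since the proposal above does not pin them down.

```python
import struct


def parse_data_row(body):
    """Split a DataRow body into fields without knowing the column count.

    Assumed layout: int16 field count N, then N fields, each an int4
    byte count followed by that many bytes (count -1 means NULL).
    """
    (nfields,) = struct.unpack_from("!h", body, 0)
    offset = 2
    fields = []
    for _ in range(nfields):
        (length,) = struct.unpack_from("!i", body, offset)
        offset += 4
        if length == -1:
            fields.append(None)  # NULL carries no data bytes
            continue
        if length < -1 or offset + length > len(body):
            raise ValueError("bad field length %d" % length)
        fields.append(bytes(body[offset:offset + length]))
        offset += length
    return fields
```

Under the old layout the same loop could not be written, because the size of the null bitmap depended on a column count the client had to learn from somewhere else.
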
Any objections?

regards, tom lane