The primary implementation of Python, "CPython", is written in a
mixture of Python and C. One implementation detail of CPython
is what are called "built-in" functions -- functions available to
Python programs but written in C. When a Python program calls a
built-in function and passes in arguments, those arguments must be
translated from Python values into C values. This process is called
"parsing arguments".

As of CPython 3.3, builtin functions nearly always parse their arguments
with one of two functions: the original PyArg_ParseTuple(), [1] and
the more modern PyArg_ParseTupleAndKeywords(). [2] The former
only handles positional parameters; the latter also accommodates keyword
and keyword-only parameters, and is preferred for new code.

With either function, the caller specifies the translation for
parsing arguments in a "format string": [3] each parameter corresponds
to a "format unit", a short character sequence telling the parsing
function what Python types to accept and how to translate them into
the appropriate C value for that parameter.

PyArg_ParseTuple() was reasonable when it was first conceived.
There were only a dozen or so of these "format units"; each one
was distinct, and easy to understand and remember.
But over the years the PyArg_Parse interface has been extended
in numerous ways. The modern API is complex, to the point that it
is somewhat painful to use. Consider:

There are now forty different "format units"; a few are even three
characters long. This makes it difficult for the programmer to
understand what the format string says--or even perhaps to parse
it--without constantly cross-indexing it with the documentation.

There are also six meta-format units that may be buried in the
format string. (They are: "()|$:;".)

The more format units are added, the less likely it is the
implementer can pick an easy-to-use mnemonic for the format unit,
because the character of choice is probably already in use. In
other words, the more format units we have, the more obscure the
format units become.

Several format units are nearly identical to others, having only
subtle differences. This makes understanding the exact semantics
of the format string even harder, and can make it difficult to
figure out exactly which format unit you want.

The docstring is specified as a static C string, making it mildly
bothersome to read and edit since it must obey C string quoting rules.

When adding a new parameter to a function using
PyArg_ParseTupleAndKeywords(), it's necessary to touch six
different places in the code: [4]

Declaring the variable to store the argument.

Passing in a pointer to that variable in the correct spot in
PyArg_ParseTupleAndKeywords(), also passing in any
"length" or "converter" arguments in the correct order.

Adding the name of the argument in the correct spot of the
"keywords" array passed in to
PyArg_ParseTupleAndKeywords().

Adding the format unit to the correct spot in the format
string.

Adding the parameter to the prototype in the docstring.

Documenting the parameter in the docstring.

There is currently no mechanism for builtin functions to provide
their "signature" information (see inspect.getfullargspec and
inspect.Signature). Adding this information using a mechanism
similar to the existing PyArg_Parse functions would require
repeating ourselves yet again.
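To make the signature gap concrete, here is a minimal sketch that asks a callable for its signature through the standard inspect module. The helper name describe_signature is hypothetical. Whether a given C-implemented builtin reports a signature depends on the CPython version in use, so the sketch handles both outcomes rather than assuming either.

```python
import inspect

def describe_signature(func):
    """Return the signature of func as text, or None if none is available."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError):
        # ValueError: no signature can be provided for this callable.
        # TypeError: the object is not a supported callable at all.
        return None

def pure_python(a, b=2, *, c=None):
    return a

# A function written in Python always carries its signature.
print(describe_signature(pure_python))
```

A builtin that lacks signature metadata would simply yield None here, which is the situation the paragraph above describes.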

The goal of Argument Clinic is to replace this API with a mechanism
inheriting none of these downsides:

You need specify each parameter only once.

All information about a parameter is kept together in one place.

For each parameter, you specify a conversion function; Argument
Clinic handles the translation from Python value into C value for
you.

Docstrings are written in plain text. Function docstrings are
required; per-parameter docstrings are encouraged.

From this, Argument Clinic generates for you all the mundane,
repetitious code and data structures CPython needs internally.
Once you've specified the interface, the next step is simply to
write your implementation using native C types. Every detail of
argument parsing is handled for you.

Argument Clinic is implemented as a preprocessor. It draws inspiration
for its workflow directly from [Cog] by Ned Batchelder. To use Clinic,
add a block comment to your C source code beginning and ending with
special text strings, then run Clinic on the file. Clinic will find the
block comment, process the contents, and write the output back into your
C source file directly after the comment. The intent is that Clinic's
output becomes part of your source code; it's checked in to revision
control, and distributed with source packages. This means that Python
will still ship ready-to-build. It does complicate development slightly;
in order to add a new function, or modify the arguments or documentation
of an existing function using Clinic, you'll need a working Python 3
interpreter.
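The Cog-style workflow can be sketched in a few lines of Python. This is only an illustration, not the Clinic implementation: the function name process_source and the generate callback are hypothetical, and the sketch does not replace output left over from an earlier run, which a real tool would need to do.

```python
import re

# Start and end markers as they appear in the sample blocks of this document.
BLOCK_RE = re.compile(r"/\*\[clinic\]\n(.*?)\n\[clinic\]\*/", re.DOTALL)

def process_source(source, generate):
    """Insert generate(block_text) directly after every Clinic comment."""
    def expand(match):
        # Keep the comment itself, then append the generated text.
        return match.group(0) + "\n" + generate(match.group(1))
    return BLOCK_RE.sub(expand, source)

c_source = "/*[clinic]\nos.access\n[clinic]*/\nstatic int x;\n"
result = process_source(
    c_source,
    lambda text: "/* generated for %s */" % text.splitlines()[0],
)
print(result)
```

Because the generated text lands in the same file, the result can be checked in like any other source.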

The Argument Clinic DSL is specified as a comment embedded in a C
file, as follows. The "Example" column on the right shows you sample
input to the Argument Clinic DSL, and the "Section" column on the left
specifies what each line represents in turn.

To give some flavor of the proposed DSL syntax, here are some sample Clinic
code blocks. This first block reflects the normally preferred style, including
blank lines between parameters and per-argument docstrings.
It also includes a user-defined converter (path_t) created
locally:

/*[clinic]
os.stat as os_stat_fn -> stat result

    path: path_t(allow_fd=1)
        Path to be examined; can be string, bytes, or open-file-descriptor int.

    *

    dir_fd: OS_STAT_DIR_FD_CONVERTER = DEFAULT_DIR_FD
        If not None, it should be a file descriptor open to a directory,
        and path should be a relative string; path will then be relative to
        that directory.

    follow_symlinks: bool = True
        If False, and the last element of the path is a symbolic link,
        stat will examine the symbolic link itself instead of the file
        the link points to.

Perform a stat system call on the given path.

{parameters}

dir_fd and follow_symlinks may not be implemented
on your platform. If they are unavailable, using them will raise a
NotImplementedError.

It's an error to use dir_fd or follow_symlinks when specifying path as
an open file descriptor.

[clinic]*/

/*[clinic]
os.access
    path: path
    mode: int
    *
    dir_fd: OS_ACCESS_DIR_FD_CONVERTER = 1
    effective_ids: bool = False
    follow_symlinks: bool = True
Use the real uid/gid to test for access to a path.
Returns True if granted, False otherwise.
{parameters}
dir_fd, effective_ids, and follow_symlinks may not be implemented
on your platform. If they are unavailable, using them will raise a
NotImplementedError.
Note that most operations will use the effective uid/gid, therefore this
routine can be used in a suid/sgid environment to test if the invoking user
has the specified access to the path.
[clinic]*/

This final example shows a Clinic code block handling groups of
optional parameters, including parameters on the left:

/*[clinic]
curses.window.addch
    [
    y: int
        Y-coordinate.
    x: int
        X-coordinate.
    ]
    ch: char
        Character to add.
    [
    attr: long
        Attributes for the character.
    ]
    /
Paint character ch at (y, x) with attributes attr,
overwriting any character previously painted at that location.
By default, the character position and attributes are the
current settings for the window object.
[clinic]*/

All lines support # as a line comment delimiter except
docstrings. Blank lines are always ignored.

As in Python itself, leading whitespace is significant in the Argument
Clinic DSL. The first line of the "function" section is the
function declaration. Indented lines below the function declaration
declare parameters, one per line; lines below those that are indented even
further are per-parameter docstrings. Finally, the first line dedented
back to column 0 ends the parameter declarations and starts the function docstring.
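The indentation rules can be illustrated with a small classifier. This is a sketch under simplifying assumptions, not Clinic's parser: the name classify_lines is hypothetical, "#" comments and the special symbols are not handled, and blank lines are dropped everywhere, including inside the function docstring.

```python
def classify_lines(block):
    """Label each non-blank line of a Clinic block by its indentation."""
    labelled = []
    param_indent = None      # indent established by the first parameter
    in_docstring = False
    for raw in block.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        text = raw.strip()
        if not labelled:
            kind = "function declaration"
        elif in_docstring or indent == 0:
            # The first line back at column 0 starts the function docstring;
            # everything after it belongs to the docstring too.
            in_docstring = True
            kind = "function docstring"
        elif param_indent is None or indent == param_indent:
            param_indent = indent
            kind = "parameter"
        elif indent > param_indent:
            kind = "parameter docstring"
        else:
            raise ValueError("inconsistent indentation: %r" % raw)
        labelled.append((kind, text))
    return labelled

sample = (
    "os.access\n"
    "    path: path\n"
    "        Path to test.\n"
    "    mode: int\n"
    "Use the real uid/gid.\n"
    "  indented docstring line\n"
)
for kind, text in classify_lines(sample):
    print("%-22s %s" % (kind, text))
```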

Parameter docstrings are optional; function docstrings are not.
Functions that specify no arguments may simply specify the function
declaration followed by the docstring.

The dotted name should be the full name of the function, starting
with the highest-level package (e.g. "os.stat" or "curses.window.addch").

The "as legal_c_id" syntax is optional.
Argument Clinic uses the name of the function to create the names of
the generated C functions. In some circumstances, the generated name
may collide with other global names in the C program's namespace.
The "as legal_c_id" syntax allows you to override the generated name
with your own; substitute "legal_c_id" with any legal C identifier.
If skipped, the "as" keyword must also be omitted.

The return annotation is also optional. If skipped, the arrow ("->")
must also be omitted. If specified, the value for the return annotation
must be compatible with ast.literal_eval, and it is interpreted as
a return converter.
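As an illustration of the declaration line's optional parts, the following sketch splits it with a regular expression. The names DECL_RE and parse_declaration are hypothetical, and the fallback C name (dots replaced by underscores) is an assumption; the text says only that the C names are derived from the function name. The return annotation is kept as raw text and is not evaluated.

```python
import re

DECL_RE = re.compile(
    r"^(?P<name>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)"   # dotted name
    r"(?:\s+as\s+(?P<c_id>[A-Za-z_]\w*))?"          # optional "as legal_c_id"
    r"(?:\s*->\s*(?P<returns>.+?))?\s*$"            # optional return annotation
)

def parse_declaration(line):
    match = DECL_RE.match(line.strip())
    if match is None:
        raise ValueError("not a function declaration: %r" % line)
    name = match.group("name")
    # Assumption: without "as", derive the C name from the dotted name.
    c_id = match.group("c_id") or name.replace(".", "_")
    return {"name": name, "c_id": c_id, "returns": match.group("returns")}

print(parse_declaration("os.stat as os_stat_fn -> stat result"))
print(parse_declaration("curses.window.addch"))
```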

The "name" must be a legal C identifier. Whitespace is permitted between
the name and the colon (though this is not the preferred style). Whitespace
is permitted (and encouraged) between the colon and the converter.

The "converter" is the name of one of the "converter functions" registered
with Argument Clinic. Clinic will ship with a number of built-in converters;
new converters can also be added dynamically. In choosing a converter, you
are automatically constraining what Python types are permitted on the input,
and specifying what type the output variable (or variables) will be. Although
many of the converters will resemble the names of C types or perhaps Python
types, the name of a converter may be any legal Python identifier.

If the converter is followed by parentheses, these parentheses enclose
parameters to the conversion function. The syntax mirrors providing arguments
to a Python function call: the parameters must always be named, as if they were
"keyword-only parameters", and the values provided for the parameters will
syntactically resemble Python literal values. These parameters are always
optional, permitting all conversion functions to be called without
any parameters. In this case, you may also omit the parentheses entirely;
this is always equivalent to specifying empty parentheses. The values
supplied for these parameters must be compatible with ast.literal_eval.
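Since a converter specification looks like a Python call with keyword-only literal arguments, one way to read it is with the standard ast module. The sketch below is an assumption about how such parsing could be done, not a description of Clinic's internals; parse_converter is a hypothetical name.

```python
import ast

def parse_converter(text):
    """Split 'name(param=literal, ...)' into (name, {param: value})."""
    node = ast.parse(text.strip(), mode="eval").body
    if isinstance(node, ast.Name):
        # Bare name: equivalent to empty parentheses.
        return node.id, {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.args:
            raise ValueError("converter parameters must be named")
        params = {}
        for keyword in node.keywords:
            if keyword.arg is None:
                raise ValueError("'**' is not allowed here")
            # literal_eval accepts an expression node as well as a string.
            params[keyword.arg] = ast.literal_eval(keyword.value)
        return node.func.id, params
    raise ValueError("not a converter specification: %r" % text)

print(parse_converter("path_t(allow_fd=1)"))
print(parse_converter("bool"))
```

Values that are not literals, such as a function call, are rejected by ast.literal_eval with a ValueError.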

The "default" is a Python literal value. Default values are optional;
if not specified you must omit the equals sign too. Parameters which
don't have a default are implicitly required. The default value is
dynamically assigned, "live" in the generated C code, and although
it's specified as a Python value, it's translated into a native C
value in the generated C code. Few default values are permitted,
owing to this manual translation step.
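The translation of a Python default into C source text might look roughly like the sketch below. The mapping shown (None to NULL, booleans to 1 and 0, and so on) is an assumption for illustration only, c_default is a hypothetical name, and only backslashes and double quotes are escaped in strings.

```python
import ast

def c_default(python_literal):
    """Translate the text of a Python literal into C source text (sketch)."""
    value = ast.literal_eval(python_literal)
    if value is None:
        return "NULL"
    if isinstance(value, bool):     # test bool first: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    # Anything else has no obvious C spelling, hence "few default values".
    raise ValueError("no C translation for default %r" % (value,))

for text in ("True", "None", "-1", "'idna'"):
    print(text, "->", c_default(text))
```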

If this were a Python function declaration, a parameter declaration
would be delimited by either a trailing comma or a closing parenthesis.
However, Argument Clinic uses neither; parameter declarations are
delimited by a newline. A trailing comma or right parenthesis is not
permitted.

The first parameter declaration establishes the indent for all parameter
declarations in a particular Clinic code block. All subsequent parameters
must be indented to the same level.

For convenience's sake in converting existing code to Argument Clinic,
Clinic provides a set of legacy converters that match PyArg_ParseTuple
format units. They are specified as a C string containing the format
unit. For example, to specify a parameter "foo" as taking a Python
"int" and emitting a C int, you could specify:

foo : "i"

(To more closely resemble a C string, these must always use double quotes.)

Although these resemble PyArg_ParseTuple format units, no guarantee is
made that the implementation will call a PyArg_Parse function for parsing.

This syntax does not support parameters. Therefore, it doesn't support any
of the format units that require input parameters ("O!","O&", "es", "es#",
"et", "et#"). Parameters requiring one of these conversions cannot use the
legacy syntax. (You may still, however, supply a default value.)
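A sketch of how the legacy syntax could be recognized and validated follows. The names LEGACY_RE and parse_legacy_parameter are hypothetical; the set of rejected format units is taken directly from the list above.

```python
import re

# Format units the text lists as unsupported because they need parameters.
NEEDS_PARAMETERS = {"O!", "O&", "es", "es#", "et", "et#"}

LEGACY_RE = re.compile(
    r'^(?P<name>[A-Za-z_]\w*)\s*:\s*'      # parameter name and colon
    r'"(?P<unit>[^"]+)"\s*'                # format unit, double quotes only
    r'(?:=\s*(?P<default>.+))?$'           # optional default value
)

def parse_legacy_parameter(line):
    match = LEGACY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a legacy parameter declaration: %r" % line)
    unit = match.group("unit")
    if unit in NEEDS_PARAMETERS:
        raise ValueError("format unit %r requires parameters" % unit)
    return match.group("name"), unit, match.group("default")

print(parse_legacy_parameter('foo : "i"'))
print(parse_legacy_parameter('foo : "i" = 0'))
```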

There are four special symbols that may be used in the parameter section. Each
of these must appear on a line by itself, indented to the same level as parameter
declarations. The four symbols are:

*

Establishes that all subsequent parameters are keyword-only.

[

Establishes the start of an optional "group" of parameters.
Note that "groups" may nest inside other "groups".
See Functions With Positional-Only Parameters below.
Note that currently [ is only legal for use in functions
where all parameters are marked positional-only; see
/ below.

]

Ends an optional "group" of parameters.

/

Establishes that all the preceding arguments are
positional-only. For now, Argument Clinic does not
support functions with both positional-only and
non-positional-only arguments. Therefore: if /
is specified for a function, it must currently always
be after the last parameter. Also, Argument Clinic
does not currently support default values for
positional-only parameters.

(The semantics of / follow a syntax for positional-only
parameters in Python once proposed by Guido. [5] )
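The effect of the four symbols can be sketched as a single pass over the parameter section. This is an illustrative model only: scan_parameters is a hypothetical name, the input is assumed to contain just parameter declarations and symbols (no docstrings), and group nesting is reduced to a simple depth counter.

```python
def scan_parameters(lines):
    """Return one dict per parameter, annotated from the special symbols."""
    params = []
    keyword_only = False
    depth = 0                       # how many "[" groups are currently open
    for line in lines:
        text = line.strip()
        if text == "*":
            keyword_only = True     # applies to all subsequent parameters
        elif text == "[":
            depth += 1
        elif text == "]":
            if depth == 0:
                raise ValueError("']' without matching '['")
            depth -= 1
        elif text == "/":
            # Everything declared so far becomes positional-only.
            for param in params:
                param["positional_only"] = True
        elif text:
            params.append({
                "name": text.split(":", 1)[0].strip(),
                "keyword_only": keyword_only,
                "positional_only": False,
                "optional_group": depth > 0,
            })
    if depth:
        raise ValueError("'[' without matching ']'")
    return params

addch = ["[", "y: int", "x: int", "]", "ch: char", "[", "attr: long", "]", "/"]
for param in scan_parameters(addch):
    print(param)
```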

The first line with no leading whitespace after the function declaration is the
first line of the function docstring. All subsequent lines of the Clinic block
are considered part of the docstring, and their leading whitespace is preserved.

If the string {parameters} appears on a line by itself inside the function
docstring, Argument Clinic will insert a list of all parameters that have
docstrings, each such parameter followed by its docstring. The name of the
parameter is on a line by itself; the docstring starts on a subsequent line,
and all lines of the docstring are indented by two spaces. (Parameters with
no per-parameter docstring are suppressed.) The entire list is indented by the
leading whitespace that appeared before the {parameters} token.

If the string {parameters} doesn't appear in the docstring, Argument Clinic
will append one to the end of the docstring, inserting a blank line above it if
the docstring does not end with a blank line, and with the parameter list at
column 0.
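The {parameters} rules above are mechanical enough to sketch directly. The function name render_docstring and the (name, docstring) pair representation are assumptions made for this illustration.

```python
def render_docstring(docstring, parameters):
    """Expand {parameters}; parameters is a list of (name, doc-or-None)."""
    def listing(prefix):
        out = []
        for name, doc in parameters:
            if not doc:
                continue            # parameters without a docstring are suppressed
            out.append(prefix + name)
            out.extend(prefix + "  " + line for line in doc.splitlines())
        return out

    lines = docstring.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "{parameters}":
            # Reuse the whitespace that preceded the token as the list indent.
            prefix = line[:len(line) - len(line.lstrip())]
            lines[i:i + 1] = listing(prefix)
            return "\n".join(lines)
    # No token present: append the list at column 0 after a blank line.
    if lines and lines[-1].strip():
        lines.append("")
    return "\n".join(lines + listing(""))

params = [
    ("path", "Path to be examined."),
    ("dir_fd", None),
    ("follow_symlinks", "If False,\nexamine the link."),
]
print(render_docstring("Perform a stat.\n\n  {parameters}\n\nTrailer.", params))
```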

The Python value to use in place of the parameter's actual default
in Python contexts. In other words: when specified, this value will
be used for the parameter's default in the docstring, and in the
Signature. (TBD alternative semantics: If the string is a valid
Python expression which can be rendered into a Python value using
eval(), then the result of eval() on it will be used as the
default in the Signature.) Ignored if there is no default.

required

Normally any parameter that has a default value is automatically
optional. A parameter that has "required" set will be considered
required (non-optional) even if it has a default value. The
generated documentation will also not show any default value.

Additionally, converters may accept one or more of these optional
parameters, on an individual basis:

annotation

Explicitly specifies the per-parameter annotation for this
parameter. Normally it's the responsibility of the conversion
function to generate the annotation (if any).

bitwise

For converters that accept unsigned integers. If the Python integer
passed in is signed, copy the bits directly even if it is negative.

encoding

For converters that accept str. Encoding to use when encoding a
Unicode string to a char *.

immutable

Only accept immutable values.

length

For converters that accept iterable types. Requests that the converter
also emit the length of the iterable, passed in to the _impl function
in a Py_ssize_t variable; its name will be this
parameter's name appended with "_length".

nullable

This converter normally does not accept None, but in this case
it should. If None is supplied on the Python side, the equivalent
C argument will be NULL. (The _impl argument emitted by this
converter will presumably be a pointer type.)

types

A list of strings representing acceptable Python types for this object.
There are also four strings which represent Python protocols:

"buffer"

"mapping"

"number"

"sequence"

zeroes

For converters that accept string types. The converted value should
be allowed to have embedded zeroes.
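As one illustration of these options, the "bitwise" behavior can be modeled in Python. This is one plausible reading of "copy the bits directly", not a specification: to_unsigned is a hypothetical helper, the 32-bit default width is arbitrary, and masking also truncates values that are too large, which is an assumption.

```python
def to_unsigned(value, bits=32, bitwise=False):
    """Model the value an unsigned C integer of the given width would hold."""
    limit = 1 << bits
    if bitwise:
        # Keep the low bits of the two's-complement representation.
        return value & (limit - 1)
    if not 0 <= value < limit:
        raise OverflowError(
            "value out of range for unsigned %d-bit integer" % bits)
    return value

print(hex(to_unsigned(-1, bitwise=True)))
print(to_unsigned(5))
```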

Argument Clinic writes its output inline in the C file, immediately
after the section of Clinic code. For "python" sections, the output
is everything printed using builtins.print. For "clinic"
sections, the output is valid C code, including:

a #define providing the correct methoddef structure for the
function

a prototype for the "impl" function -- this is what you'll write
to implement this function

a function that handles all argument processing, which calls your
"impl" function

the definition line of the "impl" function

and a comment indicating the end of output.

The intention is that you write the body of your impl function immediately
after the output -- as in, you write a left-curly-brace immediately after
the end-of-output comment and implement the builtin in the body there.
(It's a bit strange at first, but oddly convenient.)

Argument Clinic will define the parameters of the impl function for
you. The function will take the "self" parameter passed in
originally, all the parameters you define, and possibly some extra
generated parameters ("length" parameters; also "group" parameters,
see next section).

Argument Clinic also writes a checksum for the output section. This
is a valuable safety feature: if you modify the output by hand, Clinic
will notice that the checksum doesn't match, and will refuse to
overwrite the file. (You can force Clinic to overwrite with the
"-f" command-line argument; Clinic will also ignore the checksums
when using the "-o" command-line argument.)
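The checksum safeguard could work along the following lines. Everything specific here is an assumption: the text names neither the hash algorithm nor the exact marker format, so SHA-1 and END_MARKER are placeholders, and the function names are hypothetical.

```python
import hashlib

END_MARKER = "/*[clinic end output:%s]*/"   # hypothetical marker format

def checksum(output):
    # Assumption: SHA-1 over the UTF-8 bytes of the generated text.
    return hashlib.sha1(output.encode("utf-8")).hexdigest()

def seal(output):
    """Return output followed by a marker line recording its checksum."""
    return output + END_MARKER % checksum(output) + "\n"

def may_overwrite(output, recorded, force=False):
    """True if the output is unmodified, or if overwriting is forced ("-f")."""
    return force or checksum(output) == recorded

generated = "static int x;\n"
recorded = checksum(generated)
print(may_overwrite(generated, recorded))
print(may_overwrite(generated + "/* hand edit */\n", recorded))
```

A hand edit changes the checksum, so the second call reports that the output must not be overwritten unless forced.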

Finally, Argument Clinic can also emit the boilerplate definition
of the PyMethodDef array for the defined classes and modules.

A significant fraction of Python builtins implemented in C use the
older positional-only API for processing arguments
(PyArg_ParseTuple()). In some instances, these builtins parse
their arguments differently based on how many arguments were passed
in. This can provide some bewildering flexibility: there may be
groups of optional parameters, which must either all be specified or
none specified. And occasionally these groups are on the left! (A
representative example: curses.window.addch().)

Argument Clinic supports these legacy use-cases by allowing you to
specify parameters in groups. Each optional group of parameters
is marked with square brackets. Note that these groups are permitted
on the right or left of any required parameters!

The impl function generated by Clinic will add an extra parameter for
every group, "int group_{left|right}_<x>", where x is a monotonically
increasing number assigned to each group as it builds away from the
required arguments. This argument will be nonzero if the group was
specified on this call, and zero if it was not.

Note that when operating in this mode, you cannot specify default
arguments.

Also, note that it's possible to specify a set of groups to a function
such that there are several valid mappings from the number of
arguments to a valid set of groups. If this happens, Clinic will abort
with an error message. This should not be a problem, as
positional-only operation is only intended for legacy use cases, and
all the legacy functions using this quirky behavior have unambiguous
mappings.
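The ambiguity check can be sketched by enumerating every argument count a set of groups accepts. The model below is an assumption made for illustration: groups are described only by their sizes, listed from the group nearest the required parameters outward, and a group may be supplied only if every nearer group on the same side is supplied too. The function names are hypothetical.

```python
def group_mappings(required, left_sizes, right_sizes):
    """Map each accepted argument count to the (left, right) groups used."""
    mappings = {}
    for used_left in range(len(left_sizes) + 1):
        for used_right in range(len(right_sizes) + 1):
            count = (required
                     + sum(left_sizes[:used_left])
                     + sum(right_sizes[:used_right]))
            mappings.setdefault(count, []).append((used_left, used_right))
    return mappings

def check_unambiguous(required, left_sizes, right_sizes):
    """Return {count: (left, right)} or raise if any count is ambiguous."""
    mappings = group_mappings(required, left_sizes, right_sizes)
    for count, choices in sorted(mappings.items()):
        if len(choices) > 1:
            raise ValueError(
                "%d arguments match several group sets: %r" % (count, choices))
    return {count: choices[0] for count, choices in mappings.items()}

# curses.window.addch: [y, x] on the left, ch required, [attr] on the right.
print(check_unambiguous(1, [2], [1]))
```

With one single-parameter group on each side of one required parameter, two arguments could mean either group, and the check raises an error.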

As of this writing, there is a working prototype implementation of
Argument Clinic available online (though the syntax may be out of date
as you read this). [6] The prototype generates code using the
existing PyArg_Parse APIs. It supports translating to all current
format units except the mysterious "w*". Sample functions using
Argument Clinic exercise all major features, including positional-only
argument parsing.

The prototype also currently provides an experimental extension
mechanism, allowing adding support for new types on-the-fly. See
Modules/posixmodule.c in the prototype for an example of its use.

In the future, Argument Clinic is expected to be automatable enough
to allow querying, modification, or outright new construction of
function declarations through Python code. It may even permit
dynamically adding your own custom DSL!

The API for supplying inspect.Signature metadata for builtins is
currently under discussion. Argument Clinic will add support for
the prototype when it becomes viable.

Nick Coghlan suggests that we a) only support at most one left-optional
group per function, and b) in the face of ambiguity, prefer the left
group over the right group. This would solve all our existing use cases
including range().

Optimally we'd want Argument Clinic run automatically as part of the
normal Python build process. But this presents a bootstrapping problem;
if you don't have a system Python 3, you need a Python 3 executable to
build Python 3. I'm sure this is a solvable problem, but I don't know
what the best solution might be. (Supporting this will also require
a parallel solution for Windows.)

On a related note: inspect.Signature has no way of representing
blocks of arguments, like the left-optional block of y and x
for curses.window.addch. How far are we going to go in supporting
this admittedly aberrant parameter paradigm?

During the PyCon US 2013 Language Summit, there was discussion of having
Argument Clinic also generate the actual documentation (in ReST, processed
by Sphinx) for the function. The logistics of this are TBD, but it would
require that the docstrings be written in ReST, and require that Python
ship a ReST -> ascii converter. It would be best to come to a decision
about this before we begin any large-scale conversion of the CPython
source tree to using Clinic.

Guido proposed having the "function docstring" be hand-written inline,
in the middle of the output, something like this:

I tried it this way and don't like it -- I think it's clumsy. I
prefer that everything you write goes in one place, rather than
having an island of hand-edited stuff in the middle of the DSL
output.

Argument Clinic does not support automatic tuple unpacking
(the "(OOO)" style format string for PyArg_ParseTuple()).

Argument Clinic removes some dynamism / flexibility. With
PyArg_ParseTuple() one could theoretically pass in different
encodings at runtime for the "es"/"et" format units.
AFAICT CPython doesn't do this itself, however it's possible
external users might do this. (Trivia: there are no uses of
"es" exercised by regrtest, and all the uses of "et"
exercised are in socketmodule.c, except for one in _ssl.c.
They're all static, specifying the encoding "idna".)

The PEP author wishes to thank Ned Batchelder for permission to
shamelessly rip off his clever design for Cog--"my favorite tool
that I've never gotten to use". Thanks also to everyone who provided
feedback on the [bugtracker issue] and on python-dev. Special thanks
to Nick Coghlan and Guido van Rossum for a rousing two-hour in-person
deep dive on the topic at PyCon US 2013.