Hi swift-evolution,
For the last few weeks, I've been working on introducing some Swift into a pure-C codebase. While the Clang importer makes the process quite smooth, there are still some rough edges.
Here is a (lengthy) proposal resulting from that experience.
Rendered version: https://gist.github.com/Fruneau/fa83fe87a316514797c1eeaaaa2e5012
Introduction
=======
Directly importing C APIs is a core feature of the Swift compiler. In that process, C pointers are systematically imported as `Unsafe*Pointer` Swift types. However, C code distinguishes, by convention, between pointers that reference a single object and pointers to an array of objects. For a single object of type `T`, the Swift compiler should be able to import a `T *` parameter as `inout T`, and a `T const *` parameter as `T`. Since the compiler cannot make this distinction by itself, we propose adding a new qualifier for C pointers for that purpose.
Motivation
=======
Consider the following C API:
```c
typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;
/** Append the string \p str to \p sb. */
void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);
/** Append the content of \p other to \p sb. */
void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);
/** Returns the amount of available memory of \p sb. */
int sb_avail(const sb_t * _Nonnull sb);
```
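To make the two kinds of pointers concrete, here is a minimal sketch of how the API above might be implemented. It is an assumption, not the original code: `sb_init()` and `sb_grow()` are hypothetical helpers, the growth policy is arbitrary, and `_Nonnull` is defined away on compilers other than Clang so the sketch also builds with GCC. Note that `sb` and `other` are only ever dereferenced as single objects, while `str` and `data` are walked as buffers.

```c
#include <stdlib.h>
#include <string.h>

/* _Nonnull is a Clang extension; define it away on other compilers. */
#ifndef __clang__
#  define _Nonnull
#endif

typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;

/* Hypothetical initializer: `data` always holds a NUL-terminated heap buffer. */
void sb_init(sb_t * _Nonnull sb)
{
    sb->len = 0;
    sb->size = 16;
    sb->data = malloc((size_t)sb->size);
    if (sb->data == NULL) {
        abort(); /* keep the sketch simple: no error reporting */
    }
    sb->data[0] = '\0';
}

/* Hypothetical helper: ensure room for `extra` more bytes plus the NUL. */
static void sb_grow(sb_t * _Nonnull sb, int extra)
{
    int needed = sb->len + extra + 1;
    if (needed <= sb->size) {
        return;
    }
    char *p = realloc(sb->data, (size_t)needed * 2);
    if (p == NULL) {
        abort();
    }
    sb->data = p;
    sb->size = needed * 2;
}

/* `sb` refers to ONE object; `str` is walked as a NUL-terminated buffer. */
void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str)
{
    int n = (int)strlen(str);
    sb_grow(sb, n);
    memcpy(sb->data + sb->len, str, (size_t)n + 1); /* also copies the NUL */
    sb->len += n;
}

/* Both pointers refer to single objects; `other` is only read. */
void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other)
{
    int n = other->len; /* read before growing, in case other == sb */
    sb_grow(sb, n);
    memcpy(sb->data + sb->len, other->data, (size_t)n);
    sb->len += n;
    sb->data[sb->len] = '\0';
}

/* Assumption: one byte of `size` stays reserved for the terminating NUL. */
int sb_avail(const sb_t * _Nonnull sb)
{
    return sb->size - sb->len - 1;
}
```

Nothing in the prototypes distinguishes `sb`, which is dereferenced as a single object, from `str`, which is read up to its NUL terminator; only the implementation and the doc comments carry that knowledge.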
This is imported into Swift as follows:
```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32
}
func sb_adds(_ sb: UnsafeMutablePointer<sb_t>, _ str: UnsafePointer<Int8>)
func sb_addsb(_ sb: UnsafeMutablePointer<sb_t>, _ other: UnsafePointer<sb_t>)
func sb_avail(_ sb: UnsafePointer<sb_t>) -> Int32
```
`sb_adds()` takes two pointers: the first one is supposed to point to a single object, `sb`, which is mutated by appending the content of `str`, a pointer to a C string. So we have two kinds of pointers: the first points to a single object, the second to a buffer. Yet both are represented as `Unsafe*Pointer`. Swift cannot tell those two kinds of pointers apart, since the C language provides no way to express the difference.
`sb_addsb()` takes two pointers to objects of type `sb_t`. The first object is mutated by the function, which appends the content of the second one, which is `const`. The constness is properly reflected in Swift. However, using the imported API from Swift might be surprising, since passing an object through an `Unsafe*Pointer` parameter requires the `&value` syntax, which in turn requires a mutable variable:
```swift
var sb = sb_t(...)
let sb2 = sb_t(...)
sb_addsb(&sb, &sb2) // error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
sb_addsb(&sb, sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
var sb3 = sb_t(...)
sb_addsb(&sb, &sb3) // works
```
The same problem arises with `sb_avail()`, even though it only reads its argument:
```swift
sb_avail(sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
```
However, Clang also provides the `swift_name()` attribute, which allows importing a C function as a Swift method, including mapping one of its parameters to `self:`:
```c
__attribute__((swift_name("sb_t.add(self:string:)")))
void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);
__attribute__((swift_name("sb_t.add(self:other:)")))
void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);
__attribute__((swift_name("sb_t.avail(self:)")))
int sb_avail(const sb_t * _Nonnull sb);
```
```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32
    mutating func add(string: UnsafePointer<Int8>)
    mutating func add(other: UnsafePointer<sb_t>)
    func avail() -> Int32
}
```
With that attribute, the parameter mapped to `self:` is no longer exposed as an `Unsafe*Pointer`. As a consequence, the API improves:
```swift
sb2.avail() // This time it works!
```
But the behavior is now inconsistent, since only the `self:` parameter is affected:
```swift
sb.add(other: &sb2) // error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
sb.add(other: sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
```
What we observe here is that mapping an argument to `self:` is enough for the compiler to change the way it is imported. As soon as the compiler knows the pointer actually points to a single object, it can handle it without exposing it as an `Unsafe*Pointer`, making the API safer and less surprising.
Proposed solution
================
A new qualifier could be added to inform the compiler that a pointer points to a single object. The Swift compiler could then use that information to generate APIs that use the object type directly instead of the pointer type. We propose introducing a new qualifier named `_Ref`, semantically similar to a C++ reference. That is:
* `_Ref` is applied with the same grammar as the `_Nonnull`/`_Nullable` family of qualifiers
* A pointer tagged `_Ref` cannot be used to access anything beyond the single pointed-to object
* A pointer tagged `_Ref` is non-owning
Parameters qualified with `_Ref` would then be imported into Swift as follows:
* `T * _Ref _Nonnull` is imported as `inout T`
* `T * _Ref _Nullable` is imported as `inout T?`
* `T const * _Ref _Nonnull` is imported as `T`
* `T const * _Ref _Nullable` is imported as `T?`
Example
=======
Applied to the example from the Motivation section:
```c
typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;
/** Append the string \p str to \p sb. */
void sb_adds(sb_t * _Ref _Nonnull sb, const char * _Nonnull str);
/** Append the content of \p other to \p sb. */
void sb_addsb(sb_t * _Ref _Nonnull sb, const sb_t * _Ref _Nonnull other);
/** Returns the amount of available memory of \p sb. */
int sb_avail(const sb_t * _Ref _Nonnull sb);
```
This would be imported as follows:
```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32
}
func sb_adds(_ sb: inout sb_t, _ str: UnsafePointer<Int8>)
func sb_addsb(_ sb: inout sb_t, _ other: sb_t)
func sb_avail(_ sb: sb_t) -> Int32
```
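Because no compiler understands `_Ref` yet, a library adopting it early would presumably want its headers to keep compiling everywhere. A minimal sketch, assuming a project macro named `SB_REF` and a feature test named `c_ref_qualifier` (both hypothetical, not part of this proposal): the macro expands to `_Ref` only where the compiler advertises support, and to nothing otherwise, so the annotated prototype stays valid C in both cases.

```c
/* _Nonnull is a Clang extension; define it away on other compilers. */
#ifndef __clang__
#  define _Nonnull
#endif

/* Hypothetical feature test: no compiler is known to provide it today. */
#if defined(__has_feature)
#  if __has_feature(c_ref_qualifier)
#    define SB_REF _Ref
#  endif
#endif
#ifndef SB_REF
#  define SB_REF /* no-op until the qualifier is supported */
#endif

typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;

/* Annotated prototype, with the qualifier spelled through SB_REF. */
int sb_avail(const sb_t * SB_REF _Nonnull sb);

int sb_avail(const sb_t * SB_REF _Nonnull sb)
{
    /* Only the single pointed-to object is read.
     * Assumption: one byte of `size` is reserved for the terminating NUL. */
    return sb->size - sb->len - 1;
}
```

Like the nullability qualifiers, the macro only decorates the declaration; the function definition and its callers are unchanged whichever way it expands.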
Impact on existing code
=================
This proposal has no impact on existing code, since it only proposes additive changes. However, opting in to the `_Ref` qualifier on APIs that are already exposed to Swift changes how they are imported:
* For `const` pointers, the change is always source-incompatible, since callers must now pass the object itself instead of a pointer or `&object`.
* For non-`const` pointers, the change is source-compatible wherever callers use the `&object` syntax to pass a variable, but breaks call sites that pass an `Unsafe*Pointer` value directly.
Alternatives considered
===================
We considered using two families of qualifiers instead of `_Ref`:
- one family to specify the kind of pointer: single object or array
- one family to declare the ownership
This approach has the clear advantage of being more flexible; however, it was found to be less expressive. Since C APIs should already use nullability qualifiers on every single pointer, forcing two additional qualifiers on every pointer would be painful and would hurt the readability of C APIs.
`_Ref`, on the other hand, is short and leverages a concept developers already know, but it is also more specific to a particular use case.
Discussion
========
* Safety: won't this make developers think they are calling safe APIs from Swift while the API is actually unsafe?
There is certainly a risk that a C API makes improper use of `_Ref` (in particular, by breaking the non-owning part of the contract). However, this kind of safety issue already exists when using the `swift_name()` attribute to map one of a function's pointer parameters to `self:`, or when using the nullability qualifiers.
* What about pointers stored in structures, or pointers returned by functions?
As a qualifier, `_Ref` could also be used on pointers that are not arguments of a function:
```c
typedef struct {
    sb_t * _Ref obj;
} sb_ptr_t;
sb_t * _Ref sb_get_singleton(void);
```
Swift, however, cannot import those as `sb_t`: it would still have to use `Unsafe*Pointer<sb_t>`, since `sb_t` is a struct and, as such, is not stored by reference.
We could also imagine a standard `Reference<T>` type that would wrap a pointer to a `T` (and could expose the API of `T` on it).
* What about function pointers that take a `_Ref` object?
When an API takes a function pointer whose type includes a `_Ref`-qualified parameter, the qualifier applies to the callback's parameters as well:
```c
void take_cb(int (*a)(sb_t const * _Ref _Nonnull sb, sb_t * _Ref _Nonnull other));
```
```swift
func cb(sb: sb_t, other: inout sb_t) -> Int32 {
    // `sb` is received by value (read-only); `other` can be mutated in place.
    return 0
}
take_cb(cb)
```
Swift guarantees we cannot break the non-owning contract and that we respect the constness of the parameter. This is safer than using the `Unsafe*Pointer`-based alternative.
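The non-owning contract matters on the C side too. In the hypothetical implementation of `take_cb()` sketched below, the callback receives pointers to two stack-allocated objects that disappear when `take_cb()` returns, so a callback keeping those pointers would leave them dangling. `REF` stands in for `_Ref` (no compiler supports it yet), and `take_cb_last_result` and `copy_into()` are illustrative names, not part of the proposal.

```c
#include <string.h>

/* _Nonnull is a Clang extension; define it away on other compilers. */
#ifndef __clang__
#  define _Nonnull
#endif
#define REF /* stand-in for the proposed _Ref qualifier */

typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;

/* Hypothetical: lets the caller observe what the callback returned. */
int take_cb_last_result;

void take_cb(int (*a)(sb_t const * REF _Nonnull sb, sb_t * REF _Nonnull other))
{
    char src_buf[] = "abc";
    char dst_buf[8] = "";
    sb_t src = { src_buf, 3, (int)sizeof src_buf };
    sb_t dst = { dst_buf, 0, (int)sizeof dst_buf };

    /* Both objects live on this stack frame: the callback may read `src` and
     * mutate `dst`, but must not keep either pointer after it returns. */
    take_cb_last_result = a(&src, &dst);
}

/* Example callback: copies `sb` into `other` and returns the new length. */
int copy_into(sb_t const * REF _Nonnull sb, sb_t * REF _Nonnull other)
{
    if (sb->len + 1 > other->size) {
        return -1; /* not enough room, including the NUL */
    }
    memcpy(other->data, sb->data, (size_t)sb->len + 1);
    other->len = sb->len;
    return other->len;
}
```

A Swift closure imported with `inout` and by-value parameters cannot escape those pointers, which is what makes this contract enforceable on the Swift side.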
* Other use cases than Swift's?
The `_Ref` qualifier could also be used by static analyzers to check that functions do not access memory they should not: as long as some code manipulates memory through a `_Ref`-qualified pointer, it should not access addresses below that pointer, or at or above that pointer plus the size of the pointed-to type (with an exception for types ending with a zero-length array).
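The exception for types ending with a zero-length array covers the common trailing-buffer pattern. The sketch below uses a C99 flexible array member (the standard spelling of the GNU zero-length array `data[0]`); `blob_t`, `blob_new()` and `blob_sum()` are hypothetical names, and `REF` again stands in for `_Ref`. Although `blob_sum()` receives a pointer to a single `blob_t`, it legitimately reads `b->len` bytes beyond `sizeof(blob_t)`, so a checker could not simply stop at the end of the struct for such types.

```c
#include <stdlib.h>
#include <string.h>

/* Nullability qualifiers are a Clang extension; define them away elsewhere. */
#ifndef __clang__
#  define _Nonnull
#  define _Nullable
#endif
#define REF /* stand-in for the proposed _Ref qualifier */

/* Hypothetical type whose storage extends past sizeof(blob_t). */
typedef struct blob_t {
    size_t len;
    unsigned char data[]; /* C99 flexible array member */
} blob_t;

/* Allocates the header and `len` trailing bytes in a single block. */
blob_t * _Nullable blob_new(const void * _Nonnull bytes, size_t len)
{
    blob_t *b = malloc(sizeof(blob_t) + len);
    if (b == NULL) {
        return NULL;
    }
    b->len = len;
    memcpy(b->data, bytes, len);
    return b;
}

/* Single-object pointer, yet reading past sizeof(blob_t) is legitimate here. */
unsigned blob_sum(const blob_t * REF _Nonnull b)
{
    unsigned sum = 0;
    for (size_t i = 0; i < b->len; i++) {
        sum += b->data[i];
    }
    return sum;
}
```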
* What about pointers to arrays of objects?
This is a separate topic. We could imagine an `_Array` qualifier taking an optional length:
```c
/* The number of elements is statically known or passed as argument */
int main(int argc, char ** _Array(argc) argv);
/* The number of elements is unknown. */
int puts(const char * _Array str);
```
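To illustrate how such a qualifier could complement `_Ref`, here is a sketch where `ARRAY(n)` stands in for the hypothetical `_Array(n)` and `REF` for `_Ref`. Both macros expand to nothing, since neither qualifier exists today, and `stats_t`/`stats_add()` are made-up names. The annotations document that `xs` may be indexed from `0` to `n - 1`, while `out` may only be used to reach a single `stats_t`.

```c
#define REF      /* stand-in for the proposed _Ref qualifier */
#define ARRAY(n) /* stand-in for a hypothetical _Array(n) qualifier */

typedef struct stats_t {
    long sum;
    int count;
} stats_t;

/* `out` refers to exactly one stats_t; `xs` points to `n` ints. */
void stats_add(stats_t * REF out, int n, const int * ARRAY(n) xs)
{
    for (int i = 0; i < n; i++) {
        out->sum += xs[i];
    }
    out->count += n;
}
```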