1- Historical needs

Mixing C and PHP code in a PHP extension is a long-awaited feature.
The maintainability gains are obvious, and it is now widely agreed that
porting much of the non-performance-critical C code to PHP would be welcome.
The PHP 7 performance improvements make even more code a candidate for such a
port.

In theory, including PHP code in an extension and executing
it has been possible for a long time. Unfortunately, two important issues were
hard to solve:

A PHP script run from a memory buffer (using zend_compile_string() for
instance) cannot be cached by opcode caches. Code must be re-compiled every
time it is loaded.

Exposing symbols (classes, functions, constants) to the PHP user layer
requires loading every script at RINIT time, even if these features are not
used in the request.

Those constraints may be acceptable for two or three scripts, but we are
potentially talking about hundreds of scripts, recompiled from scratch at the
beginning of every request (embedding the MongoDB library in the MongoDB
extension, for instance, represents about 60 scripts).

For these reasons, despite some exceptions, mixing C and PHP code in an
extension is very rare today.

2- Solutions

Unlike other PHP extensions, PCS does not expose features to user space;
it provides a service to other extensions. The diagram below shows how PCS
interacts with other extensions and the PHP core:

Client extensions can interact with PCS after the PHP code registration
step but, in most cases, they don't: they fully delegate the management of
their PHP code to PCS.

Let's see how PCS solves the two issues described above:

2.1- Opcode cache compatibility

Opcode caches, like any cache, require a key for each object they cache.
Trapping zend_compile_string() is therefore not an option, as there is no way
to derive a key for the compiled contents. Scripts must instead be compiled
via zend_compile_file(), and we need to provide a unique and persistent path
to identify each of them. PCS maintains a stream wrapper for this purpose. This
stream wrapper, using the 'pcs://' prefix, exposes a tree of virtual files
registered by the client extensions. As these files cannot be overwritten, a
given stream-wrapped path is guaranteed to always map to the same contents.
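To make the write-once guarantee concrete, here is a minimal sketch, written
in Python rather than in the C used by PCS itself. It models the virtual tree
as a map that refuses overwrites. The class name, method names, and example
path are assumptions made for illustration and are not part of the PCS API.

```python
class VirtualTree:
    """Write-once map from 'pcs://' paths to file contents (hypothetical model)."""

    PREFIX = "pcs://"

    def __init__(self):
        self._files = {}

    def register(self, virtual_path, contents):
        # Normalize leading/trailing slashes so equivalent spellings share one key.
        path = self.PREFIX + virtual_path.strip("/")
        if path in self._files:
            # Refusing overwrites keeps the path -> contents mapping stable,
            # which is what lets the path serve as a cache key.
            raise ValueError(f"{path} is already registered")
        self._files[path] = contents
        return path

    def read(self, path):
        return self._files[path]
```

Because a registered path can never be rebound to different contents, anything
compiled from that path stays valid for as long as the tree exists.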

This is the first required step, but it is not enough. When it meets a
stream-wrapped path, an opcode cache has no way to know whether the path should
be cached or not. Some should, like 'pcs://' paths, but many are transient by
nature and must not be cached. Today, the 'logic' is to cache everything
belonging to the file/plain and 'phar' wrappers, and to ignore the rest. The
easy way would be to add 'pcs' to the list, but I don't work the 'phar' way.
So, an additional stream operation, named 'cache_key', will be proposed soon for
inclusion in the PHP core. Opcode caches will use this operation to ask
stream wrappers whether a given URI must be cached and, if so, which key to use
(the key may differ from the URI).
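Since the 'cache_key' operation is only a proposal at this point, the
following Python sketch is one possible reading of it, not an actual PHP core
interface. It assumes each wrapper may supply a function that returns either a
key or nothing. Every function name and the key format are hypothetical.

```python
def plain_cache_key(uri):
    # Plain files: the path itself is used as the key.
    return uri

def pcs_cache_key(uri):
    # 'pcs://' files never change once registered, so they are cacheable.
    # The key may differ from the URI; this prefix is a made-up example.
    return "pcs-key:" + uri[len("pcs://"):]

def transient_cache_key(uri):
    # Transient streams must not be cached: no key is returned.
    return None

# Hypothetical wrapper table: scheme -> cache_key operation.
WRAPPERS = {
    "file": plain_cache_key,
    "pcs": pcs_cache_key,
    "http": transient_cache_key,
}

def cache_key_for(uri):
    """Return the key the opcode cache should use, or None to skip caching."""
    scheme, sep, _ = uri.partition("://")
    if not sep:
        scheme = "file"  # no scheme: treat as a plain file path
    handler = WRAPPERS.get(scheme)
    # Wrappers that do not implement the operation are never cached.
    return handler(uri) if handler else None
```

With this shape, the cache no longer needs a hard-coded list of wrapper names:
the decision and the key both come from the wrapper that owns the URI.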

2.2- Minimizing load overhead

Several ways were imagined to avoid loading everything at the beginning of
each request:

Some consider that, thanks to the PHP 7 speed improvements and the opcache
extension, the overhead induced by loading scripts at the beginning of each
request will remain negligible. I have no measurements proving or disproving
this claim. If the measured overhead really is negligible for several
hundred scripts, we may decide to remove autoloading from PCS altogether.
Unfortunately, this would require changing most of the registration API
because script load order could no longer be managed transparently.

Concatenating the scripts and loading a single big script is not possible
because of the different namespaces potentially used by the scripts.

Persistent user classes/functions/constants open the 'persistence' can of
worms. This goes far beyond our current needs and would require years of
discussion and flame wars.

So, PCS combines these constraints and uses two load mechanisms:

Scripts that define only classes, interfaces, or traits are autoloaded,

and scripts that define functions and/or constants are loaded at RINIT
time.

The reasons:

The overhead introduced by a fast map-based autoloader is near-zero, as
the map is stored in persistent memory,

Most APIs exposed today in the PHP world are object-oriented (the MongoDB
library, for instance, contains 56 purely object-oriented scripts and only
one that defines functions),

If the overhead introduced by RINIT loads ever becomes unacceptable, we can
easily extend the autoloader to functions and constants in a minor
release (this can be done without a BC break).

Note that the autoloader is based on a symbol map. File paths and names are
unconstrained, and there is no limit to the number of
classes/functions/constants defined in a single file. Symbols are
automatically extracted from the PHP source at registration time.
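The two load mechanisms and the symbol map can be modelled roughly as
follows. This is a simplified Python sketch, not the PCS implementation: it
uses crude regular expressions where a real extractor would parse the PHP
source, it ignores namespaces, and all function names are assumptions.

```python
import re

# Rough patterns (assumption): top-level declarations start at column 0,
# while class members are indented. A real extractor would tokenize the source.
CLASS_RE = re.compile(
    r"^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+(\w+)", re.M)
FUNC_RE = re.compile(r"^function\s+(\w+)", re.M)
CONST_RE = re.compile(r"^const\s+(\w+)", re.M)

def register_script(path, source, symbol_map, rinit_list):
    """Extract symbols from a script and pick its load mode."""
    classes = CLASS_RE.findall(source)
    functions = FUNC_RE.findall(source)
    constants = CONST_RE.findall(source)
    if functions or constants:
        # Functions and constants cannot be autoloaded: load at RINIT.
        rinit_list.append(path)
        return "rinit"
    # Class-only script: record each symbol so it can be loaded on demand.
    for name in classes:
        symbol_map[name.lower()] = path  # PHP class names are case-insensitive
    return "autoload"

def autoload(class_name, symbol_map):
    """Return the script that defines class_name, or None if unknown."""
    return symbol_map.get(class_name.lower())
```

The lookup is a single map access, which is why the per-request cost of the
autoloader stays close to zero however many scripts are registered.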

2.3- Other features

You may also note that:

The structure of the original file trees is preserved. So, relative paths
(prefixed with '__DIR__/') may be used to access other files in the tree.

PCS also allows embedding non-script files (aka 'resource' files). Such
files are recognized as not containing a PHP script and are never loaded
automatically by PCS. They may still be used through the stream wrapper
like any other file of the environment (a potential example is the embedded
magic database).

The automatic determination of load modes at registration time may be
bypassed by the calling extension. Generally, this feature will be used to
disable automatic loading of scripts when it is handled by another mechanism
managed by the client.
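The interplay between resource detection, automatic mode selection, and the
client's override can be sketched as follows, in Python for brevity. The mode
names and the open-tag test are assumptions made for illustration; the actual
PCS flags and detection rules may differ.

```python
# Hypothetical load modes; AUTO means "let registration decide".
AUTO, AUTOLOAD, RINIT, NONE = "auto", "autoload", "rinit", "none"

def looks_like_php(contents):
    # Assumption: a file counts as a script if it contains a PHP open tag.
    return b"<?php" in contents

def resolve_load_mode(contents, detected_mode, requested=AUTO):
    """Return the effective load mode for a registered file.

    detected_mode is what automatic analysis would choose for a script
    (AUTOLOAD or RINIT); requested is the client's optional override.
    """
    if not looks_like_php(contents):
        # Resource files are never loaded automatically.
        return NONE
    if requested != AUTO:
        # The client extension bypasses automatic determination.
        return requested
    return detected_mode
```

A client that ships its own loading mechanism would register its scripts with
the override set to the "none" mode and read them through the stream wrapper
when needed.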

3- Potential uses

Several potential uses have been suggested in past discussions:

Integrating the MongoDB library in the MongoDB extension is the subject of
this tutorial,