Jan 7, 2015

HHVM Extension Writing, Part III

In Part I of this series, we looked at the basic building blocks of a simple HHVM extension skeleton. Part II continued with three of the five "smart" datatypes used throughout the API. But now we get to have some fun. It's time to start looking into declaring classes with methods and properties and constants (oh my!).

As you should expect by now, compiling your extension with this bit of code should mean that the Example1_Greeter class is now available to all requests and may be invoked like any other class definition. Let's apply what we already know from making native functions, and see how it works with methods...

While you're probably tempted to rush off and add an HHVM_FE() and HHVM_FUNCTION() implementation to the C++ file, you'd only be half right. These aren't functions; they're methods, and as such they have their own set of macros: HHVM_METHOD() for the implementation and HHVM_ME() for registration.

Similarly, properties and constants may be declared directly on the Hack definition of the class in your systemlib file. I won't bother showing it here, since you all know how to write PHP code, but I'll put an example or two in the git repo.

What's marginally more interesting, and something you can probably guess at from the coverage in Part I, is that you can declare class constants from C++, meaning that they can take on values defined in external headers or computed at startup. Let's add one from moduleInit().

Binding internal data

What we have so far is all well and good for simple classes, but most extension classes will need to store some opaque pointer from an external library somewhere on the object, where it can be easily referenced later on. For that, things start to get a little bit more complicated. To help clarify what we're doing, I'm going to wipe the slate clean by moving example1 to its own subdirectory, and starting fresh with example2.

Yeah, kinda messy changing my whole directory structure around midway, but would you rather I ran `git push --force`? Yeah, I thought not.

The code skeleton we're starting out with is at commit 20dc4ef824, in example2/.

To illustrate something slightly less contrived than earlier examples, I'll be wrapping the POSIX FILE* object in a simple PHP class. Let's start with something basic in our systemlib, containing just a constructor:

A new user attribute has appeared! <<__NativeData("Example2_File")>> tells the runtime that this is no ordinary object. This object should be over-allocated with enough space to hold some internal C++ object, identified by the quoted name. In practice this is usually the name of the class it goes with, but it doesn't have to be. How does this hook up to the internals? That comes next, within ext_example2.cpp, by adding an #include "hphp/runtime/vm/native-data.h" and the following code:

We're only opening (and ultimately closing) our file at this point, but these are really important stages in an object's lifecycle, so this is worth going through slowly. When a PHP script calls $o = new Example2_File(__FILE__, "r");, the first thing the engine does is allocate space for the object. This is done by adding sizeof(ObjectData) (the standard, base object size) to the size given by any NativeDataInfo associated with it. We made that association by calling Native::registerNativeDataInfo<T>(StringData* id), where T is the C++ class type to allocate with the ObjectData, and id is the symbolic name we gave it in the systemlib file using <<__NativeData("id")>>.

Next, the engine invokes the constructor, providing a pointer to the object via a hidden ObjectData* this_ parameter in the C++ method's signature. From there, we can get at our private data structure by using the Native::data<T>() accessor (here, Native::data<Example2_File>(this_)), which jumps to the correct offset from this_. At this point, we have access to a normal C++ object which just so happens to be bound to a PHP object, and we have the constructor parameters as well!

From here, there are two reasons a PHP object might die. In the expected case, it runs out of references and is destructed during the course of the request's runtime. In this case, our auxiliary object has its destructor called as well, so that external pointers can be cleaned up nicely.

The other time a PHP object can die is when the request is shutting down. This is somewhat more exceptional, since the memory manager is sweeping ALL request-local data, not necessarily in the most ideal order. It's up to your auxiliary class to deal with non-sweepable resources, but you can trust the runtime to deal with the ones which are sweepable. This is handled by a secondary pseudo-destructor called sweep(). For simple implementations like ours, we want the regular destructor and sweep() to do the same thing, since the only members of this C++ class are external pointers. If we had sweepable resources such as an HPHP::String, however, we'd want to avoid the implicit member destruction which comes with calling ~Example2_File(): it's entirely possible that the internal state of that String is no longer valid because it was swept first. Hence the need for a separate sweep() function.

TL;DR? - Just have your destructor call sweep(), and deal with external pointers in sweep(). That's good 90% of the time.

You might also have noticed that I'm throwing an exception in the assignment operator. That operator is normally what handles a clone, where you'd probably duplicate the FILE* handle, but I realized midway that POSIX file streams don't really have that notion, so I took the easy way out and threw a standard exception. In practice, the implementation would probably look something like: