
One of the longest-standing feature requests for the API is support for modifying post meta. While it may seem like a reasonably simple request at first, it becomes quite the rabbit hole once you begin digging into it. Here’s a quick summary of the issues surrounding meta, and how we’re looking at solving them.

The Issues

The biggest issue with post meta is the mismatch between the data model in WordPress and the most common usage of meta. In general, meta is mainly used as a key-value store; that is, a one-to-one mapping between a meta key and a meta value. However, the data model in WordPress allows multiple values per key. Handling these in a coherent way is challenging, as we want to make the most common case easy while still making the multiple-value model possible.

The other main issue with meta is serialized data. As anyone who’s used post meta in WP knows, WP will store any type of data (except resources or closures) by serializing the data under the hood. This isn’t usually an issue, as the process is transparent for most uses of post meta. However, this presents a problem when accessing the data via the API, as we cannot expose this data transparently.

JSON has no distinction between associative arrays and objects, as associative arrays only exist in PHP. In addition, JSON cannot pass objects and their types; that is, the representation of stdClass and a custom MyAwesomeObject are the same in JSON. Exposing this data using the default JSON encoding semantics would cause data loss. On the flip side, while exposing the raw serialized string would not cause data loss, this would expose protected and private properties on objects, as well as the internal implementation details including the class. This could expose critical internal data.
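
The type-loss problem is not specific to PHP. As a rough illustration in Python (the class names below are hypothetical stand-ins for stdClass and MyAwesomeObject, and `naive_encode` is not part of any API), two objects of different classes produce identical JSON once only their public fields are encoded:

```python
import json


class PlainObject:
    """Stand-in for PHP's stdClass (hypothetical name)."""

    def __init__(self, title):
        self.title = title


class MyAwesomeObject:
    """Stand-in for a custom class with the same public field."""

    def __init__(self, title):
        self.title = title


def naive_encode(obj):
    # Encode only the instance attributes; the class name is dropped,
    # which is the information default JSON encoding would lose.
    return json.dumps(vars(obj), sort_keys=True)


plain = naive_encode(PlainObject("Hello"))
custom = naive_encode(MyAwesomeObject("Hello"))
```

Both `plain` and `custom` hold the same string, so a client receiving either one has no way to recover which class the value started as.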

In addition, combining these two issues can cause further problems. With a naive approach that maps each key to its value in a JSON object, while allowing both multiple values per key and serialized values, we could have a result like:

{
    "my_key": [
        "value1",
        "value2"
    ]
}

However, it’s impossible to tell whether this is a key with multiple values, or a key with a single value of a PHP array.
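
To make the ambiguity concrete, here is a minimal Python sketch. The row shape (meta ID, key, value) and the `naive_meta_json` helper are assumptions for illustration, not the API's actual implementation. Two different stored states collapse to the same JSON:

```python
import json

# Hypothetical rows modelled on a (meta_id, key, value) shape.
multiple_values = [
    (1, "my_key", "value1"),
    (2, "my_key", "value2"),
]
single_array_value = [
    (1, "my_key", ["value1", "value2"]),
]


def naive_meta_json(rows):
    """Map each key to its value, or to a list when the key repeats."""
    grouped = {}
    for _meta_id, key, value in rows:
        grouped.setdefault(key, []).append(value)
    # Collapse single-value keys so the common case reads as key -> value.
    flattened = {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}
    return json.dumps(flattened, sort_keys=True)


as_multiple = naive_meta_json(multiple_values)
as_array = naive_meta_json(single_array_value)
```

`as_multiple` and `as_array` are identical strings, even though the first came from two rows and the second from one row holding an array.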

If we treat this as a key with multiple values, updating the values could prove problematic. How do we distinguish between adding elements, updating existing elements, and removing elements? Simply leaving elements out does not necessarily mean we want to remove them, as we may just want to reduce the amount of data being sent over the wire.

Proposals

With these issues considered, there are a few resolutions we need to implement.

The first resolution is to not handle serialized data at all: it will be neither displayed nor modifiable through the API. For all intents and purposes, serialized data will be treated as protected meta. We cannot avoid this, due to the object data loss issue.
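
One way to picture this rule is a filter that only lets through values that need no serialization. The sketch below is an assumption about how such a check might look, with hypothetical helper names and sample rows. It treats any non-scalar value as hidden:

```python
def is_exposable(value):
    """Return True only for scalar values that JSON can carry losslessly."""
    return value is None or isinstance(value, (str, int, float, bool))


def visible_meta(rows):
    """Drop any row whose value would need serialization to be stored."""
    return [row for row in rows if is_exposable(row["value"])]


rows = [
    {"id": 1, "key": "subtitle", "value": "Hello"},
    {"id": 2, "key": "settings", "value": {"color": "red"}},
    {"id": 3, "key": "views", "value": 42},
]
shown = visible_meta(rows)
```

Here the `settings` row is withheld, in the same way protected meta would be, while the two scalar rows pass through untouched.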

We’ve now come up with two proposals on how to handle updates.

First Proposal

The first proposal by Rachel Baker and Taylor Lovett uses the following format for reading the data:

This approach has the advantage of keeping data access fairly simple, and means the most common case of key-value storage is simply post_meta.my_key[0], while multiple values can iterate post_meta.my_key.
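
As a rough sketch of that access pattern, assuming each key maps to a list of its values (the exact shape and the sample data here are assumptions inferred from the description above):

```python
# Assumed shape: every key maps to a list, even when it holds one value.
post = {
    "post_meta": {
        "my_key": ["value1"],
        "tags": ["red", "blue"],
    }
}

# Common key-value case: take the first (and only) element.
single = post["post_meta"]["my_key"][0]

# Multiple-value case: iterate the list.
collected = [value for value in post["post_meta"]["tags"]]
```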

However, it has the disadvantage that the input format does not match the output format. This means that you cannot send the post data straight back to the server without causing an error. In addition, it mixes actions into the data that is sent to the server. Multiple values are also not handled by this format, although this could be corrected by including a previous_value when updating.

Second Proposal

The second proposal by myself builds on Rachel and Taylor’s work, but changes the format slightly. Data looks like the following when reading a post:

This approach has the advantage that the input format matches the output format. Submitting the post data back to the server unchanged will have no effect, as the values will already match. In addition, the action to take is implied by the data rather than specified in it: updating a value requires just changing the value, adding a value requires adding an entry without an ID, and deleting a value requires setting its key to null (that is, specifying an empty value). This format uses the meta ID from the database as the primary key, which makes manipulating multiple values easy.
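
The implied-action rules can be sketched as a small function. The field names `ID`, `key` and `value`, and the `infer_action` helper itself, are assumptions made for illustration rather than a settled format:

```python
def infer_action(entry):
    """Decide what a submitted meta entry means from its shape alone."""
    if entry.get("ID") is None:
        return "add"       # no meta ID yet, so this is a new value
    if entry.get("key") is None:
        return "delete"    # key set to null marks the value for removal
    return "update"        # existing ID with a key: write the value


submitted = [
    {"ID": 12, "key": "my_key", "value": "changed"},
    {"key": "my_key", "value": "brand new"},
    {"ID": 13, "key": None, "value": "value2"},
]
actions = [infer_action(entry) for entry in submitted]
```

An "update" whose value already matches the stored one is a no-op, which is why sending the data back unchanged has no effect.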

One of the disadvantages of this approach is that reading data becomes more complicated. Accessing all data for a key now involves filtering the meta values on the client side, rather than a simple lookup. While this is reasonably easy to achieve, it’s not as obvious as a straight access (and also has worse performance characteristics). The updating format is less obvious, as it’s implied from the data format rather than being spelled out explicitly.
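
The client-side filtering might look like the following sketch, again assuming a flat list of entries with hypothetical `ID`, `key` and `value` fields:

```python
def values_for(meta, key):
    """Collect every value stored under `key` from a flat list of entries."""
    return [entry["value"] for entry in meta if entry["key"] == key]


meta = [
    {"ID": 12, "key": "my_key", "value": "value1"},
    {"ID": 13, "key": "my_key", "value": "value2"},
    {"ID": 14, "key": "other", "value": "x"},
]
found = values_for(meta, "my_key")
```

Each lookup scans the whole list, which is the performance cost mentioned above; a client that reads many keys could build a key-to-values index once instead.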

Other Approaches

One approach that isn’t considered above is using the first approach’s data for the post itself, and exposing meta in the second form via another endpoint (e.g. /posts/[id]/meta). While this would enable both simple and complicated uses nicely, it also introduces significant fragmentation and duplication. This means developers would need to learn and support two separate methods of achieving the same result, and also work out internally which to use. In practice, clients would end up simply supporting a single approach for consistency. This approach would also violate the Decisions, Not Options mantra of WordPress.

Decisions

We need to make a decision on how we handle meta. Personally, I’m biased towards the solution I wrote, but it’s not perfect, and we’ll never have a truly perfect solution. Both approaches are compromises, and we need to decide which compromise we want to make.

I’d love to hear thoughts on which approach people would prefer, and anything we may have missed during consideration.