Wednesday, July 23, 2014

In my last post I talked about how developing hypermedia APIs has a lot in common with developing SQL databases. The key commonality is database schema normalization, or DRYness.

Like REST's Richardson Maturity Model, normalization has four levels (or 'forms') of achievement. Each form builds on the previous until the database concisely models reality. Normalization is an iterative process: the data architect revisits each table and considers its irreducibility. If a table contains merged concepts, it is split into smaller, related tables. As a result, a normalized database consists of many small tables and many more relationships between them.

In the same way, a good REST API consists of many small, irreducible resources that are linked together. One can follow the same iterative process: examine a resource, decide whether any of its data should live on its own, and split it out.
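To make the splitting step concrete, here is a minimal sketch in Python. The resource names, field names, and link shape are all hypothetical, not any particular API's wire format:

```python
# Hypothetical merged resource: an order that also embeds customer data.
merged_order = {
    "order_id": "o-1",
    "customer_name": "Ada",
    "customer_email": "ada@example.com",
    "total": 42.00,
}

def split_order(merged):
    """Split the merged concept into two irreducible, linked resources."""
    customer = {
        "id": "c-1",
        "name": merged["customer_name"],
        "email": merged["customer_email"],
    }
    order = {
        "id": merged["order_id"],
        "total": merged["total"],
        # A hypermedia link replaces the embedded customer fields.
        "links": [{"rel": "customer", "href": "/customers/c-1"}],
    }
    return customer, order

customer, order = split_order(merged_order)
```

After the split, the order carries a link rather than a copy of the customer's data, so the customer can change without touching any order.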

Why is normalization important? A normalized database can live a long time, possibly longer than the people who created it. It can grow without modifying previous tables. It is 'normal' in the sense that it accurately presents the normal view of the real domain it models, and that reality is not subject to much modification over time. Additions, yes, but redefinition, not so much.

While creating Elastic Path's Hypermedia API, we developed two tests for normalizing a REST API. They are thought experiments that require a clear understanding of the reality being modeled, rather than the business goals of the API or any performance concerns.

Immutability

A simple test of irreducibility is mutability over time. Does a field of the data change over time, or is it consistent? If it is mutable, it probably should live in a separate resource and be linked.

An example of this would be an Address resource. I often see Addresses represented like this:

Name
Street1
Street2
City
State
Zip
Country
Email
Telephone

Consider Name. Does Name change over time with respect to the other fields? Name identifies someone who lives or works at this address, an association that is naturally mutable. One person moves out, another moves in; this may happen every year in an apartment building. Thus, Name cannot be an integral part of an Address, but is instead a field of a Person or Company resource. A link from Person to Address establishes where someone lives or works.
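A sketch of the resulting resource shapes (field names and link format are assumptions, chosen only for illustration):

```python
# The Address resource keeps only the fields that are stable over time.
address = {
    "id": "a-1",
    "street1": "221B Baker St",
    "city": "London",
    "zip": "NW1 6XE",
    "country": "GB",
}

# Name lives on Person; the link is the mutable part. When the person
# moves, only the link changes -- the Address resource itself never does.
person = {
    "id": "p-1",
    "name": "Sherlock Holmes",
    "links": [{"rel": "address", "href": "/addresses/a-1"}],
}

# Moving house is a change to the relationship, not to either resource.
person["links"][0]["href"] = "/addresses/a-2"
```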

Email's place in Address is even more tenuous. Email belongs closer to the Person, and can live either on Person or, more correctly, in a resource linked to Person. Email addresses can be shared between Persons, and a Person often has more than one. However, email addresses do not generally transfer from one person to another over time the way a street address does.

Email reveals that this Address resource was probably designed from an Account Registration form. The business goal of Address is to capture a customer's address for shipping, but the resource fails to accomplish this goal cleanly because it serves two purposes: data collection and data retrieval. In our API we decided to create an inbound Registrations resource for submitting new registration data. The results of a submission surface across the API in resources like profiles, addresses, emails, telephones, etc. Not to divert the thread here, but this is essentially the CQRS pattern: commands to mutate state have a different model than the queries to view state.
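The command/query split described above can be sketched like this. The resource names and the fan-out function are hypothetical, not the actual Registrations implementation:

```python
# Hypothetical inbound command: the raw registration-form payload,
# POSTed as a single unit because that is how the user supplies it.
registration_command = {
    "name": "Ada Lovelace",
    "street1": "12 St James's Sq",
    "city": "London",
    "email": "ada@example.com",
    "telephone": "+44 20 0000 0000",
}

def process_registration(cmd):
    """Fan the single command out into normalized query-side resources."""
    return {
        "profile": {"name": cmd["name"]},
        "address": {"street1": cmd["street1"], "city": cmd["city"]},
        "email": {"address": cmd["email"]},
        "telephone": {"number": cmd["telephone"]},
    }

resources = process_registration(registration_command)
```

The write model stays shaped like the form; the read models stay normalized. Neither has to compromise for the other.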

Conceptual Atomicity

Telephone, unlike Email, is not quite as immutable as Address. A business telephone will not change often, if ever. Telephones for people move around, especially when they are attached to a cell phone. Telephone is clearly not part of Address, but a separate linked resource. Nor is it part of the Person resource, because it can be shared by multiple people in a household or a business.

The atomicity of Telephone is challenged by societal changes, however, as people port their phone numbers to new providers and plans. It is thus tempting to just make Telephone a part of Person. In reality, though, people don't normally think of a phone number as an integral part of their existence. They will have more than one number, especially a traveller with several SIMs. The phone is a contact point for the person, and that fact drives it out of Person and into a standalone, atomic, immutable Telephone resource. Once you define a telephone number, it won't change. The relationships will change, of course, but not the data itself.
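As a sketch (again with hypothetical shapes), the Telephone resource holds only the immutable number, and the churn lives entirely in the links pointing at it:

```python
# The Telephone resource is atomic and immutable: just the number.
telephone = {"id": "t-1", "number": "+1-555-0100"}

# Two people in a household share the same phone via links. When one
# moves out, only their link is removed; the Telephone never changes.
alice = {"name": "Alice",
         "links": [{"rel": "telephone", "href": "/telephones/t-1"}]}
bob = {"name": "Bob",
       "links": [{"rel": "telephone", "href": "/telephones/t-1"}]}

# Bob moves out and takes a new number: his link changes, nothing else.
bob["links"][0]["href"] = "/telephones/t-2"
```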

Immutability + Atomicity == Long Term Reliability

Once you achieve a normalized API definition, you achieve long-term reliability. By long-term, I mean years, possibly even decades. A normalized API is standardizable too, given that everyone's experience of reality is highly similar (or not?). Your address and my address share the same shape and semantics.

Yeah, but SQL has SELECT

Anyone who has programmed in SQL knows the secret to success in relational databases is 50% modeling and 100% query design. The SELECT statement gives the consumer of the database a way to create optimized, aggregated, denormalized views of the data itself. Hypermedia APIs do not have such a tool in common use. Two concepts have emerged to solve this problem, however:

Expand

A few APIs have added the ability to expand a GET to include related resources in the result. Elastic Path's Cortex API calls this feature zoom. An example would look like:

GET https://api.com/carts/id?zoom=prices,lineitems:item,lineitems:totals

This call retrieves the cart resource and also retrieves specific linked resources: prices and lineitems; lineitems is further followed to retrieve the item and totals associated with each line item. With this one call a consumer can retrieve a denormalized view of the resource to render to the client.
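A minimal sketch of how a server might resolve such a zoom parameter. The in-memory store, the `_zoomed` embedding, and the colon-separated path syntax are assumptions for illustration, not Cortex's actual implementation:

```python
# Hypothetical in-memory resource store keyed by href.
store = {
    "/carts/1": {"total": 2, "links": {"prices": "/carts/1/prices",
                                       "lineitems": "/carts/1/lineitems"}},
    "/carts/1/prices": {"amount": 30.0, "links": {}},
    "/carts/1/lineitems": {"quantity": 2, "links": {"item": "/items/9",
                                                    "totals": "/totals/9"}},
    "/items/9": {"name": "widget", "links": {}},
    "/totals/9": {"amount": 30.0, "links": {}},
}

def zoom(href, paths):
    """Return the resource at href, with each colon-separated link path
    followed and the resource it reaches embedded under '_zoomed'."""
    resource = dict(store[href])
    resource["_zoomed"] = {}
    for path in paths:
        current = store[href]
        for rel in path.split(":"):          # follow each link in the path
            current = store[current["links"][rel]]
        resource["_zoomed"][path] = current
    return resource

cart = zoom("/carts/1", ["prices", "lineitems:item", "lineitems:totals"])
```

One round trip yields the cart plus every zoomed resource, which is exactly the denormalized view a client page needs to render.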

SPARQL

This one is a bit of a stretch, but I have found that the more I develop Hypermedia APIs, the more I refer to RDF concepts. In fact, Brian Sletten's "Resource-Oriented Architecture Patterns..." may be the Gang of Four for Hypermedia API design. SPARQL is an RDF query language that combines data sources, schemas and query statements. It is not hard for me to imagine mature hypermedia APIs providing query capabilities that support SPARQL.
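To make the idea concrete, here is a toy triple store with SPARQL-style pattern matching in pure Python. It is a sketch of the concept, not a real SPARQL engine; the triples and variable syntax are invented for the example:

```python
# Toy RDF-style triples: (subject, predicate, object).
triples = [
    ("person:ada", "livesAt", "address:1"),
    ("person:bob", "livesAt", "address:1"),
    ("address:1", "city", "London"),
]

def match(pattern):
    """Match one triple pattern against the store. Terms starting with
    '?' are variables; returns a list of variable bindings, which is
    roughly what a SPARQL SELECT produces."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value        # bind the variable
            elif term != value:
                break                        # constant term mismatch
        else:
            results.append(binding)
    return results

# "Who lives at address:1?" -- analogous to:
#   SELECT ?who WHERE { ?who livesAt address:1 }
who = match(("?who", "livesAt", "address:1"))
```

A mature hypermedia API could expose this kind of pattern-based query over its link graph, giving consumers the aggregation power that SELECT gives SQL consumers.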

The Silver Bullet

Solving the query denormalization requirement is important, but full of unknown unknowns. There is a Hypermedia query silver bullet yet to be discovered, but no magic wand: firing silver bullets requires just as much work and thought as firing regular ones; the only difference is that silver actually kills werewolves. We don't know exactly what kind of bullet will give Hypermedia the query power it needs, but it must be found if Hypermedia is to have a mainstream future.