GraphQL is for silos

GraphQL is booming.

After GitHub released its GraphQL API, it’s clear that many other services and developers are going to try to adopt the technology. It might have been a turning point.

There are many things to like in GraphQL: the type system, the idea of having, at last, an effective schema mechanism connecting client and server, and the view of a unified data graph you can query. Many of those ideas are not new at all, but if GraphQL is able to finally make them popular and accepted by development teams, I would be very happy.

However, there’s one major design problem with GraphQL that needs to be addressed: GraphQL is for building silos.

This open issue in GraphQL’s GitHub repository shows the problem clearly.

GraphQL exposes a single data tree, through a single end-point. All your data is captured in that single data space and cannot reference or be referenced from other GraphQL end-points in a standard way.

Open de-centralised systems don’t exist in the GraphQL world in its current form (which is not a surprise taking into consideration the original authors of the technology).

Of course, most organisations are just building their own public facing silo. Most of them have just a few clients they control directly: a JS web app, mobile apps, etc.
In this context GraphQL might be an attractive solution, especially because the integration between your client and the data layer is more sophisticated than what you can get with the mainstream interpretation of REST as some kind of HTTP+JSON combo.

But even if this is the case in your external facing API, probably in your back-end the landscape looks a lot more like a loosely coupled federation of services trying to work together. In this context, HTTP is still the best glue to tie them together in a unified data layer.

It should be possible to modify GraphQL to solve this issue and make the technology open:

Replace (or map in a standard way) IDs by URIs: If I’m going to reference some object in your data graph I need to be able to refer to it in a unique and unambiguous way. Also, your identifiers and my identifiers need to coexist in the same identifier space. Relay global object IDs are half-way there.

Add namespaces for the types: If you are not alone in the data universe, you might not be the only one to define the ‘User’ type. Even better, we might want to re-use the same ‘User’ type. Extra points if the final identifier for the type is a URI and I can de-reference it to obtain the introspection query result for the type.

Add hyperlinks/pointers to the language: I want to hold references to objects in this or other graphs using their IDs/URIs.
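The first two changes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI `https://api.example.org/graph` and the URI layout are hypothetical, and the global ID format is the common Relay convention of base64-encoding `Type:id`, which a given server may or may not follow.

```python
import base64
from urllib.parse import quote

# Hypothetical base URI for one endpoint; nothing in GraphQL or Relay defines it.
BASE_URI = "https://api.example.org/graph"


def to_global_id(type_name: str, local_id: str) -> str:
    """Build a Relay-style opaque global ID: base64 of 'Type:id'."""
    raw = f"{type_name}:{local_id}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def global_id_to_uri(global_id: str, base_uri: str = BASE_URI) -> str:
    """Map an opaque global ID into a URI that other graphs could reference."""
    decoded = base64.b64decode(global_id).decode("utf-8")
    type_name, local_id = decoded.split(":", 1)
    return f"{base_uri}/{quote(type_name, safe='')}/{quote(local_id, safe='')}"


def type_uri(type_name: str, base_uri: str = BASE_URI) -> str:
    """Namespace a type name so two providers can both define 'User'."""
    return f"{base_uri}/schema#{quote(type_name, safe='')}"


if __name__ == "__main__":
    gid = to_global_id("User", "42")
    print(global_id_to_uri(gid))
    print(type_uri("User"))
```

The point of the mapping is that the object identifier and the type identifier now live in the same space as everybody else's identifiers, so a second endpoint can point at them without a private agreement about ID formats.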

With these three changes, and introducing a shared authentication scheme, a single GraphQL end-point could be broken into many smaller federated micro-GraphQL end-points forming a single (real) data graph. This graph could also span multiple data sources in an organisation or across organisations. In short, it could be a real alternative to HTTP and REST.
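As a rough sketch of what such a federated graph could look like to a query engine, the Python below models two micro-end-points as in-memory stores and follows a hyperlink from one to the other. Everything here is hypothetical: the host names, the `@link` marker for a cross-graph reference, and the nested-dict selection format are made up for illustration, and `fetch` stands in for a real network request.

```python
from urllib.parse import urlsplit

# Two hypothetical micro-end-points, keyed by host, each holding its own objects.
ENDPOINTS = {
    "users.example.org": {
        "/User/1": {
            "name": "Ada",
            # A hyperlink to an object owned by a different end-point.
            "employer": {"@link": "https://orgs.example.org/Org/7"},
        },
    },
    "orgs.example.org": {
        "/Org/7": {"name": "Analytical Engines Ltd"},
    },
}


def fetch(uri: str) -> dict:
    """Stand-in for a request to whichever end-point owns the URI."""
    parts = urlsplit(uri)
    return ENDPOINTS[parts.netloc][parts.path]


def resolve(uri: str, selection: dict) -> dict:
    """Resolve a nested selection, following links across end-points."""
    obj = fetch(uri)
    result = {}
    for field, sub_selection in selection.items():
        value = obj[field]
        if isinstance(value, dict) and "@link" in value:
            # With a sub-selection, dereference the link; otherwise return the URI.
            if sub_selection:
                result[field] = resolve(value["@link"], sub_selection)
            else:
                result[field] = value["@link"]
        else:
            result[field] = value
    return result


if __name__ == "__main__":
    query = {"name": None, "employer": {"name": None}}
    print(resolve("https://users.example.org/User/1", query))
```

The client writes one query against what looks like a single graph, while each object is served by the end-point that owns its URI.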

The flip side of all this is that the pieces and technologies to provide the same level of experience GraphQL offers to developers have been available for HTTP as W3C standards for more than a decade. From the foundational components to the latest bits and ideas to bind them together.

It’s sad, but the surge in popularity of GraphQL makes even clearer our failure in the Linked Data and SemWeb communities to offer value and fix real problems for developers.

6 thoughts on “GraphQL is for silos”

I think the optimism for GraphQL on the server side is overrated. Clients should be the ones implementing GraphQL, and only then, possibly, also servers. The whole discussion of how easy GraphQL makes things has everything to do with the language, not the protocol. So implementers should give us the language on the client, and we shouldn’t care how this is translated into one or multiple HTTP requests, or to one or multiple servers. (Of course, on the language level, the problems you list above should be tackled.)

My limited experience here, and my intuition, is that if you do it on the client, the round trips will kill performance. You only have time for, maybe three round trips before the user gets annoyed. How can you do any kind of graph traversal, let alone join, like that?

Nice blog Antonio! Whenever I’ve read about the idea of running graph queries joining data from multiple endpoints from different providers I’ve always wondered about the feasibility of relying on the availability of all those providers. If each endpoint has 99% availability, and you use 5 in a query, does your query have ~95% availability? And I guess it’s unlikely the requests could all happen in parallel, so will the request time be ~ the sum of all the individual request times? I’m sure these are problems that have been thoroughly addressed, I just don’t know the domain that well.
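A quick check of the arithmetic in the comment above, as a minimal Python sketch. It assumes the endpoints fail independently of each other, which may not hold in practice, and the latency figures are invented purely to contrast sequential and parallel requests.

```python
import math


def combined_availability(availabilities):
    """Probability that every endpoint is up, assuming independent failures."""
    return math.prod(availabilities)


def sequential_latency(latencies_ms):
    """Total time when each request must wait for the previous one."""
    return sum(latencies_ms)


def parallel_latency(latencies_ms):
    """Total time when all requests can be issued at once."""
    return max(latencies_ms)


if __name__ == "__main__":
    # Five endpoints at 99% each, as in the comment.
    print(round(combined_availability([0.99] * 5), 4))
    # Hypothetical per-endpoint latencies in milliseconds.
    latencies = [80, 120, 60, 200, 90]
    print(sequential_latency(latencies), parallel_latency(latencies))
```

Under the independence assumption, five endpoints at 99% give roughly 95.1%, so the ~95% estimate holds. The request time only approaches the sum of the individual times when each request depends on the result of the previous one; independent requests issued in parallel are bounded by the slowest endpoint.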

Very interesting idea! I like the idea of allowing the client to reach out to a bunch of different servers in order to run a single GraphQL query. I hope this is done someday. Currently you can have your client send multiple GraphQL queries to multiple servers, but you can’t combine those into a single query on the client side. However, this is not so bad in practice because you can do the federation on the server side and build your own silo, as you describe, with the server pointing to all the different servers and URIs. Then the client gets the illusion that the data from all the different servers is unified, and this unified API can be queried with a single GraphQL query. And this doesn’t really need to have a perf penalty once the @defer specification is implemented.