Beyond Data

Most of us are used to building backend systems in a specific way. No matter the stack used, you usually have a web server in front that besides serving static content, routes all the requests to a collection of scripts that implement all the logic in the system (probably using some kind of framework to organize them). At the bottom of the stack sits the database that is only responsible for storing the data. This of course works but is not the only way to build systems and not necessarily the best way. The edges of this system (web server/database) are very powerful tools with a ton of functionality embedded in them and when properly leveraged, can result in a dramatic reduction of code complexity, performance gains and an increase in iteration speed. Just because your logic is in a single language/place does not mean the system is simpler. This starter kit takes a holistic approach when implementing an API and tries to leverage all features of the underlying components to achieve a particular goal.

Try to think of these components not as separate processes (or, even worse, distinct hardware) within your system, but as modules in your code. Just as control flows through a series of modules in a traditional system, an API request flows through these components, and at each step you can take certain actions based on the current request. And just as you would not implement caching in your auth module or your "active record" module in a traditional stack, the same applies here: you do not implement caching in the database but in front of it, before the request even hits the database. You do not send emails from within a stored procedure in the database; you send the email from a separate system that is merely triggered by a database event.

By default, subZero will allow the client to filter by all columns using a range of operators. When your tables are small and you do not have many users, this is not a problem, but it can become one if you allow filtering on a text column with the like operator on a table that has hundreds of thousands of rows, especially if you do not have an index on that column. As a general rule, you should not allow filtering by a column unless that column is part of an index.
Let's see how we can accomplish this seemingly complicated task.

Open the openresty/lualib/user_code/hooks.lua file and add a function that will be called before each request (add it at the top of the file):

```lua
local function check_filters()
  print 'I will be making sure no one does anything fishy'
end
-- ...
```

and make sure to call it in the on_rest_request function.

In your dev console, switch to the OpenResty tab and observe the result of the following request

Now that we know our function is being called, let's see how we can accomplish our task.
The best resource for writing Lua scripts in the context of OpenResty is the README of the lua-nginx-module GitHub repo.

Change the check_filters function to this definition:

```lua
local function check_filters()
  local blacklist = { projects_name_like = true }
  local table = ngx.var.uri:gsub('/internal/rest/', '')
  local args = ngx.req.get_uri_args()
  for key, val in pairs(args) do
    local column = key
    local operator, value = val:match("([^.]+)%.(.*)")
    if operator and blacklist[table .. '_' .. column .. '_' .. operator] then
      ngx.status = ngx.HTTP_BAD_REQUEST
      ngx.say('filtering by ' .. column .. '/' .. operator .. ' is not allowed')
      ngx.exit(ngx.HTTP_OK)
    end
  end
end
```

This function is not perfect yet (the pattern matching could use a little more care), but it shows how a few lines of Lua in the OpenResty context can be a powerful thing when you want to inspect or alter the incoming request before it reaches the database.

All the components in a subZero system are fast, and in most cases you don't need to worry about load or caching, especially if you were careful about restricting filtering capabilities in the section above. Don't go fixing problems you do not have. There are, however, cases where requests to certain entities can and should be cached. Usually these will be views or stored procedures implementing a complex report that takes some time to generate. For this, subZero exposes two Lua hooks that give you full control over the caching subsystem: get_cache_key lets you say "I want this response cached" and get_cache_tags lets you say "I want cached responses with these tags to be invalidated". Suppose we have a clients view that does not change much, and we were asked to introduce a profitability property for each client, which we implemented as a computed column with a time-consuming computation. We would like to cache the responses for the clients endpoint to reduce the number of times that complex profitability function is called.

Whenever there is a request to the clients endpoint, we compute a cache key for the request. This key can be any string that uniquely identifies this type of request. Notice that we include the value of the Authorization header, which identifies the current user, so that we do not serve a cached response meant for one user to a different one. We also attach a tag to this cache entry so that whenever a new client is inserted or an existing client is updated, we can invalidate it.
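As a sketch of how the two hooks might look (the return values shown here, key/tags/ttl, are assumptions for illustration; check the starter kit's documentation for the exact hook contract):

```lua
-- hooks.lua (sketch; return signatures are assumptions, not the confirmed API)
local function get_cache_key()
  -- cache only GET requests to the clients endpoint
  if ngx.req.get_method() == 'GET' and ngx.var.uri:find('/clients', 1, true) then
    -- include the Authorization header so a response cached for one
    -- user is never served to another
    local auth = ngx.var.http_authorization or ''
    local key = ngx.var.uri .. '?' .. (ngx.var.args or '') .. '#' .. auth
    return key, { 'clients' }, 60 -- key, tags, ttl in seconds
  end
end

local function get_cache_tags()
  -- any mutation on the clients endpoint invalidates the cached responses
  local method = ngx.req.get_method()
  if method ~= 'GET' and ngx.var.uri:find('/clients', 1, true) then
    return { 'clients' }
  end
end
```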

Whether we admit it or not, most of the time 90% of what an API does is CRUD, enforcing access rights and validating input, and that is OK :)! Up to this point, we've shown how you can do all that. As long as it mostly concerns the actual data, PostgreSQL is very flexible and lets you easily interact with it, PostgREST provides the bridge from HTTP to SQL, and OpenResty acts as a router in front of everything, giving you the ability to inspect and alter HTTP requests and responses that are in-flight.

However, even the simplest APIs usually have a couple of "paths" that need to interact with some other system, not just the data stored in the database. A classic example is sending an email on signup. The first thought is usually that you will have to do it inside a database trigger, and you actually can do that, send an actual email; it shows just how versatile Postgres is. But you shouldn't: talking to remote systems is slow, and you would block one database connection/process for a good amount of time. Unless you are building a system with only a handful of users, avoid talking to remote systems from functions running in the database. Another way to implement this is by leveraging OpenResty and its ability to run custom logic at different stages of an HTTP request. In this specific case, before sending the response back to the client who accessed the signup endpoint, we could look at the response the database sent (and PostgREST forwarded) and, if it was a success, run our custom function that sends the "welcome" email.

Info

It's important to remember that when in Lua land, running inside OpenResty/Nginx, you have complete freedom: you can do whatever you could have done in Node/PHP/Ruby. You can talk to other systems, run additional queries, make requests to other APIs.

If you only have a handful of these types of cases, then doing it in OpenResty, before or after the request reaches the database, is the path you should choose. It's just fewer moving parts.

But there is a third way! This method allows for greater flexibility, decoupling and separation of concerns. When you have to react to all kinds of database events and implement custom logic that talks to other systems, or maybe execute some long-running computation, you can use this pattern to decouple the main task from the additional work that needs to happen. The trick is to recognize that the expensive work you need to run (send an email, generate a report, talk to a remote system) can be executed asynchronously from the main request: you don't need to send the email before you reply to the request with "signup success", you can send it a fraction of a second later. If you look carefully, you will notice that this pattern comes up often; a lot of the tasks you need to perform do not have to be executed before you respond to the API request. This is exactly where a message queue is useful. Whenever an event occurs in your system (a user just signed up for your service), you send a message to this queue and let a completely separate system (usually a simple script) take care of performing the additional tasks.

While this sounds complicated and fragile, it's actually neither. For most systems, this pattern can be easily implemented using an events table. Your triggers insert rows into this table whenever an event is triggered by an API request, and another script reads rows one by one from this table and does the additional work. Here is an article explaining how to reliably do this using a table as a queue. You might hear that using the database as a queue is bad or does not scale, but this exact method is used by Stripe, and it works well for loads up to 100 events/s, which most of the time is all you need.
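A minimal sketch of the pattern, using a hypothetical events table (the table and column names are illustrative; FOR UPDATE SKIP LOCKED requires PostgreSQL 9.5+):

```sql
-- Hypothetical events table used as a queue
CREATE TABLE IF NOT EXISTS events (
  id         bigserial   PRIMARY KEY,
  name       text        NOT NULL,  -- e.g. 'user_signed_up'
  payload    jsonb       NOT NULL,  -- the row that triggered the event
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Worker loop body: atomically claim and remove the next event.
-- FOR UPDATE SKIP LOCKED lets several workers poll the same table
-- without blocking each other or processing an event twice.
DELETE FROM events
WHERE id = (
  SELECT id FROM events
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING name, payload;
```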

For simple things like sending emails, the table-as-a-queue approach works great and does not require additional components. If you can get away with it, don't complicate things: use it.

But we all know that everyone wants live updates these days, and that is next to impossible to do with an events table: every mutation in your database would have to be mirrored in this table. This is where the final piece of the puzzle comes in.

I am going to do a magic trick now and pull a RabbitMQ out of my hat. Just like the other components in our stack (OpenResty/PostgREST/PostgreSQL), it's an immensely powerful, flexible and battle tested piece of tech. It will give us yet another integration point and among other things will completely solve the "real-time updates" problem for us.

Before we get into that, we have not actually sent that "welcome" email yet. Let's do that.

The Starter Kit already has everything in place to make sending messages from PostgreSQL to RabbitMQ easy.
All we need to do now is attach triggers to the tables whose events/changes we want to monitor.
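Attaching such a trigger might look like this; the trigger function name (rabbitmq.on_row_change) and the data schema are placeholders for whatever the starter kit actually ships:

```sql
-- Sketch: attach the starter kit's notification trigger to the user table.
-- rabbitmq.on_row_change is a hypothetical name standing in for the real
-- trigger function provided by the starter kit.
CREATE TRIGGER user_row_change
AFTER INSERT OR UPDATE OR DELETE ON data."user"
FOR EACH ROW EXECUTE PROCEDURE rabbitmq.on_row_change();
```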

So whenever something happens to a row, this function will take the row, turn it into JSON and generate a new message that enters RabbitMQ. Before we look at where exactly that message ends up, we need to look at the routing_key, since it is a very important part of what happens to the message once it is received by RMQ.

Info

If you are completely new to the concept of message brokers and how they work, you can check out RabbitMQ Tutorials

When a new signup occurs, this will generate an event that goes through the message broker, and every interested party can react to it. The message will look something like
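A hypothetical example (the exact shape depends on your trigger function and routing key format; the field names here are made up for illustration):

```
routing key: row_change.table-user.event-INSERT.user-10
payload:     {"id": 10, "name": "Alice", "email": "alice@example.com"}
```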

The routing key tells us exactly what happened (a row got inserted in the user table) and the message payload contains the actual row.
This message will be sent on the events channel, which by default is linked to the amq.topic exchange in RMQ. The message proxying between PostgreSQL and RabbitMQ is done by the pg-amqp-bridge utility, which is configured here.

As you can imagine, all kinds of events will flow through that exchange. When we write a script that sends emails on signup, that does not mean the script has to receive every event and contain logic to ignore the ones that do not matter to it. Your script (the message consumer) can tell the message broker what type of messages it is interested in, and the broker will deliver only the ones that match your specification. In this particular case we are interested only in rows being added to the user table, so we bind to the exchange using the following routing key

row_change.table-user.event-INSERT.#

If we wanted to monitor all events that happen on the user table, the binding key can be

#.table-user.#

or maybe we want to inspect all the events that were triggered by a specific user, in this case, the binding key would be

#.user-10

Info

It is important to understand that the format of the routing key is not something hard coded. You are free to change it to suit your needs. You are also free to send the events to any exchange you like, just remember to configure pg-amqp-bridge to forward them.

So at this stage, we have a system in place so that whenever someone signs up, this will trigger an event that will result in a message entering RabbitMQ. Let's see how we can actually send the email.

You can implement this script in any language you like, but just to show how simple it can be, we'll do it using a shell script, just because we can. We could cheat and use something like Mailgun, but instead we'll actually send the email using our own Gmail account.
Let's create our worker script
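A sketch of such a worker, assuming the amqp-consume tool (from rabbitmq-c), jq and a configured mail command are available, and that the user row has an email column; the Gmail SMTP details are left out, and the binding key matches the format shown above:

```shell
#!/bin/bash
# worker.sh -- consume signup events and send a welcome email per message.
# amqp-consume declares a durable queue bound to amq.topic and runs the
# given command once per message, with the message body (the JSON row
# from the trigger) on the command's stdin.
amqp-consume \
  --url "amqp://user:password@localhost/" \
  -e "amq.topic" -r "row_change.table-user.event-INSERT.#" \
  -q signups --durable \
  /bin/sh -c '
    payload=$(cat)
    email=$(printf "%s" "$payload" | jq -r .email)
    echo "Thanks for signing up!" | mail -s "Welcome" "$email"
  '
```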

This script can run on completely separate hardware, decoupled from your main system. Another nice benefit is that because we specified -q signups --durable, even if the script is stopped or the hardware crashes, RabbitMQ will hold on to all the signup events that happened in the meantime; when we restart the script, it will process those events retroactively.

subZero and this starter kit will take you 95% of the way when building your API. You will get there in no time and with an almost trivial amount of code. However, there are situations when it's hard to implement a use-case within the constraints of a subZero based project. This is a general problem with systems that try to automate API creation: there are cases when it feels like you are fighting the tool instead of solving your problem. You don't have to make that sacrifice here. It is possible to take full control of the flow.

An example where an automated API creation tool does not quite provide the needed flexibility is integrating a payment processor. These services usually provide their own SDKs that you have to use, and they involve quite a lot of back and forth with a remote system. Some services embed integrations with specific well-known 3rd party systems (Slack, Stripe, etc.) in their core code, but you don't have the freedom to integrate with anything you want; you are limited to maybe triggering a few web hooks here and there.

Let's get back to our example. We've built our API (and frontend) and users are happy with it; it's time to get paid for our hard work. We'll charge our customers a one-time fee of $99 and we'll use Stripe for that. Our frontend uses GraphQL (or REST) to communicate with the backend; however, when integrating with Stripe, it is convenient to use its recommended flow, and that basically means you have to process a form submission.

Your frontend will display a page with a button as described in the Stripe docs. Once the user goes through the process of entering their credit card, Stripe will hand control back to our system, where, based on a token, we create the customer and charge their credit card.

Now comes the interesting part: we hijack this URL and handle the request ourselves instead of letting it reach PostgREST and the database.

I will be using Lua directly to implement this custom endpoint, but you are free to route this URL to any stack you choose. You can use your traditional stack alongside PostgREST to handle a few tricky bits, and to your API users it will be indistinguishable.

Info

To have these types of endpoints also appear in our GraphQL API, create a dummy stored procedure in your api schema like this
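A hedged sketch (the function name charge_customer and the api schema are assumptions chosen to match the endpoint below):

```sql
-- A stub so that the charge_customer endpoint shows up in the generated
-- GraphQL/REST schema; a request never actually reaches it, because
-- OpenResty hijacks the URL and handles it in Lua.
CREATE OR REPLACE FUNCTION api.charge_customer() RETURNS boolean AS $$
BEGIN
  RAISE EXCEPTION 'this endpoint is implemented outside the database';
END;
$$ LANGUAGE plpgsql;
```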

Here is a script (which could use some error handling) that will accomplish the task.
Since we are taking over, we are responsible for checking that the user is authenticated before doing any additional work.

```lua
-- openresty/lualib/user_code/charge_customer.lua
local jwt = require 'resty.jwt'
local cjson = require 'cjson'

local jwt_secret = os.getenv('JWT_SECRET')
local stripe_key = 'sk_test_xxxxxxxxxxxxxxxx'
local stripe_endpoint = 'https://api.stripe.com/v1'
local stripe_auth_header = 'Basic ' .. ngx.encode_base64(stripe_key .. ':')

-- check authorisation
local authorization = ngx.var.http_authorization
local token = authorization:gsub('Bearer ', '')
local jwt_obj = jwt:verify(jwt_secret, token)
if not (jwt_obj and jwt_obj.valid) then
  ngx.status = ngx.HTTP_UNAUTHORIZED
  ngx.exit(ngx.HTTP_UNAUTHORIZED)
end
local user_id = jwt_obj.payload.user_id

-- main script logic
ngx.req.read_body()
local args = ngx.req.get_post_args()
local stripe_token = args.stripeToken

-- interact with stripe
-- we could have used a nicer lua lib https://github.com/leafo/lua-payments
-- but we can do this low level also
local http = require "resty.http"
local http_client = http.new()

-- save the customer in stripe
local res = http_client:request_uri(stripe_endpoint .. '/customers', {
  method = "POST",
  headers = { ['Authorization'] = stripe_auth_header },
  body = ngx.encode_args({
    ['metadata[user_id]'] = user_id,
    source = stripe_token
  })
})
local stripe_customer_id = cjson.decode(res.body)['id']

-- charge the client
local res = http_client:request_uri(stripe_endpoint .. '/charges', {
  method = "POST",
  headers = { ['Authorization'] = stripe_auth_header },
  body = ngx.encode_args({
    amount = 9900,
    currency = 'usd',
    description = 'Khumbu Icefall Service',
    customer = stripe_customer_id
  })
})

ngx.say('true') -- we could also redirect the user here
```

You don't have to fight subZero and jump through hoops to accomplish complicated flows, you can just sidestep it completely in a few specific cases and write your own custom code.