Mark Stosberg

rsnapshot nicely illustrates some of the features of systemd timers. As of Ubuntu 16.04, the rsnapshot package doesn't ship with systemd service and timer files, so you have to set them up yourself.

Before I show how to set up rsnapshot with systemd, I want to cover some of the benefits of using systemd timers instead of cron.

## Cron syntax vs systemd timer syntax

systemd timers have a more readable syntax for specifying repeating events than the old cron syntax.

Here's how to specify a "weekly" cron job at 3 AM with the old cron syntax:

# cron syntax
0 3 * * 1

Here's the same thing expressed in a systemd timer file:

Mon *-*-* 03:00:00

If you correctly guess that the right side in the systemd format represents Hour:Minute:Second, then you are well on your way to correctly guessing that the middle represents 'Year-Month-Day' format and that the left side is the day of the week.

The old cron syntax looks nothing like a date and has to be memorized.

Logging and notification

A traditional well-behaved cron job produces no output unless there is an error, in which case the output will be mailed to you.

Because rsnapshot runs an important backup job, it's useful to have some detailed output about what it did, even on successful runs.

With systemd, output is sent to the systemd journal by default. You don't have to set up any output redirection or log rotation just to have good logging. Once set up, you can get logs for any of your rsnapshot runs with syntax like this:

journalctl -u rsnapshot@daily

If your login user is in the systemd-journal group, you also don't have to be root just to check out the logs. You can still get a nicely formatted email if rsnapshot fails. More on that below.

Easy status reporting

systemd makes it easy to check on the status of your cron job. When did it run last? Was the last run a success or a failure? What were the most recent logs? All those questions can be answered with a quick check of the status command:

systemctl status rsnapshot@daily

You don't have to be root to check that either.

Configuring systemd unit files for rsnapshot

Disable cron files

To avoid conflicts, make sure you disable running rsnapshot via cron in /etc/cron.d/rsnapshot.

One time Setup: email-on-failure

In Ubuntu 16.04, there's not a built-in solution for getting email-on-failure when using systemd timers instead of cron. This is the one drawback I've found. However, once you set this up, you can re-use the solution for all your other systemd timers.

The following assumes you've already got outgoing email working on your box.

First, install a script which will mail diagnostic output for a failed systemd service ([more context](https://bugs.launchpad.net/ubuntu/+source/systemd-cron/+bug/1583743)):
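The script itself is not reproduced here, so the following is only a minimal sketch of the general idea, not the script referred to above. It assumes a sendmail-compatible command is installed; the file name `unit-status-mail` and the function name `build_failure_mail` are hypothetical.

```shell
#!/usr/bin/env bash
# unit-status-mail (hypothetical name): mail the status of a failed unit.
# Usage: unit-status-mail <recipient> <unit>

# Build a plain-text mail message; the unit's status text is read from stdin.
build_failure_mail() {
  local recipient=$1 unit=$2
  printf 'To: %s\n' "$recipient"
  printf 'Subject: [systemd] %s failed on %s\n' "$unit" "${HOSTNAME:-unknown-host}"
  printf 'Content-Type: text/plain; charset=UTF-8\n\n'
  cat
}

# Only send when called with both arguments, so the file can also be sourced.
if [ "$#" -ge 2 ]; then
  systemctl status --full --no-pager "$2" | build_failure_mail "$1" "$2" | sendmail -t
fi
```

One way to wire this up, again as an assumption rather than a description of the original setup, is a small template service that runs the script with the failed unit's name, referenced from the unit you want to watch through systemd's `OnFailure=` setting.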

Creating rsnapshot timer files

Now you can create timers for the daily, weekly, monthly rsnapshot runs. In the example files I've set the daily task to run at 5:30 AM, the weekly task at 4:30 AM and the monthly task at 3:30 AM. Adjust to suit.
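The example files are not included here, so this is a hedged sketch of what they might look like, using the times mentioned above. The file names, the `/usr/bin/rsnapshot` path, and the choice of Monday and the 1st of the month are assumptions. The script only stages the files in a scratch directory so you can review them before copying anything into /etc/systemd/system/.

```shell
#!/usr/bin/env bash
# Stage rsnapshot unit files in a scratch directory for review.
unit_dir=${UNIT_DIR:-$(mktemp -d)}
mkdir -p "$unit_dir"

# Template service: "rsnapshot@daily" runs "rsnapshot daily", and so on.
cat > "$unit_dir/rsnapshot@.service" <<'EOF'
[Unit]
Description=rsnapshot (%I) backup

[Service]
Type=oneshot
# Assumed binary location; check with "command -v rsnapshot".
ExecStart=/usr/bin/rsnapshot %I
EOF

# Write one timer file per interval: interval name, then OnCalendar expression.
write_timer() {
  local interval=$1 calendar=$2
  cat > "$unit_dir/rsnapshot-$interval.timer" <<EOF
[Unit]
Description=rsnapshot $interval backup

[Timer]
OnCalendar=$calendar
Persistent=true
Unit=rsnapshot@$interval.service

[Install]
WantedBy=timers.target
EOF
}

write_timer daily   '*-*-* 05:30:00'
write_timer weekly  'Mon *-*-* 04:30:00'   # Monday is an assumption
write_timer monthly '*-*-01 03:30:00'      # the 1st is an assumption

echo "Unit files written to $unit_dir"
```

After copying the files into place, something like `systemctl daemon-reload` followed by `systemctl enable --now rsnapshot-daily.timer` (and the same for the other two timers) should turn them on, and `systemctl list-timers` shows when each will next fire.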

The future

If this all seems more complex than it should be for scheduling rsnapshot backups, it is. Both rsnapshot and your OS distribution could offer better support for systemd and for sending email on failure with systemd. If they did, you could get the benefits of using systemd timers with rsnapshot just by setting a MAILTO environment variable and running systemctl enable for the timers you wanted to turn on.

If this appeals to you, consider communicating with the rsnapshot project and possibly submitting a pull request to improve support here.

You rotate your logs and then find the log files are empty. It appears no more data is getting written to them.

Having done server maintenance for several years, I've found this to be a common issue that is not specific to the app writing to the log file, such as forever. I understand what's going on to be this:

While you appear to be logging to a file, you are really logging to a file descriptor. After log rotation by an external application, the application continues to log to a file descriptor, but now the file descriptor is no longer connected with the file. The file has been re-created through log rotation. While the log file may be empty, your disk usage may well continue to increase, because the app is still writing to the old file behind the descriptor.
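The effect is easy to reproduce in a shell. This is a small illustrative sketch, with the shell itself standing in for the app and a temporary directory standing in for the log directory:

```shell
#!/usr/bin/env bash
# Demonstrates that a writer holds a file descriptor, not a file name.
workdir=$(mktemp -d)
log="$workdir/app.log"

exec 3>>"$log"               # the "app" opens its log once, in append mode
echo "before rotation" >&3

mv "$log" "$log.1"           # rotation by rename, as an external tool might do
: > "$log"                   # a fresh, empty file is created at the old path

echo "after rotation" >&3    # the app keeps writing to the same descriptor

rotated_lines=$(wc -l < "$log.1")
current_size=$(wc -c < "$log")
echo "lines in app.log.1: $rotated_lines"
echo "bytes in app.log:   $current_size"
exec 3>&-                    # close the descriptor
```

Both lines end up in app.log.1, and the new app.log stays empty, even though the "app" never changed what it was doing.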

logrotate and copytruncate

The copytruncate option to logrotate is one solution. This is designed to work around the file-descriptor-vs-file issue described above by leaving the relationship intact. Instead of renaming the file, the contents of the file are first copied to the rotated location and then the original file is truncated back to an empty state. This works, but feels like a workaround.
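To see why this keeps the descriptor useful, here is a rough simulation of the copy-then-truncate sequence using plain `cp` and shell redirection rather than logrotate itself. It assumes the app opened its log in append mode. In a real setup you would instead add the `copytruncate` directive to the relevant logrotate stanza.

```shell
#!/usr/bin/env bash
# Simulate what copytruncate does, using plain cp and truncation.
workdir=$(mktemp -d)
log="$workdir/app.log"

exec 4>>"$log"               # the "app" opens its log in append mode
echo "before rotation" >&4

cp "$log" "$log.1"           # 1. copy the contents to the rotated name
: > "$log"                   # 2. truncate the original file in place

echo "after rotation" >&4    # same descriptor, same file: lands in app.log

rotated=$(cat "$log.1")
current=$(cat "$log")
echo "app.log.1: $rotated"
echo "app.log:   $current"
exec 4>&-                    # close the descriptor
```

The trade-off is the gap between the two numbered steps: anything the app writes after the copy but before the truncation is lost.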

Restart the app

logrotate and similar solutions can help you send a command to restart the app during log rotation so that the filename-vs-file-descriptor relationship gets refreshed. This works too. If, like me, you are also on-call to respond to problems with apps restarting at midnight, you would probably prefer to find another solution that doesn't mess with your application in the middle of the night. (What could go wrong with simply restarting an app in the middle of the night?)

Build log rotation into your app

You could build log rotation into whatever app is doing the logging, but this is a general problem. Does it make sense for every single server or process supervisor to roll-its-own log rotation solution? Surely there's a more general solution to this.

Log directly from your app over the network to syslog or a 3rd-party service

Logging directly from the app over the network avoids the direct use of log files, but most of the options I've looked at for this in Node.js share the same design flaw: They don't (or didn't recently) handle the "sad path" of the remote logging server being unavailable. If they coped with it at all, the solution was to put buffered records into an in-memory queue of unlimited size. Given enough logging or a long enough outage, memory would eventually fill up and things would crash. Limiting the buffer queue size would address that issue, but it illustrates a point: designing robust network services is hard. You are likely busy building and maintaining your main application. Do you want to also be responsible for the memory, latency and CPU concerns of a network logging client embedded in your application?

For reference, here are the related bug reports I've opened about this issue:

If you are using a module that logs over the network directly, you might wish to check how it handles the possibility that the network or logging service is down.

Log to STDOUT and STDERR, use syslog

If your application simply logs to STDOUT and STDERR instead of a log file, then you've eliminated the problematic direct-use of logging files and created a foundation for something that specializes in logging to handle the logs.

I recommend reading the post Logs are Streams, Not Files, which makes a good case for why you should log to STDOUT and shows how you can then pipe your logs to rsyslog (or another syslog server) from there, which specializes in being good at logging. A syslog server can do things like forward your logs to a third-party service like Logentries, and handle potential network issues there, outside your application.

Log to STDOUT, use systemd

systemd can do process-supervision (like forever, nodemon and pm2 in the Node.js ecosystem), including user-owned services, not just root. It's also designed to handle logging that services send to STDOUT and STDERR and has a powerful journalctl tool built-in. There's no requirement that your process supervisor be written in the same language your app is.

systemd is included with Ubuntu starting with 16.04 and is already standard in Fedora. CoreOS uses systemd inside its container both to handle process supervision and logging and because it starts in under a second.
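As a rough sketch of what this can look like for a Node.js app, the script below stages a user-level unit file in a scratch directory. The service name `myapp` and both paths in `ExecStart` are hypothetical placeholders, not taken from a real project.

```shell
#!/usr/bin/env bash
# Stage a user-level unit for a Node.js app that logs to STDOUT/STDERR.
unit_dir=${UNIT_DIR:-$(mktemp -d)}
mkdir -p "$unit_dir"

cat > "$unit_dir/myapp.service" <<'EOF'
[Unit]
Description=myapp (hypothetical Node.js service)

[Service]
# Placeholder paths. systemd captures STDOUT and STDERR in the journal,
# so the app needs no log file handling of its own.
ExecStart=/usr/bin/node /srv/myapp/myapp.js
Restart=on-failure

[Install]
WantedBy=default.target
EOF

echo "Unit file written to $unit_dir/myapp.service"
```

Copied to ~/.config/systemd/user/, a unit like this can be started with `systemctl --user start myapp`, and its output read with `journalctl --user-unit myapp`, provided your distribution's journal configuration gives your user access to those entries.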

How to Log to STDOUT when your app is designed to log to a file?

If you aren't using systemd as your process manager, logging to STDOUT may be tricky. Combining pipes with backgrounded services can be hard, and some apps and process monitors only support options for logging to a file, not to STDOUT.

Bash process substitution may help you in that case. The syntax of process substitution in bash is a greater-than or less-than symbol immediately followed by a parenthesis. Within the parens you can put the command that you'd like to pipe STDOUT to.

Bash will then substitute a temporary path to a file descriptor that pipes the content on to your command, like a named pipe. Here's an example using forever:

forever -a -l >(logger -t myapp) start ./myapp.js

When you run forever list, you'll see an entry in the logfile column that looks like /dev/fd/63. Just like a regular log file, this syntax works even when start runs the app in the background. logger is a syslogd client that forwards the logs on to syslogd in this case.
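If you want to see the mechanism without forever installed, here is a small self-contained sketch. `fake_app` is a hypothetical stand-in for any program that only accepts a log file path, and `sed` stands in for `logger` so the result is visible on the terminal.

```shell
#!/usr/bin/env bash
# Stand-in for an app that only knows how to append to a log file path.
fake_app() {
  echo "listening on port 3000" >> "$1"
}

# What bash passes in place of >(...): a path such as /dev/fd/63.
fd_path=$(echo >(cat > /dev/null))
echo "substituted path: $fd_path"

# The "log file" is really a pipe into sed, which tags each line.
tagged=$(fake_app >(sed 's/^/myapp: /'))
echo "$tagged"
```

The app believes it is writing to an ordinary file, while every line actually flows through the command inside the parentheses. This needs bash; a plain POSIX sh will reject the `>(...)` syntax.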

If you are a web developer also involved in managing the servers hosting your apps, then the idea of "serverless" app hosting needs little introduction.

Using AWS Lambda in combination with AWS API Gateway is catching on as a combination for deploying simple web apps. Besides the lack of a server to maintain, scaling is handled automatically, and you only pay for the requests actually made to the app.

My task was to review options suitable for serving a simple mix of static and dynamic content using Node.js. We're used to developing with the Express web server, so the new solution needed to be easy to transition to, since we would be maintaining this new service side-by-side with other Express apps.

For an excellent introduction to running an Express-style app on AWS Lambda and API Gateway, I recommend the post Server-free Express.js.

Legedemain and dpl

On its face, it sounds like a great fit: Legedemain helps you run Express apps on AWS Lambda. I soon found it wasn't a complete solution. It doesn't currently handle deployment to AWS Lambda and API Gateway.

Not to be immediately deterred, I tried pairing it with aws-lambda-deploy. But this only updates Lambda and not API Gateway.

Trying this approach made me realize I wanted an integrated approach that could manage API Gateway and Lambda together.

serverless

After some more looking I found serverless, which is a very popular framework for managing both AWS Lambda and API Gateway together. Although there are a lot of docs, I ran into problems.

serverless is not meant to run Express apps. Instead, you are encouraged to write more "native" logic. In theory, I like this idea. It's simpler with less abstraction and data translation. serverless has its own Plugin and Hook system which can work like Express Middleware. API Gateway and AWS Lambda already handle the translation of HTTP requests to and from JavaScript objects, so they replace the role of Express. I was willing to adapt to this.

Serverless advocates very small Lambda functions that handle maybe just one route endpoint, or maybe a small collection that handles the CRUD actions for a single type of entity. There are some good reasons for this. Small Lambda functions would start up faster. Since they spend less time executing, they cost less. Also, you can upgrade some functions without touching the others, further increasing uptime. serverless also lets you manage several Lambda functions in a single project so you don't have to be concerned with how many Lambdas are stored and running on AWS.

Serverless has a name for an Express-style app with lots of routes served by a single server. They call that a "Monolithic app" and don't have many docs for monolithic apps. I opened an issue to suggest they improve such docs and my issue got tagged with "priority: low".

While Serverless has a lot of positives, the fact that the model is very different from Express-style apps was a downside for my team. Another downside is that much of the configuration involves editing somewhat complex JSON files. Here their otherwise verbose docs fail with some handwaving that you should be very familiar with the docs for API Gateway and Lambda configuration. So, rather than abstracting away API Gateway and Lambda behind a single interface and set of documentation, now you have three sets of documentation to reference. For something we may touch infrequently, there's too much required context to make changes that would be simple with Express.

The JSON structure includes such keys as name and customName. Since name is already a variable that can be set to custom values, would you care to guess what the customName is? Better names would have been serverlessName and lambdaName, which highlight the difference better.

While Serverless is popular and appears it will have a bright future, I found it would not be a good fit for us. For developers who work with Express 95% of the time, serverless has too many differences from traditional Express development to make it an easy addition to a workflow.

There seems to be a lot of momentum behind the serverless ecosystem and I expect to be checking on it again in the future.

aglex

While I'm afraid I may be the #2 user of this tool in the world, I can heartily suggest that perhaps you should be the #3.

By this point I had a clear sense of what I was looking for and aglex has it. Using this simple CLI tool, I was able to quickly deploy a (fairly) normal Express app. aglex then handled configuring and deploying both API Gateway and Lambda to bring the app online.

The documentation is a simple one-page README and it was beautifully sufficient.

Now that I've set up the project, it will be straightforward and familiar for co-workers to add a few more Express routes to the app when we need to and to deploy the related changes to API Gateway and Lambda. No deep dive into complex JSON structures or into the depths of the API Gateway and Lambda docs is required.

That said, as a newer, less popular project, aglex has its own rough edges which I hope get smoothed out soon:

I also had difficulty serving a static JSON file, but was eventually able to do so. It appears primarily designed to return JSON. The configuration file explicitly mentions serving static files, so it appears this is a supported use-case that I expect to improve in the future.

For corporate projects, third-party modules stored on third-party module repos should not be a dependency for building and deploying. Storing third-party dependencies locally in some form solves the problem of third-party modules on npm which disappear. Check the third-party code into source control or use a private NPM repo. Using npm shrinkwrap to require precise versions is a good idea.

For open source modules, I would like to see distribution options that include downloading the entire dependency tree in a single tarball. The tarballs would contain a stack that has been tested and approved by the module maintainer, so you don't end up downloading a combination of dependencies that no one ever tested or intended you to use. A module being unpublished would not affect this case, as an approved version would continue to remain in the tarball.

Open source projects don't need to wait for module repos to offer this feature, they can upload and link to their own copies.

The project tarball could itself contain vetted tarballs of dependencies inside, ready for npm install. And yes, signed packages would help make sure that all the packages you are getting are from the authors you expect.

Npm, Inc has responded with their solution. They take some responsibility for the difficulties encountered, but they also put themselves at the center of solving the problem. Their solutions alone would only make the main NPM repo a larger single dependency and single-point-of-failure for the JavaScript community.

We'll be a more resilient community if we all take care to make sure that key dependencies for our projects are locally available so that further disruptions at npmjs.org are not felt so widely.