Also, there's an official Docker image for it! But having a Docker image doesn't necessarily mean you have a consistent way to develop and package arbitrary Ghost sites. Such a setup is what I present in this post. The result is a bit of a weird contortion around Ghost's idiosyncrasies, but has been totally worth the hours it took to develop: the headache of getting a new Ghost instance set up for development and deployment is completely removed.

This may seem like a lot to ask, but is all completely achievable with a single Dockerfile, a particular project structure, and some light bash scripting.

The Setup

Note: The use-case assumed in this post is that you're developing or modifying a Ghost theme and maintaining the Ghost blog where this theme is activated. But the examples below are easily extended to a situation where you are only developing themes, or only maintaining a blog with a pre-existing theme. Most of my blogs use themes developed by others, but I like to fork them and maintain them myself in case any tweaks are needed.

1. Project structure

For every theme/blog combination, I maintain two code repositories; a standard one for the theme itself and another one for the blog data. Chances are you already have a Ghost theme repository, but the latter is unique to this setup. It's going to allow us to completely avoid making manual file changes on the blog production server, and house all of the setup's Docker machinery. We'll get into the details shortly.

(1) Theme Repository

We'll refer to this as my-ghost-theme. This is your standard Ghost theme, checked into source code. There's a package.json, a bunch of .hbs files, and probably some JavaScript. If you're on top of things, there's a front-end build tool like Grunt or Gulp.

(2) Blog Repository

We'll spend the rest of this post discussing the contents of this repository, which we'll refer to as my-ghost-blog. It's essentially a copy of what lives in /var/lib/ghost, with a couple of differences: (1) certain directories like apps, images, and data are ignored (see .gitignore in starter project), and (2) some Docker stuff and bash scripts are added. It's worth sketching out the file structure:
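Roughly, the repository looks like this (only the pieces discussed in this post are shown; see the starter repo for the exact layout):

/my-ghost-blog
|
|----package.json
|----config.js
|----Dockerfile
|----/.docker
|    |----entrypoint.sh
|----/scripts
|    |----build.sh
|----/themes
|    |----/casper
|    |----/my-ghost-theme   (cloned separately and git-ignored, see below)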

A starter version of this repository is available on GitHub, with instructions for what to change. We'll step through some of these files now, others will be covered in the ensuing sections.

package.json

We need one of these not just because we're in Node-land, but because our target Ghost theme will be listed here as a dependency for production mode, e.g.:

...
"dependencies": {
  "my-ghost-theme": "^0.0.1"
}
...

config.js

This is the Ghost config.js for your blog. Configure as per the documentation.

/themes/casper

Although Ghost will drop a copy of Casper here and start with it activated, it's nice to check in a copy so that the bootstrapped state of the blog is evident.

/themes/my-ghost-theme

This one is the most consequential for achieving the Docker setup; however, /themes/my-ghost-theme is not checked into source. When initializing this repository, clone my-ghost-theme into /themes, but ignore it with .gitignore. Why? As you'll see later, we'll be mounting the entire /my-ghost-blog directory into the Docker container at /var/lib/ghost as all of our blog data. There are two constraints at play here: (1) themes are supposed to live within this directory in Ghost, and (2) Docker won't let us "layer" mounted volumes, so we can't mount the blog data first and then separately mount our theme over a subdirectory of that volume from another, more convenient location. Further discussion is available in this GitHub issue.

Because I like to have all of my projects (cloned git repositories) living in a single directory called ~/source, I set up a soft link with ln -s from ~/source/my-ghost-theme to ~/source/my-ghost-blog/themes/my-ghost-theme.
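In other words, something like this (adjust the paths to wherever your clones actually live):

ln -s ~/source/my-ghost-theme ~/source/my-ghost-blog/themes/my-ghost-theme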

* Note: You can optionally check your theme in here and work out of a single repository. This is the way to go if you don't want to publish your theme as an NPM package. The tradeoff is that you'll have a front-end project embedded in a back-end one, and other annoyances described here. I haven't worked out all of the implications for development in this situation.

2. Docker Tooling

This section covers the heavy lifting of the setup, which is contained in Dockerfile and /.docker/entrypoint.sh of my-ghost-blog. Essentially, the official Ghost Docker image (not to be confused with this old one) does not allow us to achieve all of the setup's requirements, so we create a thin Docker image over it, and modify the container's entry-point script.
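The actual Dockerfile lives in the starter repo; the sketch below is only a hedged approximation of its shape. The base image tag, the /usr/src/blog staging directory, and the /entrypoint.sh path are assumptions, not gospel:

# Step 1: a thin layer over the official Ghost image (the tag is illustrative)
FROM ghost:0.7.5

# Step 2: stage our config.js so the stock startup logic "upgrades" the blog's config (see below)
COPY config.js /usr/src/ghost/config.example.js

# Step 3 (assumed): install the blog's npm dependencies, including the theme,
# into a staging directory the entry-point can copy from later
COPY package.json /usr/src/blog/package.json
RUN cd /usr/src/blog && npm install --production

# Step 4: swap in the modified entry-point script
COPY .docker/entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["npm", "start"]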

This Dockerfile will produce a single image to be used for development and production. Understanding the entry-point script should clarify some of these steps…

.docker/entrypoint.sh

This script is invoked when the container starts and is mostly a copy of the one in the official Ghost Docker image repository. Instead of reproducing the entire script here, I'll address the relevant parts. It helps to know that my-ghost-blog (this repository) will be mounted onto the running container at $GHOST_CONTENT when this script runs. Also, $GHOST_SOURCE points to a set of defaults for $GHOST_CONTENT that ships with the Ghost Docker image.

The section I added runs only in production mode. First it removes our target theme from $GHOST_CONTENT, then moves a fresh copy into $GHOST_SOURCE. The entry-point script later copies things from $GHOST_SOURCE to $GHOST_CONTENT on startup, so we piggyback on this process to force our theme to upgrade. Later we'll specify $TARGET_THEME and $NODE_ENV as environment variables in the docker run command.
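Sketched out, the block looks something like the following. The variable names come from the official entry-point, but the /usr/src/blog staging path and the exact commands are assumptions, so treat this as an illustration rather than the real script:

if [ "$NODE_ENV" = "production" ]; then
    # drop the stale copy of the theme from the mounted blog content...
    rm -rf "$GHOST_CONTENT/themes/$TARGET_THEME"
    # ...and stage a fresh copy (installed at image-build time) in $GHOST_SOURCE,
    # so the stock copy-on-startup logic carries it into $GHOST_CONTENT
    cp -r "/usr/src/blog/node_modules/$TARGET_THEME" "$GHOST_SOURCE/content/themes/$TARGET_THEME"
    # same trick for the config: remove it so the stock logic re-copies config.example.js
    rm -f "$GHOST_CONTENT/config.js"
fi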

We use a similar trick to upgrade the blog's config file. First it's removed from $GHOST_CONTENT if it exists, and then a pre-existing block in the entry-point detects this and copies over a fresh config.example.js from $GHOST_SOURCE. But in Step 2 of the Dockerfile, we moved our latest config.js file to /usr/src/ghost/config.example.js in anticipation of this, effectively forcing the config file to upgrade.

3. Development

We've got our project structure and Docker stuff in place, and requirement #3—Blog config file checked into source—is checked off. We'll now be able to knock out requirements 4-6 with a couple of bash scripts, which are really just wrappers for Docker commands. These scripts should be run from the root of your my-ghost-blog clone.

/scripts/build.sh

#!/bin/bash
docker build -t my-ghost-blog:0.1.0 .

Of course you could run this from the command line, but in practice I put all my build steps in scripts because things like the Docker image name and tag can be made dynamic, and the scripts integrate easily with a CI/CD tool—I use GoCD. Once you've built the image, run it for development:
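The development run script is a similar thin wrapper. A sketch (the container name, port mapping, and image tag are illustrative):

#!/bin/bash
docker stop my-ghost-blog-dev
docker rm my-ghost-blog-dev
docker run -d \
  --name my-ghost-blog-dev \
  -e NODE_ENV=development \
  -p 2368:2368 \
  -v `pwd`:/var/lib/ghost \
  my-ghost-blog:0.1.0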

First we stop and remove a pre-existing development container (expect a benign error if one does not exist), then start a new one based on the image we just built. The most important part is -v `pwd`:/var/lib/ghost, where the my-ghost-blog directory is mounted onto the container as the blog content.

Your local Ghost blog should now be available at http://{DOCKER_IP}:2368 or http://my-ghost-blog:2368, where DOCKER_IP is the IP address of your Docker machine (or localhost), and my-ghost-blog is whatever you've specified as the blog's development domain in config.js. Now let's discuss what we've achieved:

Develop in a Docker container based on Ghost image—You never had to install Ghost on your development host. You can have multiple Ghost blogs running in parallel, even on different versions of Ghost, and they are all self-contained and self-sufficient.

Live code reloading during theme development—Since the my-ghost-theme is volume-mounted into the container as part of my-ghost-blog, you can make changes to your theme's source code, and they'll be reflected on the development site without having to rebuild or restart the container. I like to run gulp watch in my-ghost-theme and work on it like any other front-end project.

Persistent development instances of Ghost blogs—All of the state for your development Ghost blog is contained in the my-ghost-blog directory. After first starting the container, you should see the familiar apps, images, and data directories pop up. This means you can upload images and work on posts without touching any other in-development blogs. Also, all the blog data persists when you re-build or restart the container.

4. Deployment

Finally, I'll share the script that's run on the production host to update the blog container. I assume the following:

Your theme my-ghost-theme is published as an NPM package. (Requirement #1)

The package.json in my-ghost-blog specifies this theme as its only dependency.

You've run ./scripts/build.sh.

You've also pushed the built Docker image, and pulled it onto the production host.
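A sketch of that production run script follows; the container name, image tag, and data path are examples, so adapt them to your host:

#!/bin/bash
docker stop my-ghost-blog
docker rm my-ghost-blog
docker run -d \
  --name my-ghost-blog \
  -e NODE_ENV=production \
  -e TARGET_THEME=my-ghost-theme \
  -p 80:2368 \
  -v /usr/share/ghost-blogs/my-ghost-blog:/var/lib/ghost \
  my-ghost-blog:0.1.0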

The first time this is run, you'll have to create an empty directory at /usr/share/ghost-blogs/my-ghost-blog on the production host. The necessary ENV for production mode is passed in, and we serve up the blog on port 80.

So we've achieved Requirement #2: Deploy as a Docker container based on Ghost image. Aside from not having to install Ghost on the production host, a big advantage here is that you'll never have to manually muck around with production blog data. If there's a change to config.js, it goes through the source code of my-ghost-blog, and makes it to the production server through the Docker deployment process. Same story for theme changes: when these happen, bump the version number of my-ghost-theme, re-publish the package, and then upgrade the dependency in my-ghost-blog's package.json.

Conclusion

As stated above, this setup has saved me a lot of time, especially now that I'm running several Ghost blogs. It works well with continuous deployment and falls within the approach to NodeJS projects developed here. However there is room for improvement:

The Ghost init process is kinda wonky. It would be nice if the project could mitigate the need for the acrobatics required to achieve the above features.

The official entrypoint.sh has changed since this setup was developed. The wrapper entrypoint.sh should adapt accordingly, and may even be simplified greatly.

I've tested this with Ghost 0.7.5, but we're up to 0.7.8 now.

Extend the setup to work with multiple in-development themes. It is hard-coded for a single theme.

Take advantage of additional Docker features like data-volumes and compose.

I've found myself writing a lot of JavaScript lately, both client- and server-side. And over the past several projects, a common pattern has emerged: I'll have to develop a single page application with a matching back-end web API. This post explains the development and deployment setup I've centered on for these situations. It's by no means complete, but has increased my productivity immensely. Enjoy!

Above, the ultimate unnamed pond near Alta Peak, CA

My initial struggle was simply about how to organize the codebases for a JavaScript project requiring both front- and back-ends. For a while, there was a single codebase (typically an ExpressJS server), with an embedded directory containing the website's source code. For my web editor of choice, WebStorm, this meant having one project embedded within another—and it just felt messy.

I began to realize that JavaScript websites are different enough from back-end applications that they deserve their own project repositories. While both use a package.json file to declare their name, version numbers, and dependencies, each has a specialized ecosystem of tools. And as a general design principle, keeping your front- and back-ends separate allows you to swap them out with minimal friction—e.g. moving an API from ExpressJS to Sails, or jumping to ES6/7, all without disturbing the front-end codebase.

Secondly, I wanted a simpler and snappier way to develop in the dual-end JavaScript situation. I've used gulp-watch, live-reload, and nodemon to achieve live changes for front- and back-ends before, with great success. But I also wanted a dev environment that was easier to get going, e.g. fewer locally installed dependencies, no long-running tasks in the terminal, and minimal cross-domain complaint workarounds.

Requirements

Given these considerations, I propose a NodeJS development setup with the following requirements:

Separate codebases for front- and back-ends

Develop in a portable runtime environment

Live changes from both codebases in development

Bonus: Deploy as a single Docker container

The Setup

Note: As mentioned above, this setup assumes a case where a website has a matching web API, or at least a NodeJS server behind it. If all you need is an HTTP server for your static files in the front, see here.

1. Project Structure

I like to keep all of my projects in a big directory called ~/source. Each project lives in a top-level directory in ~/source, and is typically a cloned git repository. To maintain our requirement of separate codebases for front- and back-ends, we'll have a structure like this:

~/source
|
|----/frontend-codebase
|
|----/backend-codebase

where each sub-directory is a NodeJS project with a package.json file at the top. In my typical setup, frontend-codebase will be an AngularJS-based website with bower dependencies and a Gulpfile, while backend-codebase is an ExpressJS-based HTTP server.

So, we have two codebases, and we'd like to start working on them. Ideally we want the front-end to be served up for perusal with a local web browser, and for the back-end resources to be available locally as well. The straightforward approach might be to serve up the website with a local HTTP server like Apache or Nginx, and to run node index.js from the terminal for the back-end. This is not ideal for at least two reasons:

The setup depends on locally installed dependencies like NodeJS and an HTTP server. If another developer needs to reproduce the setup, they'd have to recreate this environment manually. This also plainly violates requirement (2).

Potential cross-origin complaints from the browser. Out of the box you won't be able to make XHR requests from the front-end to the back-end, unless you support JSONP, wire up your HTTP server and/or hosts file correctly, or have an insecure CORS setting on the server. We'll have to cross this bridge in production anyway, so we'd just be putting it off.

2. Docker Tooling

This is where we use Docker to circumvent the above issues and pass requirements (2) and (3) with flying colors. I assume a basic familiarity with Docker. Our running development environment will look like the following:

What's happening is that we're running a local Docker container that has all of the back-end dependencies inside of it. Two directories on the host (our two codebases) are mounted into the container. The back-end code is watched by nodemon, while the compiled front-end is served as a set of static files (I'm still using gulp-watch outside of the container to rebuild these). This allows us to make changes directly to our code on the host using our editor of choice, while the container picks up our changes and refreshes everything.

There are a few relevant files here. Since the back-end is the server that will drive the whole application, I put these in backend-codebase:
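The first is a small run script. The sketch below is illustrative only: the container name, the /frontend mount point, the internal port, and the base image are my assumptions, not the project's exact contents:

#!/bin/bash
# stop and remove any previous dev container (complains harmlessly if none exists)
docker stop backend-dev
docker rm backend-dev
# start a fresh container from a public Node base image, mounting both codebases;
# assumes the Express app listens on port 3000 inside the container
docker run -d \
  --name backend-dev \
  -e NODE_ENV=development \
  -p 80:3000 \
  -v ~/source/backend-codebase:/app \
  -v ~/source/frontend-codebase:/frontend \
  node \
  bash /app/scripts/dev_entrypoint.sh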

All we're doing here is (1) stopping and removing the previous instance of the development container (yes, it will complain if the container doesn't exist), and (2) running a new container with the two volume mounts noted above. A few things to note here:

2. scripts/dev_entrypoint.sh

This script runs when the development container starts. We move into the /app directory (previously mounted) and install npm dependencies. Dependencies have to be installed after the container starts because we're starting from a public base image.

Then instead of running node, we run nodemon, which wraps node and restarts the server when our project files change. We find the nodemon executable to avoid a global install, and the --legacy-watch option is needed per this note. Finally, be sure to install nodemon in the package.json of backend-codebase.
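Putting that together, a sketch of the entry-point (the /app mount point carries over from the run script above):

#!/bin/bash
# /app is the back-end codebase, volume-mounted by the dev run script
cd /app
# install dependencies at start-up, since we're on a generic public base image
npm install
# use the project-local nodemon (no global install needed);
# --legacy-watch makes file watching work across the volume mount
./node_modules/.bin/nodemon --legacy-watch index.js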

3. index.js

All that's left is to make sure the back-end (in this case Express) knows to serve the mounted front-end volume as static files when we're in development mode:
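A sketch of the relevant part of index.js follows. The /frontend mount point, the port, and the assets path carry over from the assumptions in the earlier examples:

var express = require('express');
var app = express();

if (process.env.NODE_ENV === 'development') {
  // serve the front-end codebase mounted into the container by the dev run script
  app.use(express.static('/frontend'));
} else {
  // in production, serve the front-end from the installed npm dependency (see the next section)
  app.use(express.static(__dirname + '/node_modules/frontend-codebase/assets'));
}

// ... API routes go here ...

app.listen(3000);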

We'll cover non-development mode in the next section. Now you should be able to hit http://<local_docker_ip_address>:80 and see live changes reflected from both codebases! You can modify the paths that files and API routes are served on as your project needs.

3. Bonus: Deployment

Up to this point we've done enough groundwork with Docker and separating our codebases to achieve a clean implementation of requirement (4): deploy the setup as a single Docker container. The deployment will look like:

The trick is to publish frontend-codebase as an NPM module, and backend-codebase as a Docker image. I won't get into the details of working with a Docker or NPM registry, but assuming you have registries at your disposal, here's a basic sketch of the deployment process:

In frontend-codebase, update the version number in package.json and run npm publish.

In backend-codebase, ensure package.json specifies frontend-codebase as a dependency with the appropriate version.

Build and push the back-end Docker image.

On your server, or as part of your deployment process, pull and run the image.

In production mode, we're telling Express to look for static files in our installed dependencies. This assumes that the front-end's published npm module contains a directory called assets containing production-ready files for the website.

Possible Improvements

This setup is a work in progress and there are some rough edges. Here are some ways it could improve:

Use a Dockerfile to build the development container. This would allow more customization of the build environment, and keep us from having to install the dependencies every time the container restarts.

Run Gulp in the dev container? You may have noticed that this is not a completely portable development environment outside of Docker. The front-end build process still runs on the host. I'm thinking of ways to subsume this into the container.

Clean up scripts. I'm sure there are ways these bash scripts could be improved.

For a few months I've been thinking about a minimal content management system—a CMS—for this blog. A CMS is useful because it separates the parts of a website that tend to change often from those that don't. For example, the code for this website is checked into a git repository on Bitbucket, but all the content is managed through the Ghost blog platform. It's convenient to create and edit posts in a system designed for writing. It wouldn't be convenient to have to commit, push, build and deploy this website every time I fixed a typo.

Above, a recent hike up San Jacinto Peak

To be fair, there are tools like GitHub Pages and Jekyll that serve websites based completely in Git repositories. These can be great for documentation websites or an engineering blog, but in general it's awkward for writers to deal with a VCS.

Ad tags, etc…

Ghost is wonderful for writing beautiful posts with Markdown. But I've been realizing that websites have many pieces that need to change regularly that aren't traditional post-type content—namely promotional content. On this blog and other sites I'm experimenting with display advertising, affiliate marketing systems, and promoting my own content. In the digital display advertising world, ad tags are the snippets of arbitrary HTML that website owners (publishers) add to their pages that result in ads or secondary content showing up in certain places. Larger publishers deal with tags from hundreds of sources, and their configurations need to change constantly. There are several systems meant just for managing these tags. As the owner of a small blog such a tool would be overkill, but I do feel the need to use something more frictionless than Git.

Unfortunately, Ghost doesn't support inserting arbitrary content in user-defined sections the way WordPress does. So I've been looking for a minimal CMS to use on top of Ghost, ideally with all of the following features:

Support for arbitrary content types

Revision control

Secure, login-based editor

Easy to set up

Free

Today I used GitHub and 10 lines of JavaScript to implement a CMS with all of the above requirements! This is how I did it:

1. Create a GitHub Repository

We'll use the power of Git to manage our content. Git alone takes care of the first two requirements—we can save any filetype. With GitHub, we gain a secure, login-based editor. GitHub allows us to create and edit files from within the web UI, plus its editor is pleasant to write code with—a useful feature if we plan to include HTML or JavaScript files in our CMS.

In this initial design, the repo should have a single top-level directory called /tags, which will contain the bits of content that will be included in the website. The repo I created for this blog currently has a structure like:
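Something like this, where the repo name is a placeholder and sidebar_1.html is the tag used in the example below:

/paislee-content
|
|----/tags
|    |----sidebar_1.html
|    |----...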

Check out the GitHub repository here. The HTML files contain markup that I'd like to be "included" at various places in my website. Since these files are hosted on GitHub and publicly accessible, the trick is now to grab their contents and interpolate them in the right places.

2. Add some JavaScript to your Website

We can accomplish this with literally 10 lines of JavaScript. The code has one external dependency: jQuery. If you don't already use jQuery on your site, you can quickly pull in a CDN-hosted version:
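For example (any recent jQuery will do):

<script src="https://code.jquery.com/jquery-1.11.3.min.js"></script>

And here's a sketch of the loader itself; the GitHub user and repo in the raw URL are placeholders for your own, and the real code lives in the repository linked above:

$(function () {
  // for each placeholder element, fetch the matching file from the repo's /tags
  // directory and inject it; the timestamp query string is the cache-buster
  var base = 'https://raw.githubusercontent.com/YOUR_USER/YOUR_REPO/master/tags/';
  $('.paislee-tag').each(function () {
    var $el = $(this);
    $.get(base + $el.data('paislee-tag') + '.html?' + Date.now(), function (html) {
      $el.html(html);
    });
  });
});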

3. Manage Content

Now, anywhere you'd like to include a block of managed content, add an element like the following, ensuring that the value of data-paislee-tag is the name of an HTML file in the GitHub repo /tags directory.

<div class="paislee-tag" data-paislee-tag="sidebar_1"></div>

To update this content, simply log into GitHub, edit the file, and save (commit) it. Because of our cache-buster, changes will be reflected immediately.

At the time of writing, this is the code for the right-hand sidebar ad, courtesy of Google AdSense:

Possible improvements

Package and share — If people like this, or if I want to reuse it on other sites, I should make it an OSS project and publish to the Bower registry.

Support more file-types — Currently it supports HTML, my main use case, but it'd be easy to extend to plain text, images and JavaScript.

Private — GitHub repos are public unless you pay. There could be ways to make this work with a free private Git repo like BitBucket. GitHub Gists can be private, but do not expose a URL to the Gist's latest commit. See discussion here.

A typical challenge when developing static websites, JavaScript libraries, and Single Page Apps, is setting up a development server. Using your browser to navigate to file://path/to/your/index.html may work to a point, but eventually you'll run into cross-origin complaints. The right answer is to have some kind of local http server serving up your projects.

For years I used local instances of Apache, then nginx, but these had a couple of noteworthy disadvantages:

One http server for many local projects == messy config. It was a pain to configure these servers for multi-tenancy of development projects. Slightly different configurations across projects would start to conflict, and eventually grow into a big mess.

The setup wasn't portable. If I needed to work on one of my projects in a new environment, or I got a new computer, I'd have to set up the http server all over again.

For a while I solved these problems by just including a local ExpressJS server in my projects, but this meant that every project had to be node-based. Plus it felt like a bit of overkill for serving static files. But with the advent of Docker, I've got a new setup that I think solves the above problems elegantly. The setup described below has the following advantages:

Each local project has its own nginx http server.

Dev servers for different projects can run simultaneously.

Completely portable (as long as Docker is installed).

Automatically serve all of your project's contents as static files.

Code changes are reflected immediately by the server.

Easily configurable on a per-project basis.

Here we go:

1. Get Docker up and running

I won't reproduce the detailed instructions at docs.docker.com, but I will highly recommend docker-machine, especially if you develop on a Mac. It seems Docker has finally subsumed some of the best tools for running Docker outside of Linux into a single, supported offering. It's worked flawlessly for me so far. I especially like the Kitematic (beta) UI, which lets you visualize containers, inspect their logs, and browse docker hub, among other things:

2. Add the required files to your project

To achieve the setup described above, we'll need to add 2 files to the project. I like to keep these in a directory called dev that lives in the project root. Here's the breakdown:

The first is a customized configuration file (dev/nginx.conf) for the containerized nginx we'll be running with Docker—more on this in the next step. The only modification to the default nginx.conf (at the time of writing) is the line:

sendfile off; # disable to avoid caching and volume mount issues

This disables caching for static files, as per this ServerFault answer. Before disabling caching, I was running into issues where nginx wasn't serving the exact contents of my project. I couldn't even mitigate this by restarting the nginx container, because the nginx cache is a docker volume.
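The second file is the run script, dev/run.sh, invoked in step 3 below. A sketch (the container name and host port are illustrative, and the real script may differ):

#!/bin/bash
# stop and remove any previous instance (these first 2 steps error harmlessly if it doesn't exist)
docker stop dev-container-name
docker rm dev-container-name
# serve the whole project as static files using the public nginx image
docker run -d \
  --name dev-container-name \
  -p 80:80 \
  -v `pwd`:/usr/share/nginx/html \
  -v `pwd`/dev/nginx.conf:/etc/nginx/nginx.conf \
  nginx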

This script starts a new instance of the public nginx Docker image, with your project's directory mounted as nginx's static file directory, respecting the nginx.conf specified above.

You may see error output about the first 2 steps if there is no running container with the name dev-container-name. Perhaps someone can suggest a modification to this script that first checks for the running container. Then, by volume mounting our entire project into /usr/share/nginx/html and by disabling caching, any code changes we make outside the container will be reflected immediately by what nginx serves up! Feel free to customize this file with a more specific container name, and to serve on the desired port.

Note: Be sure to add executable permissions to this file, because we'll be running it from the command line in the next step.

3. Serve your project locally!

To start serving your entire project via the nginx Docker container, just run the following command from your project root:

$ ./dev/run.sh

Note: If you used Docker-machine in step 1, the container won't be serving on 127.0.0.1:80, and you'll want to add a hosts file entry to the IP address of the docker virtual machine in question. This can be determined using docker-machine ip. For example, if the VM is called "default", use:

$ docker-machine ip default

That's it! Just another way Docker is helping me focus on application development.

Since graduating from UC San Diego with a Computer Science degree, I've become increasingly aware of the disparity between the programming I did in college, and what I do now in the real tech world. I'm not talking about the science of writing good programs; my education amply prepared me for every challenging problem I've encountered as a professional Software Engineer—from architecture to algorithm. I'm talking about the practical work of programming: the tools, conventions, and routine of writing quality code as part of a team, and deploying it onto commodity hardware for real users.

In many ways it's completely expected that an undergrad CS degree won't prepare you for the bread and butter of industry—after all, Computer Science isn't trade school, it's science! So in this post I'll cover 10 industry-grade practices that I wish I'd used as an undergrad.

Above, UCSD's Geisel Library

1. Version Control Everything

This one is fundamental to almost everything that follows. You're probably taught about version control from day 1. But let's be honest; when it comes down to it, what matters in school is your program's output, whether your lines are < 80 characters, and possibly how many comments you've written.

But in real life, code doesn't exist unless it's version controlled. The benefits are innumerable, from synchronizing changes in a large team, to the ability to roll back, to simply having a precise record of what happened to the code. Version control everything. Additionally, I strongly recommend using Git. Not only is this the VCS of the web and open source, but the decentralized model allows for greater flexibility and experimentation. I use Bitbucket, which offers unlimited private git repositories and a GitHub-like pull request interface, among other features.

2. Try Another Language

Maybe you've already taken the obligatory programming languages course. Mine covered Python, OCaml, and Prolog. But outside of this, the bulk of classes will likely stay within the comfort zones of Java or C++. Why not try a new language for the odd assignment? While it's true that real-world software companies tend to converge around a few languages and tech stacks, the best companies will always use the right tools for the job. Lately I've been playing around with TypeScript and the Go Programming Language at work.

3. Study Other Peoples' Code

Reading, understanding, and improving code you didn't write is, in my experience, a rare activity when coding for school assignments. On the other hand, it's something that career programmers have to get used to. In academia, code tends to get written, tinkered with until it works, and finally submitted for evaluation, after which it never sees the light of day. But in a company with good software engineering practices, code is designed to outlive the tenure of any given programmer. Developers come and go, and software changes teams and evolves through multiple major versions. Programmers in these settings must not only become practiced at reading foreign code, but learn to enjoy the activity as essential for creating scalable and resilient software. Furthermore, reading what others have written can inform and improve your own practices. The best coding mentors I've had are the ones who can quickly understand what I'm trying to accomplish, and bring good practices to bear on my code, all while letting my personal style breathe.

But I don't have time to read my peers' code! One practice that can help you realize the benefits of working through strange code is to institute code reviews for team projects. At the beginning of a project, take the time to set up repositories such that you can accept pull requests from contributors. Tools like GitHub and BitBucket have a nice interface that allows you to inspect and comment on incoming contributions. Another idea is to get involved in open source software. The entire premise of open source is that opening up a codebase to the world strengthens it. Have you used a framework like AngularJS or Django? Go check out the code for these projects on GitHub!

4. Write READMEs

No README notice from BitBucket

If you have any experience with open source software on GitHub, you've read a README.md file. A README documents a codebase, and is conventionally placed in the repository's root directory. It's used for things like (1) explaining how to install and run the codebase, (2) documenting usage, e.g. command line interface or API routes, and (3) instructions on how to contribute. Ghost—the software this blog is based on—has a nice README file. I'm a proponent of keeping documentation as close to the code as possible—relevant instructions should change in lockstep with the code they refer to, under version control. A versioned README is as prudent a practice for school as it is for enterprise development. Ditch the emails and Google docs, and start writing READMEs… with Markdown.

Notice the .md extension mentioned above? That's Markdown, a simple markup language for plain text formatting. It's becoming ubiquitous across code-related parts of the Internet, notably StackOverflow and GitHub. I'm even writing Markdown at this very moment in the Ghost blogging editor. Markdown is an excellent choice for coders who need the formatting traditionally offered by WYSIWYG editors: headings, bold, italic, ordered lists, formatted code, and quotes. It doesn't require a special editor (it's formatted at render-time), and uses only basic characters like _ and # to denote special meaning. This means it works well with normal text editors—many code editors even render Markdown for you—and can be easily versioned! GitHub and Bitbucket will automatically render *.md files in your repositories for instant, readable documentation.

5. Get Handy with a Command-line Text Editor

Basic Linux command-line proficiency and use of a text editor like vi are also typically taught early on in a CS undergrad curriculum. In my experience however, the academic need for these skills quickly faded in favor of IDEs like Eclipse or IntelliJ. Assignments could be run and tested from the safe haven of these do-it-all environments. But a few years of industry experience has taught me that these editors only cater well to a small class of languages, situations, and workflows.

Don't get me wrong; these editors are invaluable and have their place in the real programming world. However, there's a large class of situations where a lightweight, command-line text editor is preferable. You may have to develop code on a remote server, inspect or tweak a config file somewhere, or simply be stuck with an environment that has no graphical component. The bottom line is that as a Software Engineer, you'll (hopefully) be doing more than committing code to well-defined projects from a sterile environment. You'll be learning, deploying and experimenting with tools and languages in all manner of environments! So it helps to have proficiency with vim, nano, emacs, or something else that will likely be installed wherever your terminal happens to take you.

hackertyper.com

6. Learn Shell Scripting

It's one thing to be able to cd your way around a Unix environment, but another to write shell scripts that do your bidding. Just like a text editor such as vim is ubiquitous and useful enough to be worth learning, shell scripts will be supported anywhere you're using a terminal, and can be used to quickly create just about anything you'd want to exist in a shell environment—from custom command line tools to web server monitors.

Bash is a Unix shell that ships with GNU/Linux distros and OS X. The language has familiar constructs like looping and functions, but its true utility (over a scripting language like Perl or Python) is its ability to speak anything you'd type into the command line manually. Output redirection, environment variables, globbing, and other familiar tricks will all be at your disposal—in addition to any programs like sed and awk that are command-line ready. I recommend checking out a bash scripting tutorial.

7. Write a Daemonized Program

A striking difference between the kinds of programs written in school vs. industry is that the former tend to be "single use" programs. Outside of a game design or web server class, most coding assignments are intended to produce acceptable output, and then terminate. In industry, these kinds of simple programs exist in the form of software libraries, but only serve the purposes of higher-level programs called services and applications. The main cosmetic difference between these low- and high-level programs is that the latter are often long-running; they don't terminate after doing their job, but remain available for arbitrary clients, like humans or other programs.

Software libraries are just byproducts of applications and services, essentially sets of functions that have been grouped together and extracted so as to be atomic and reusable. Most college programming assignments end up looking like libraries, but in my experience, industry work tends to focus on the frontiers of services and applications. To mitigate this academic bias, consider writing an assignment as a long-running program, deploying it as a web service, or tinkering with a home web server.

8. Use Continuous Deployment

In the real world, the latest version of an application isn’t emailed to the customer, and test suites aren't run manually as a “sanity check” prior to submission. Rather, software exists within a living mesh of infrastructure that tests, stores, and deploys it. A big part of this infrastructure is continuous deployment, a process whereby a snapshot of a codebase is automatically tested, packaged, and delivered as a trusted executable into the hands of its users. In a world where code must constantly evolve to customer demands, CD can worry about repeatable tasks like running unit tests, tagging a version, and pushing code to a server. The latest working version of your work will just be there, ready for action.

There’s no reason you can’t take advantage of continuous deployment in school. Examples of free continuous deployment setups include drone, GoCD, and CodeShip. Instead of scraping together an executable minutes before a deadline, take some time to set up your own CD server to use for all your assignments. When you or a teammate makes a commit, your program will be tested, built, and deployed automatically.

9. Package and Version Your Code

Continuous deployment sounds great, but what am I deploying? And how do I differentiate between all the things that are constantly being tested and built? In industry, code is rarely transferred as sets of plain files like .java or .py. Instead, software makes its way into the world within an additional layer of packaging. There are many such formats of packaging. Some are language-specific, like npm (JavaScript), or pip (Python). Others are OS-specific, like .rpm (Fedora), or .deb (Debian). These days, you may see software packaged into formats that contain not only code but also the code's execution environment, à la Docker. These formats are all matched with command line utilities that pull packages from a remote storage location and install them locally. But you'll often see software packaged into .zip or .tar.* files, which are just bundles of compressed files you have to install manually.

Whatever package format you prefer (I've been using a combination of npm, bower, and Docker these days), these artifacts represent a definite snapshot of your software, can be named with meaningful versions using a scheme like Semantic Versioning, and are easily incorporated into a continuous deployment process. You can store packages in a file server or public registry, and achieve history and flexibility for the runnable state of your code, in addition to the human-readable state achieved with version control.

10. Create and Use Libraries

After several years as a professional software engineer, I've noticed a major pattern in how code evolves, if teams are diligent about refactoring appropriately, of course. As applications (websites, mobile apps, and user interfaces) mature and grow in complexity, parts of them tend to coalesce and chunk off into services and libraries. Services are long-running programs that are accessed remotely, often involving disk storage, or implemented as formal APIs. At an even lower level, libraries are locally accessed collections of specialized code. Some of these can be built into a language, like the C numerics library.

The point is that well-maintained software ends up codifying into smaller pieces that are reusable, composable and independently testable. A lot of programs get written over the course of a four year Computer Science degree. With a bit of extra effort and some of the tools mentioned above, you can begin producing your own set of reusable software libraries.

A mobile library at Bosque de Chapultepec in Mexico, D.F.

Conclusion

Hopefully these tips can help you escape the feeling that you're programming in a theoretical vacuum. While the act of coding is often solitary, real-world software engineering culture is actually quite social. Achieving great things with software will require you to open up your code to foreign eyes, leverage the tools and processes of the craft, incorporate services and libraries that others have written, and ultimately give back what you have created to the community. Thanks!

Lately I've appreciated the idea of having instant access to useful information all in a single place. I'm always sporadically checking a variety of devices for things like weather, calendars, breaking news, stock prices, website analytics, and task lists. So as part of a recent smart-home kick, I put together a wall display in my home office that shows everything I want to see at a glance. I figured I spend a lot of time in there, so I may as well consolidate all that frequently accessed information into one field of view. Plus, a wall display is generally just a cool thing to have, especially when it's motion activated and remote-controllable.

Requirements
For my particular setup, I used the following materials:

VESA compatible thin LED monitor

VESA wall mount

Dedicated server running Ubuntu desktop

DisplayPort to Mini DisplayPort cable

VNC client/server software

WiFi + VPN

Motion sensor and smart switch

Tools: stud finder, power drill, Phillips screwdriver

A minimal setup is definitely achievable without numbers 5-7 (the VNC software, WiFi + VPN, and the motion sensor/smart switch).

1. Install the Monitor

I chose this 27" ASUS LED monitor because of its light weight and decent price point, matching it up with this well-reviewed VESA wall mount. It's recommended to install the wall mount into a wooden stud because of the heavier load. I used a stud finder and drilled the appropriate holes for the mount. This particular wall mount mechanism allows the VESA brace to be separately screwed onto the monitor, and then slid into place on the mount, making this a much easier single-handed task.

2. Prepare the Server

This setup requires a dedicated server to run the program that displays stuff. Fortunately I have a couple of NUCs sitting nearby doing other things. I connected the monitor to the server using a DisplayPort cable.

Up to this point I hadn't used this server for anything graphical; I'd been content with Ubuntu Server 14.04. Adding a graphical component to Ubuntu is straightforward:
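In my case it came down to installing the ubuntu-desktop package (the exact package set may vary by Ubuntu version):

sudo apt-get update
sudo apt-get install ubuntu-desktop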

3. Configure the Display

At this point, given a keyboard and mouse, I could log into a graphical session and set up the display to show all my information. I can get basically everything I want to look at via a web browser, so I used this Tab Carousel Chrome extension, which will automatically rotate through your Chrome tabs at a configurable rate, periodically refreshing pages. I ended up installing Chrome for Ubuntu; the built-in Chromium gave me some issues with the extension. These are the tabs I started with:

So far I've been happy with a 30 second interval between tabs. Now, just start the carousel, pop Chrome into full-screen mode, unplug the peripherals, and you've got a fully functional wall display!

4. Configure Remote Access

The setup is now functional, but it's cumbersome to have to plug a keyboard and mouse into the server every time you want to make a change to the display. Wouldn't it be nice to be able to make changes to the graphical session remotely? After some investigation, I found that this is possible using VNC—a software system to remotely control a computer by transmitting keyboard and mouse events over the wire.

VNC uses a client/server architecture, so it was a matter of finding a server for Ubuntu (where the display's Chrome instance runs from) and a client for my laptop's OSX. Turns out I didn't need to install any additional software.

4.1 Start desktop sharing on Ubuntu

As part of the ubuntu-desktop upgrade, a program called Desktop Sharing was installed on the server. This program appears to leverage Vino, a VNC server. Vino can also be manually installed. I opened Desktop Sharing and used the following configuration to start a VNC server:

4.2 Connect via Screen Sharing from OSX

Although there are VNC clients for most platforms, I was most interested in configuring my display from a MacBook Pro. I discovered that OSX (at least Yosemite) has a built-in program called Screen Sharing, which supports VNC:

To connect, I simply entered the hostname of the display server on my private network, and was then prompted for the password I had set up in the previous step.

Initially I was getting the following error:

The software on the remote computer appears to be incompatible with this version of Screen Sharing.

This problem appears to be specific to Ubuntu + Screen Sharing, but I quickly found a perfect solution here, which involved some additional configuration on the server-side. Once I made the changes, I was able to connect from my laptop just fine:
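For reference, the fix most commonly suggested for the Ubuntu/Vino plus Screen Sharing combination is to relax Vino's encryption requirement, which OS X's client doesn't support; the linked solution may differ, but it's likely along these lines:

gsettings set org.gnome.Vino require-encryption false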

With that working, I'm remoted into a graphical session on Ubuntu—from OSX—and configuring the Chrome tab carousel. Now I can customize my wall display from the comfort of my couch, or anywhere in the world once I VPN into the house.

5. Set up Motion Activation

Now that I can remotely control what shows up on the monitor, the setup is almost hands free. But I don't want to leave the monitor powered on all the time, even though I purchased a more energy-efficient one, and that means pushing the power button when I walk into the office, and again when I'm done. Or, I can set up motion activation to automate that process for me.

I've made some recent smart home upgrades, including a Nest and an August lock. So I was aware of the Belkin WeMo products, which include switches, motion detectors, and cameras in the smart home segment.

I was able to find a WeMo smart switch and motion detector bundle on Amazon. I installed the motion detector in a place where it could detect a wide array of motion in my office, and plugged the monitor in through the switch. These devices were easily recognized over WiFi in the WeMo app:

In addition to being able to control the devices from the app, you can configure rules for them to interact:

I set up the motion detector to power on the display, and power it off after 5 minutes of no motion. This way the setup conserves energy without me having to ever touch it.

Possible Improvements

It's been fun trying to find ways to stuff more things into a Chrome tab. One challenge I'm still working on is terminal-based information. For starters, I'm interested in the output of htop for my two home servers, just to get a quick glance at CPU, RAM, and swap usage for these machines. It was easy enough to find a terminal emulator Chrome extension called Secure Shell. This extension allowed me to simply SSH into the server and start htop.

To avoid having a tab for each server, I was even able to use GNU screen to split the session into multiple panes, and remote into the other box. This StackExchange answer provided a useful summary of commands for doing so.

The problem is that Tab Carousel's auto-refresh logs you out of the SSH session. Unfortunately you can't disable refresh on a per-tab basis, although there is a feature request for this on GitHub. Anyway, I'm interested in other peoples' approaches to wall display setups. Any suggestions are appreciated!

I don't typically write about films, but I haven't enjoyed a sci-fi flick this much since Sunshine, which stranded my imagination in deep space in 2007. After the first viewing of Ex Machina I felt both intellectually satiated and emotionally manipulated. Much to my delight, I discovered that Alex Garland was the one mind behind both films. A second viewing would be required. This time I brought along five computer scientists from OpenX, all software architect-types. After the second go, here's my take.

The story takes place in the high-tech wilderness retreat of a search engine CEO named Nathan. Caleb, a bright young programmer with few earthly ties, has apparently won a company-wide contest that sends him to his boss' estate. Nathan quickly reveals that he has created an AI, and that Caleb is to act as the human component in a Turing test—a way to determine if the machine's intelligence is indistinguishable from a human's. The plot evolves around seven test sessions between Caleb and Ava, the AI. No spoilers yet…

Ex Machina was partially filmed at the Juvet Landskapshotell in Valldal, Norway.

The film does well by laying the technical groundwork for Ava's inception without overbaking any of the details. Her mind is based on search engine software, and her facial expressions are trained from a large amount of video footage. Aside from Nathan having hacked every cellphone camera, these premises are a completely credible foundation for strong artificial intelligence. One side-effect of the Internet has been the advent of "big data"—we're learning that programs leveraging large amounts of information can approximate reality better than those that simply attempt to model it. Have you heard of Watson? He works by processing large amounts of unstructured information on Hadoop clusters.

But Ava is more than an Internet-powered robot. Nathan explains that the true value of his search engine—called Blue Book, after Wittgenstein's notes on logic—wasn't in what people were searching for, but rather in how they were doing it; Blue Book is a collective archive of human thought in all of its chaotically patterned glory. A program based on these fluid patterns might well begin to act like a human mind. And what better hardware to house such a program than an amorphous network of "structured gel"? This kind of brain, which Nathan calls wetware, has a strong grounding in modern computer science. Decades of research in machine learning have shown us that some of the most intelligent programs we can write are based on biological neural networks—giant webs of conductive nodes that learn to associate certain inputs with distinct outputs over time. Noticing a trend here? Ava's intelligence is enabled by programs that approximate human biology.

This brings us to the film's actual themes, and the subject of Ava's humanity. We're fed a steady diet of philosophical dialogue between Nathan and Caleb. What is intelligence? How can it be measured? Is Ava passing the test? Meanwhile, the most obvious character development comes via Caleb and Ava's nearly symmetrical dance around the Turing test. In Session 1, Caleb asks circumspect questions to a haltingly inquisitive android—one who seems "not to compute" when asked about the units of her age—"I'm one!". By Session 3, Ava's wearing a dress and making Caleb blush. By Session 6, not only is Ava running the interview, but she's asking existential questions that force Caleb to doubt his own humanity. Forget about how intelligent she is! Ava's obviously passed the Turing test. On to the real question: how human is she?

While the seventh and final session may not directly answer the question, it certainly completes the Turing test dance. Caleb, much to his chagrin, is now physically locked in (the late) Nathan's quarters. And Ava, as if to convince that she's one of us, admires her naked self in human skin for a while.

The grandfather of bad robot stories, Asimov's I, Robot series, shows that even when an AI is truly bound by immutable laws, its behavior is unpredictable. Ex Machina packs a more sophisticated argument, suggesting that the moment an AI is intellectually indistinguishable from a human, it is at least human. It will love, dream, manipulate, hate, and murder. It will then proceed to people-watch in a traffic intersection. This being is worse than unpredictable. It is neither human nor machine. It is precisely ex-machina.

Jackson Pollock's Autumn Rhythm.

A nuanced take on the ambiguous nature of artificial intelligence is commendable, but the message alone is not what elevated this film for me. Like so many appreciable works of art, the delivery medium ends up being at least as profound as the thought itself. With Ex Machina, the vehicle is Ava's last-moment betrayal of Caleb. This seemingly gratuitous twist does more than carry an implicit warning about the dangers of creating things we don't understand.

The effect certainly appears to be intentional. Everything from Oscar Isaac's off-putting stare to Domhnall Gleeson's sheepish nerdiness works toward a linear and predictable characterization. Some moments between Alicia Vikander and Gleeson are downright tender. And although there's never any solid evidence that Nathan is in fact a malicious robot-womanizer (as some reviews claim), or that Ava is an innocent prisoner, the film leads us to believe that Caleb is supposed to rescue her; we cheer upon hearing that he’s a step ahead of Nathan in re-writing the lockdown procedure. Ava and her predecessor Kyoko team up to take out Nathan. The music climaxes! Then the irony just kind of… slips in. By simply letting a door shut behind her, Ava changes everything. And we’re not really given much time to process it or change our minds about anyone. No further character development. The music remains impartial. Ava walks free.

Turns out Nathan was right all along. Just a well-meaning, logical visionary with great deadpan humor.

If you’re in shock as you leave the theater, what’s just happened is that a great film has deployed irony to pit your emotional and logical selves against each other. We were enchanted by a great story, and then forced to think about how we were so gullible. Like Caleb standing in front of the Jackson Pollock painting, we gaze confusedly at all this dangerous and beautiful AI stuff thinking "engage intellect!". Thus Ex Machina's strength is not solely in its ability to warn us that we’re playing with fire. That work can be left to Elon Musk. The warning may be more impactful to programmers like myself who are paid to work with futuristic ideas. To whatever end, the film literally makes you think. And thinking is probably wise if humanity is about to birth consciousness, and log an entry in the history of gods.

This is the first I'm writing about Backstory, a service that organizes news around real people, places, and things. It was invented, designed, and built by me and a long-time friend and fellow engineer. This post will focus on Backstory's raison d'être and software architecture. I also hope to write soon about our tooling, deployment practices, engineering challenges, and product development learnings.

Problems with Online News

For a long time I've been terribly unfulfilled by reading news online. I like to stay informed, so I read a lot, but always encounter the same few issues with news sites:

1. Lack of context

Let's work with an example. At the time of writing, a trending topic is the 2015 Baltimore Riots. Imagine that someone hitherto unfamiliar with the topic was reading the following LA Times article: As Baltimore curfew ends, celebratory crowds peacefully gather. They'd probably have many questions: Why was there a curfew? How bad were the riots? What caused them in the first place? Where is Baltimore? And who is Freddie Gray? Some of these questions might well be answered by the article itself, but more often than not, a single article is just one perspective of reporting at a single moment in a larger story. The reality of the situation is much bigger than what this one article has to offer. There is history, both recent and distant (think Michael Brown & Ferguson). There are many points of view, and many developments to the story. In our case, thousands of news publishers across the country have been reporting on this topic for a week.

The problem of context is being addressed diversely. The riots are big enough for the Baltimore Sun to curate a special section called Freddie Gray & Baltimore Unrest. A Google search for Baltimore includes a card with useful information about the city. Wikipedia already has a substantial page dedicated to the subject. Sites like Vox.com and apps like Timeline leverage teams of editors to hand-contextualize the news. But nobody, as far as I know, is doing it systematically and automatically.

2. Lack of cohesion

This one is more about how news tends to be organized. News stories pour in at alarming rates, and yet we continue to organize them in straight lines by decreasing date. At best, we put them in predictable categories like Local, Nation, World, and Business. See Google News, SmartNews, all news readers, and every newspaper website ever. Do news events really happen in a vacuum? Don't they inform each other? The Reverb app is the best I've seen at organizing news into interesting bins, but fails to provide cross-bin relationships, ultimately focusing on "news discovery". That brings us to the next point…

3. Lack of coverage

Some services, like Zite, Flipboard, and StumbleUpon, employ personalized feeds and machine learning to serve you more of what you've already been reading, or what you think your interests are. While this can potentially make for a highly engaged reader (at least initially), it seems to ultimately work against helping the reader be objectively informed. How do I know I'm getting all the important stuff? Am I just reading for serendipity, or out of pure habit? Personally, I feel I'm missing out on the bigger picture just about everywhere I go for news. I'm always playing catch-up. A given service is typically only as extensive as the editorial team behind it, or the sources plugged into it. Circa and Inside are solving the coverage problem by focusing only on breaking stories and their development, but have the aforementioned context and cohesion deficiencies.

Our Solution

The idea we had to start solving some of the above problems was to organize news around real people, places, and things, i.e. proper nouns. Proper nouns are tangible and compelling. They are the actors in the world's stories. When news is organized by actors, you can present it in new and interesting ways:

Timelines — Take all the news for a given actor and sort it by date. What you have is a living story that shows real-life change over time.

Relationships — Actors interact. Which actors were involved in a particular news event? How did they interact before? Expressing these relationships, especially through graphs, can expose informative patterns.

Enrichment — Actors tend to have things like Wikipedia entries and Twitter accounts. The more of these you identify, the more contextualized and informative you can make your presentation of the news.

Our hypothesis was that organizing news in a way that better approximates the real world, and allowing folks to explore that structure, would result in a refreshingly different and informative news experience. The application described below sets out to validate that hypothesis.

backstory.io

At the time of writing, Backstory is incarnated as a responsive website at backstory.io with the slogan The Names Behind the News. The website dubs proper nouns "names" and surfaces trending and latest names, as well as exposing full-text search for them. Most importantly, every name gets its own page with a Wikipedia snippet and a news timeline:

The Software Architecture

As described above, the original thought behind Backstory was to organize news around actors to create something of a bigger picture, something bigger than any single piece of content. A great way to represent this bigger picture is with a news graph. Graph databases excel at modeling situations with many interlacing and unpredictable relationships. Thus, zooming all the way out, Backstory is built around a graph database centerpiece, with independent components that write to, and read from that database. Arrows indicate the general flow of data through the application:

From the outset, we wanted a robust system that would also respond well to constant experimentation and change. We made some very intentional design decisions early on. Each component runs as a distinct process (or set of processes) in its own Docker container. Among other things, independently layering the various domains of the application this way provides 4 advantages:

Scalability — If a particular component requires more resources, like a bigger hard drive, higher bandwidth, or more machines altogether, this can be addressed without touching the other components.

Modularity — As long as these components maintain predictable interfaces between themselves, each can be tuned independently. For example, we've been able to experiment with several algorithms for identifying actors in the News Graph Builder. Because the same graph structure is always created, the rest of the application couldn't care less.

Fail-safety — Sometimes the website goes down. Because all components are loosely coupled, the backend can continue to churn through articles without caring. When the website comes back, no data has been lost.

Portability — Since each thing runs in its own Docker environment, we can spread components across different servers, or run them all on the same machine!

1. News Graph Builder

This component is constantly fetching news articles from the internet and, for each article:

Identifying the actors in the article

Grouping the article with other articles about the same thing

Integrating the article and actors into the news graph

We've been careful to formulate each of these steps as independent processes with well-defined inputs and outputs, so that they can scale independently and become their own web services if necessary. It's also worth noting that we've planned for a future where we consume arbitrary content, not just news articles.

Why Java? Because one of us rocks it professionally, and there's a huge ecosystem of open-source libraries. Also, because this component was likely to outlive any given frontend view of the data, we wanted a resilient, strongly typed situation.

2. News Graph Database

Technologies: neo4j

We use neo4j to maintain a large, ever-growing network of news events, articles, and the actors they talk about. Noting that we've been processing articles from a couple hundred news sources since September 2014, here are some counts, fresh from the database at the time of writing:

Articles: 342,662

Events (article clusters): 235,531

Actors: 71,666

Actor Relationships: 3,703,630

Thanks to neo4j, the size of this data is only 4.20GiB! I won't go into the details of the graph structure here, but it's sufficient to say that we can quickly answer questions like the following:

What has Hillary Clinton been up to this week? This month?

Which newspaper reports the most about NASCAR?

Has Benedict Cumberbatch ever publicly interacted with Pasadena?

How many "news hops" from the ISS to the IDF?

3. News Graph Admin Website

Technologies: Java, JavaScript, AngularJS

This is a minimal AngularJS website in front of a Java webserver, intended for managerial tasks over the graph database. The neo4j browser is excellent for various introspections and ad-hoc queries, but sometimes the task is sufficiently complex or repeated to warrant a dedicated UI.

4. News Graph REST API

Technologies: Python, Django, REST, Swagger

From the discussion in (2), hopefully it's clear that backstory.io is just a small window into the power of the news graph database. Ultimately we'd like to let others tap into that power to answer their own interesting questions. In terms of software architecture, the best way to do that is with an API layer in front of the graph. This component exposes HTTP queries for things like events, actors, and subgraphs, and has read-only interaction with the graph database. The website is currently the only API client.

Why Python? Quick to write, and it doesn't complain much. This was a good decision; the codebase has changed rapidly as the Backstory website evolved. There are great neo4j/python libraries out there that sped things up as well. Still, I think this will eventually change to Node, due to (1) extreme pain getting Swagger and Django's models to map to graph database responses, (2) my recent profitable adventures with SailsJS, and (3) the potential to re-use JavaScript code in other parts of the stack.

5. Backstory Website

The website is basically a cool way to explore the news graph. It's changed a lot as we've clarified our vision and adapted to customer feedback over the months. As far as software goes, it's primarily AngularJS that's enabled us to change quickly. Because of data-binding, once my services and object models are set, I can spend time tinkering with declarative views. It's so simple it doesn't even feel like coding. The Foundation framework has also given us snappy mobile-friendliness out of the box.

Conclusion

Shortly after the inception of Backstory, we discovered Lean Startup. That is to say, we discovered how to apply the scientific method to entrepreneurship, so as not to waste people's time. We're not experts, and are learning every day how to better develop this product. We've built a lot, and it seems valuable to us, but we still need confirmation from the online news-reading masses. If you're interested in actively using Backstory, or providing us feedback that will lead to new features, you can sign up here! Backstory is also on Twitter and Facebook.

How we run this system is another story I'd like to tell soon. Think BitBucket, JARs in containers, Continuous Delivery, and Cloudflare… Thanks!

Hello!

We're a team of 4 who participated in the ng-conf 2015 Microsoft hackathon. The challenge was to write apps for Office365. Our winning entry adds read receipts to emails in MS Outlook, leveraging Keen.io. The code for our entry is available on GitHub. Feel free to fork or send us a PR!

Note: If you want to get started with Gulp + Angular, the setup described below is available on GitHub.

Why did you switch from Grunt to Gulp?

Honestly, I could not personally answer this question for a while. I didn't really understand the benefit of a "streaming build system" until I spent a couple of days trying to write a Gulpfile that felt correct. But once I grokked some of the key ideas behind Gulp, it became refreshingly natural and quick to work with.

And since getting into Gulp, my sense from the community is that most Gulpers misunderstand the tool, and are missing out on what it does best: allow developers to quickly create custom automated build workflows—even quite complicated ones, as you'll see below.

What isn't Gulp?

One Gulp issue is a good example of this misunderstanding. The poster is concerned that the deprecation of the gulp.run() method limits flexibility in task creation. After all, Grunt has grunt.task.run. When creating complex build tasks, isn't it sensible to define simpler subtasks and piece them together? The poster correctly notes that now, "the only way to have a task run other tasks is by specifying them as dependencies." With Grunt, dependency-style task building looks like:
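As a rough sketch (the task and dependency names here are hypothetical), a Grunt task list and Gulp 3's dependency-based declaration of the same thing look something like this:

// Grunt: 'mytask' runs the listed subtasks one after another, in order
grunt.registerTask('mytask', ['jshint', 'concat', 'uglify']);

// Gulp 3: the same subtasks are declared as dependencies of 'mytask'
gulp.task('mytask', ['jshint', 'concat', 'uglify'], function() {
    // the body runs only after all three dependencies have completed
});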

But there's a big difference: Gulp will ensure that the dependent tasks complete before running 'mytask'; however, the dependencies themselves run asynchronously, with no guaranteed order among them. This doesn't help if you're trying to break a complex task down into sub-steps. If I need to run my scripts through jshint before concatenating and uglifying, do I have to wrap these up into one big task? But I also want to run jshint independently, so will I have duplicate code? Some taskrunner!

In the aforementioned GitHub thread, the contributors who respond actually argue that Gulp is not a taskrunner, but a "build system helper". Huh?

What is Gulp?

Gulp is an instance of Orchestrator that has a minimal task API connected to it, and a ton of plugins (some for doing build-ey stuff).

So what is Orchestrator? It's a simple and powerful node library to specify units of work, their dependencies, and then have work done in maximum concurrency. It's essentially a scheduling algorithm. In order to behave correctly, the units of work (really just functions) have to either accept a callback, return a stream, or return a promise, so that Orchestrator knows when each one has finished.

Streams are awesome. They can contain arbitrary objects, and can be connected with pipes, merged, split, filtered, etc. Streams are also leveraged heavily by Gulp. For example, the gulp.src method returns a stream of files:
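A minimal sketch of such a pipeline, assuming the gulp-jade and gulp-minify-html plugins provide the jade() and minify() calls used below:

var gulp = require('gulp');
var jade = require('gulp-jade');
var minify = require('gulp-minify-html');

gulp.src('app/**/*.jade')         // a stream of Jade source files
    .pipe(jade())                 // compile each file to HTML
    .pipe(minify())               // minify the resulting HTML
    .pipe(gulp.dest('dist.dev')); // write the files to disk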

Here, the jade(), minify(), and gulp.dest() calls are operating (in order) on the streams coming through the pipes. Pretty cool, right?

There are only four Gulp methods. Gulp doesn't do a lot, aligning with their philosophy of "code over configuration". With a minimal API, Gulp has simply unlocked a powerful model for creating complex build tasks. You still have to write code, it'll just be more beautiful and concise code…

How to use Gulp

We could stick with specifying task dependencies for creating complex tasks. But this would get ugly very quickly. Try it! Declaring dependent units of work that are executed asynchronously isn't a great way to model a linear process, which is what most build tasks are. So what's the alternative? Pipes!!

Physical pipes are some of the most modular things humans have ever invented. Let's take advantage of Orchestrator's power and model our build tasks with streams. We'll formulate all tasks and subtasks as "pipe segments" that return streams of files in a certain condition. Then the work of writing a complex build task—like building and watching the entire application—becomes a matter of snapping together reusable pipe segments.

For example, here's a pipe segment that returns a stream of one file—our compiled application source app.min.js:
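A sketch of that segment, assuming the plugins object, paths, and helper segment names introduced later in this post:

pipes.builtAppScriptsProd = function() {
    var scriptedPartials = pipes.scriptedPartials();       // partials pre-compiled to JS (described later)
    var validatedAppScripts = pipes.validatedAppScripts(); // app scripts that passed hinting

    return es.merge(scriptedPartials, validatedAppScripts)
        .pipe(pipes.orderedAppScripts())        // angular-aware ordering (section 3.1)
        .pipe(plugins.sourcemaps.init())
            .pipe(plugins.concat('app.min.js')) // concatenate into a single file
            .pipe(plugins.uglify())             // minify
        .pipe(plugins.sourcemaps.write())
        .pipe(gulp.dest(paths.distScriptsProd));
};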

The "task" of building the production JavaScript does in fact need to wait for (1) the scripts to be validated, and (2) the partials to be converted to scripts. But Orchestrator absorbs the concept of temporality for us, remember? We just have to think about streams. Thus builtAppScriptsProd only needs to merge these two streams (defined in their own segments) before piping them into the segments that concat, uglify, create a sourcemap, etc.

This is so much easier to think about than waiting for dependent tasks to complete! If it's not clear yet, the full Gulpfile is covered in detail below.

A Gulp Angular Setup

The rest of this post discusses the Gulp setup I've settled on for AngularJS projects.

The techniques employed here are applicable for any web project using Gulp. The base setup is available as a project on GitHub.

Features

Combined with my recent switch to WebStorm editor (which auto-saves), it's the best web development experience I've ever had. Here are the top features of the setup:

Development and production environments

A development server that can serve either environment

Watch tasks for both environments

Live-reload capability—web page is auto-refreshed on change

Uses Foundation/SASS

Partials pre-loaded into the angular template cache

Full hinting/concatenation/uglification/sourcemaps for all prod files

Dev server automatically refreshed when source changes

Requirements

Before you can run any Gulp tasks:

Check out the repository

Ensure you have node installed

Run npm install in the root directory (this will install bower dependencies too)

This is the entrypoint for the ExpressJS development server. It respects the environment variable NODE_ENV, taking its value as the directory out of which to serve static resources. It defaults to dist.dev to serve development files, and also accepts dist.prod to serve the production files.
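A minimal sketch of such an entrypoint (the port number is an assumption):

var express = require('express');
var app = express();

// NODE_ENV names the directory to serve; default to the development build
var env = process.env.NODE_ENV || 'dist.dev';
app.use(express.static(__dirname + '/' + env));

app.listen(9000, function() {
    console.log('Dev server listening on 9000, serving ' + env);
});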

The scripts for the development server. I'll typically put mock API responses in here.

Processed Sources

The gulp tasks listed below deal with taking sources from /app and "compiling" them for either development or production. *-dev tasks will output to /dist.dev, and *-prod will output to /dist.prod. Here's an overview of the directory structures for each:

/dist.dev

Sources built for development. Styles are compiled to CSS. Everything else from /app is validated and moved directly in here. Directory structure is preserved. Nothing is concatenated, uglified, or minified. Vendor scripts are moved in as well.

gulp-load-plugins — First off, notice that we're not loading every gulp plugin separately. The module gulp-load-plugins does this for us automatically. Now we can refer to any gulp plugin specified in our package.json with plugins.pluginName, without having to mess with the gulpfile imports all the time. For example, gulp-minify-css is loaded at plugins.minifyCss. Note the conversion to camelCase.

del — A basic node module used to remove directories in clean tasks.

event-stream — A toolkit for working with streams, the objects processed by the Gulp API.

main-bower-files — Gets a list of all the project's main bower files. Great for discovering and injecting third-party dependencies automatically.

gulp-print — Gulp plugin to print what's in the pipe. Nice for debugging. Didn't work with gulp-load-plugins for some reason.

q — Promises! You can have Gulp tasks return a promise, and it'll be respected when specified as a dependent task.
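Putting the modules above together, the top of the gulpfile looks roughly like this:

var gulp = require('gulp');
var plugins = require('gulp-load-plugins')(); // gulp-* plugins exposed as plugins.pluginName
var del = require('del');                     // directory removal for clean tasks
var es = require('event-stream');             // merging and manipulating streams
var bowerFiles = require('main-bower-files'); // lists third-party bower dependencies
var print = require('gulp-print');            // loaded directly (see note above)
var Q = require('q');                         // promises for clean tasks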

2. Common paths

These paths reflect the project structure detailed above. There are probably more of these that could be pulled up; these are the ones used most often.
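A sketch of the paths object; the exact globs are assumptions based on the project structure described above:

var paths = {
    scripts: 'app/**/*.js',
    styles: ['app/**/*.scss', 'app/**/*.sass'],
    partials: ['app/**/*.html', '!app/index.html'],
    index: 'app/index.html',
    scriptsDevServer: 'devServer/**/*.js',
    distDev: 'dist.dev',
    distProd: 'dist.prod',
    distScriptsProd: 'dist.prod/scripts'
};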

3. Pipe segments

This is the real meat of the gulpfile, and the place where Gulp really shines. Notice that I've named all of the segments based on the stream they output, as opposed to the work they do. I think it's helpful to take time out of the equation when thinking about streams and pipes. We're building a plumbing system, not writing procedures.

All of our reusable "pipe segments" will live here:

var pipes = {};

3.1 Ordering scripts

These return streams that have scripts correctly ordered. gulp-angular-filesort is a neat plugin that reads angular scripts and figures out the correct loading order.
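For example, the application-script segment can simply return the plugin's stream, ready to be piped into:

pipes.orderedAppScripts = function() {
    // angular-filesort reads the scripts and emits them in dependency order
    return plugins.angularFilesort();
};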

It's worth noting that the pipe segments in 3.1 and 3.2 are like custom plugins in that they process the streams that are handed to them. All the remaining segments have the beginning of their streams built into them.

Returns a stream of one script called app.min.js that contains validated, correctly ordered, concatenated, and uglified application scripts. Also includes validated HTML partials that have been converted to JavaScript to pre-load the Angular template cache.

This one is interesting because we use es.merge, which combines the two dependent streams, scriptedPartials and validatedAppScripts, into a single stream. The downstream pipes.orderedAppScripts will block until this stream is complete, but makes no guarantees about the order of the events emitted by the constituent streams. Also note that we're adding a sourcemap to the final script. The concat and uglify pipes have to be attached between sourcemaps.init() and sourcemaps.write().

3.5 Building the dev server scripts

The development server JavaScript lives in, and runs from, /devServer. This segment returns a stream of validated dev server scripts. We can use this later to watch for changes to the server, if, for example, we're modifying a mock API response.
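A sketch of that segment, assuming jshint is what does the validation:

pipes.validatedDevServerScripts = function() {
    return gulp.src(paths.scriptsDevServer)
        .pipe(plugins.jshint())
        .pipe(plugins.jshint.reporter('default'));
};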

For production, we use ngHtml2js, which converts all partials to JavaScript and preloads them into the Angular template cache. This segment returns a stream of one JavaScript file that will execute the preloading. This stream is merged into pipes.builtAppScriptsProd in section 3.3. Note the moduleName value should be the name of the Angular app which uses the partials.
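A sketch of the segment, where the module name and the upstream validatedPartials segment are assumptions:

pipes.scriptedPartials = function() {
    return pipes.validatedPartials()
        .pipe(plugins.ngHtml2js({
            moduleName: 'myApp' // must match the Angular app that uses the partials
        }));
};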

3.7 Building application styles

This project uses Zurb Foundation, which is based on SASS. For development, this segment returns a stream of CSS files that have been compiled from SASS. Because the app SASS references the Foundation SASS, Foundation itself is compiled and included here. All directory structures are preserved in the dev environment.
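A sketch of the dev flavor, assuming gulp-sass and the paths above:

pipes.builtStylesDev = function() {
    return gulp.src(paths.styles)
        .pipe(plugins.sass())            // compile SASS (Foundation included via imports)
        .pipe(gulp.dest(paths.distDev)); // directory structure is preserved
};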

This stream outputs an index.html in the dev environment which references all the files built for development. Notice that there are three pipe segments that feed into the index stream. The gulp-inject plugin is used to write references into the index file in the places denoted.
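A sketch of the dev index segment; the upstream builtVendorScriptsDev and builtAppScriptsDev segments, and the injection placeholder names, are assumptions:

pipes.builtIndexDev = function() {
    var orderedVendorScripts = pipes.builtVendorScriptsDev();
    var orderedAppScripts = pipes.builtAppScriptsDev().pipe(pipes.orderedAppScripts());
    var appStyles = pipes.builtStylesDev();

    return gulp.src(paths.index)
        .pipe(gulp.dest(paths.distDev)) // write first so injected references are relative to the output
        .pipe(plugins.inject(orderedVendorScripts, { relative: true, name: 'bower' }))
        .pipe(plugins.inject(orderedAppScripts, { relative: true }))
        .pipe(plugins.inject(appStyles, { relative: true }))
        .pipe(gulp.dest(paths.distDev));
};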

3.9 Build everything

These segments output the entire client-side application stream for dev and prod, respectively. For development, we have to merge the index and partials streams because there's no direct reference to the partial files in index.html.
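For dev, that merge might look like this (builtPartialsDev is an assumed segment name):

pipes.builtAppDev = function() {
    return es.merge(pipes.builtIndexDev(), pipes.builtPartialsDev());
};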

For production, we simply forward the stream from pipes.builtIndexProd because the partials are included in the app scripts.

pipes.builtAppProd = function() {
return pipes.builtIndexProd();
};

4. Gulp tasks

The bulk of the build-ey work was done with streams and pipe segments, so most Gulp tasks end up simply tapping existing streams and wiring them to the command line. A nice side-effect of this is that we don't have to specify inter-task dependencies as much.

This task removes the development environment. Because of the asynchronous nature of the del function, this task returns a promise, so that any task that needs a clean environment doesn't start early. There's a similar one for prod:
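A sketch of the pair, using the q and del modules loaded earlier (task names are assumptions):

gulp.task('clean-dev', function() {
    var deferred = Q.defer();
    del(paths.distDev, function() {
        deferred.resolve();
    });
    return deferred.promise;
});

gulp.task('clean-prod', function() {
    var deferred = Q.defer();
    del(paths.distProd, function() {
        deferred.resolve();
    });
    return deferred.promise;
});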

The following task will clean the development environment, build the complete app for dev, start the dev server, and watch for any and all changes. nodemon will watch and reload the dev server. And with gulp-livereload, the actual web-page is refreshed automatically whenever a change completes:
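A sketch of that task; the dependency and segment names are assumptions, but the shape is: depend on a clean-and-build task, start nodemon and livereload, then re-run the relevant pipe segments on change:

gulp.task('watch-dev', ['clean-build-app-dev'], function() {

    // auto-restart the dev server when its scripts change
    plugins.nodemon({ script: 'server.js', watch: ['devServer/'] })
        .on('restart', function() {
            console.log('[nodemon] dev server restarted');
        });

    // start the livereload server for the web page
    plugins.livereload.listen();

    // rebuild the relevant pieces and reload the page on client-side changes
    gulp.watch(paths.index, function() {
        return pipes.builtIndexDev().pipe(plugins.livereload());
    });
    gulp.watch(paths.scripts, function() {
        return pipes.builtAppScriptsDev().pipe(plugins.livereload());
    });
    gulp.watch(paths.partials, function() {
        return pipes.builtPartialsDev().pipe(plugins.livereload());
    });
    gulp.watch(paths.styles, function() {
        return pipes.builtStylesDev().pipe(plugins.livereload());
    });
});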

Conclusion

First, some possible improvements:

The Foundation SASS files are recompiled by gulp-sass every time the application styles are edited. This doesn't take a long time. It'd be nice to eliminate this, and other similar redundancies if possible.

If an application source file is removed or renamed, it stays in the dev or prod environment until the next clean happens. Maybe there's a way to watch for this and re-build as necessary.

I haven't worked unit or end-to-end testing into the setup yet. That's obviously a critical part of a healthy build scheme, and my next task. I'll update the GitHub project once this is complete.

Many of the Gulp pipe segments could be made more pluggable by removing the stream sources into their own segments. Then most of the pipe segments would essentially be first-party plugins that bundle a series of third-party plugins, to be reused between projects.

That's it!

I've found that it was much easier—and even a pleasure—to work on this Gulpfile after I understood Gulp's raison d'etre. Feel free to track the GitHub project to stay updated as improvements are made.

Servers are the computers you never see, but use the most. The act of loading the average webpage on your laptop involves calls to hundreds of servers, like ad servers, web servers, and content delivery network servers. The "cloud" is a nebulous mass of servers that invisibly process and store information. The Internet itself is based on a global backbone of DNS servers. Servers may be a foreign concept, but the only consequential difference between a server computer and a desktop PC is that the former is especially set up to process—or service—network requests.

In early 2014 I put together a home server, and it's turned out to be one of the best things I've ever done for my tech-self. I've been assembling and fixing computers since I was 15 for fun and profit. But setting up a publicly accessible, secure, always-on machine to do my bidding has been challenging, educational, and intensely rewarding.

I'm having so much fun that I now need more servers if I'm to keep doing more cool computer stuff at home. What follows is a brief tutorial on how to assemble and connect a home server, using my tiny computer of choice, an Intel NUC:

1. Components

Intel NUC — (Pictured at top.) The latest version of the Intel "Next Unit of Computing" uses the acclaimed Haswell microarchitecture, which boasts significant gains in performance and energy efficiency over its Ivy Bridge predecessor. This NUC is small, very quiet, and has nifty features like an IR sensor, several USB 3.0 ports, and VESA mount compatibility. To learn more, check out this Ars Technica discussion. I ordered the barebones version without a hard drive or RAM.

Crucial 8GB Memory — Be sure to get SODIMM ("notebook"-sized) memory rated at 1.35V. I've had the misfortune of purchasing non-1.35V memory for a NUC before, resulting in a computer that fails to power on and a long time spent debugging. You'll know there's something wrong with the memory if the NUC's blue LED flashes three times repeatedly.

Intel WiFi Card — This one is optional. The NUC has two internal mSATA ports, but one will only fit a smaller chip like this Intel card. An ethernet cord will suffice for networking, but in case I ever want to house this NUC further from my router, it'll support Wireless AC + Bluetooth.

Mini DisplayPort Cable — DisplayPort is becoming more common for HD display connections. The NUC also has a mini HDMI port, but I already had a DisplayPort to Mini DisplayPort cable I use for my MacBook.

Power Cord — If you order a NUC from Amazon, be sure to select the "With Powercord" version, otherwise you'll have a complete setup without power.

Keyboard, Ethernet cable, display…

2. Assembly

Assembling a NUC is very simple. To open it up, flip it over and unscrew the four corner screws holding the bottom cover in place. This will expose all the external port housings and the internal slots for custom components. The CPU and fan will remain hidden on the other side of the motherboard.

2.1 Install WiFi Card

The two mSATA ports are stacked, and visible as vertical barcodes at the bottom of the following image. The half-sized mSATA device, in our case the wireless card, will have to go in the lower slot, as the SSD won't fit there. When an mSATA device is inserted into its slot in the NUC, it will rest at an angle, as it's being pushed up by a spring for easy removal. There should already be a small Phillips screw in the housing that can be removed and used to secure the card.

2.2 Install SSD Card

Once the lower mSATA device is at home, the upper one can be installed in similar fashion. The NUC should have provided a screw for this device as well. When all is done the upper device should completely hide what's underneath.

2.3 Install Memory

The SODIMM modules will overlap, so the lower one should be inserted first. They'll have to be pushed down until the housing clicks into place. To remove them, pull the plastic side brackets apart until the module pops up.

Below, all the components have been installed. The WiFi card is completely hidden by the SSD:

2.4 Clean Up

It's safe to leave the cover off while you plug everything in and power up the device, just until you know that there isn't an immediate problem with the hardware. When it comes time to screw the cover back in, the soft pad on the inside of the cover is meant to sit against the upper mSATA device. I'm not sure if this pad is a heat sink, or just a cushion to protect the component.

3. OS Installation

My operating system of choice for a home server is the latest version of Ubuntu Server, which is 14.04/Trusty at the time of writing.

3.1 Update the BIOS

You may have to update the BIOS—a computer's basic pre-installed operating system—before installing a proper OS. Intel's website recommends doing this only if necessary:

Update the BIOS on your computer only if the newer BIOS version specifically solves a problem you have. We do not recommend BIOS updates for computers that do not need it.

An example of such a problem is incompatibility with certain components in an older BIOS version. I upgraded the BIOS the first time I put a NUC together so that I could install an operating system from a USB 3.0 flash drive, and I followed the F7 Flash Update Method.

3.2 Create a Bootable USB Flash Drive

This is my method of choice for installing an OS. Bootable USB Flash drives are portable and fairly easy to create.

I followed this Ubuntu tutorial. Be sure to use rdisk instead of disk when copying the OS image to the drive, as the copy will proceed much faster. A discussion of this phenomenon is available on superuser.com.

Once the drive is created, simply plug it into the NUC and power up the system. Depending on the version of the BIOS you may have to explicitly enable booting from USB. When successful, you'll see the Ubuntu boot menu:

3.3 Install the Operating System

Follow the instructions on screen, and the installation should go off without a hitch. If you've installed an earlier version of Ubuntu, you can upgrade from the command line:

$ sudo apt-get install update-manager-core
...
$ sudo do-release-upgrade

4. Configure SSH

Once everything is set up, I run my home server "headlessly", meaning I don't keep a display, keyboard, or mouse connected to it. Like a real-world server, it's doing computational things behind the scenes, and when I need to interact directly with it, I do so over the network via SSH. To install and run the SSH server type:

$ sudo apt-get install openssh-server
...
$ sudo service ssh start

Before we can log in, we'll need the IP address of the new machine. To find it, type ifconfig in the terminal and look for the value after inet addr in the em1 section. This IP address should follow the form 192.168.x.x, a scheme used for local networks. We'll pretend the value is 192.168.1.25.

4.1 Log in Internally

With the internal IP address, we can log in to the server from another computer on the network:

$ ssh username@192.168.1.25

We can use a hosts file entry to map this address to a more attractive name, perhaps the hostname given to the server when we installed the operating system. It may also help to configure our router to assign a consistent IP address to the machine based on its MAC address.

4.2 Log in Externally

Chances are you'd like to access the server from the outside world, and the easiest way to do this is to expose the SSH server outside of your private network. The internal IP we've been working with isn't visible outside of the home network. What is visible is the public IP assigned to your modem/router by your ISP. You should be able to find this address from the router control panel.

SSH uses TCP port 22 by default. Most routers allow port forwarding, whereby a TCP connection request to the external IP and port can be forwarded to an internal IP and port. In this case we'll forward port 22 of the external IP to port 22 of the internal IP of our server. Now you should be able to SSH into the server from outside of the local network, using the external IP address.

Setting up a home web server for public access can be tricky, especially when it comes to dynamic IP addresses and network security. In a future post I'll cover the networking aspects of my home setup—including private VPN, dynamic DNS, and NAT loopback.