This is a story that grew in the telling. It’s also a story of firsts; my first time using Django, first time using AWS, first time using Nginx…like I said lots of firsts. So let me start at the beginning.

Background

I had an idea for a web application and decided to build it with Django. I hadn’t used Django before so there was a bit of a learning curve, but all in all it went really well and after a bit of development I had something to share with the world. The question was how to deploy my application. When you’re developing with Django things are pretty straightforward: Django ships with a basic application server that works great for testing, but it’s not designed for production use. For production you need two things: a WSGI-compliant application server for your Python code and a regular web server for static files and resources. This usually boils down to a choice of Apache or Nginx for the web server and one of mod_wsgi (if you’re using Apache), uWSGI or Gunicorn for the application server. I decided to use Nginx (for no other reason than I had been reading good things about it and wanted an excuse to play with it) and Gunicorn (because there seemed to be good documentation). The next question was how to host the application, and for that I chose Amazon Web Services (AWS). The problem was that when I started looking for documentation on how to set everything up I was able to find descriptions of each of the component pieces, but nothing that put it all together. The remainder of this post does just that.

AWS

NB: The following assumes you’re signed into the AWS web console.

AWS offers a plethora of services for developers to build applications, but with so many options it can be confusing for noobs like me to know which services to use. For example, should I run my own EC2 instances or use Amazon’s PaaS offering (Elastic Beanstalk)? Should I install a database myself or use the Relational Database Service (RDS)? So many choices, and to be honest the only way to make an informed decision is to read the documentation for the available services. For my requirements I chose to deploy Django on an Ubuntu 14.04 EC2 instance and run a PostgreSQL instance on RDS. I had investigated using Elastic Beanstalk, but it wouldn’t have given me the level of control I wanted over the Django installation.

So the first thing I needed to do was sign up for an AWS account. When you sign up you get a basic level of resources free for the first year, and as long as you don’t use more than those resources you won’t be billed for anything. Once I had an account set up I created a second account using Identity & Access Management (IAM) to log into the console with, rather than using the root account. Once that was done I logged out of the root account and logged back in using the new account. Now I was ready to set up my security groups.

VPC & Security Groups

When you create an AWS account Amazon creates a default Virtual Private Cloud (VPC) for you. Your VPC allows you to isolate your services and control access to the Internet. You can create separate subnets to isolate groups of resources and then define what visibility they have to the Internet and to other subnets. The default VPC Amazon provides has one subnet defined, which is accessible from the Internet. This was sufficient for my setup, but if you have more complex requirements you can set up whatever subnets you need.

Once you’ve configured the VPC the way you want, you need to set up the access rules for the services you’ll be running. You do this using Security Groups. Security Groups are essentially firewall rules that you define and can apply to any server or service. The great thing is that they’re reusable, so if you apply the same group to several services and then make a change, that change is automatically applied everywhere. To create Security Groups:

Select VPC from the Services menu.

Select Security Groups from the sidebar menu.

Click ‘Create Security Group’ and fill in the details.

I created two groups, one for accessing the database and one for accessing the web server.

After creating the two groups I selected the database group from the list and selected the Inbound Rules tab. Since I’ll be running PostgreSQL I needed to allow access on the default PostgreSQL port, 5432. I did that by creating a new Custom TCP rule and setting the port to 5432. I also needed to set the source address range that the rule applies to. (If you want to leave this completely open you can set it to 0.0.0.0/0, but if you want to limit it to a specific subnet you can enter the subnet address range here.)

I then did the same thing for the web server group, but this time used the default values for HTTP and HTTPS on the Inbound Rules tab. I also added an SSH rule so that I could connect to the server remotely. I was now ready to set up my database instance.

PostgreSQL RDS

Amazon’s Relational Database Service (RDS) allows you to run relational database instances in the cloud. You can select from a number of database engines, and once you have an instance created AWS can handle things like backups, data replication, automated failover, etc.

To set up an RDS instance, select ‘Database’ and ‘RDS’ from the Services menu and click ‘Launch a DB Instance’. You then select the type of database you want, in my case PostgreSQL, and complete the rest of the setup; I selected the default VPC and the database security group I set up earlier. AWS will then create your database instance. Once the instance was running I logged into it using the root username and password I set during setup (I used pgAdmin to connect to the database, but you can use any tool that works for you). Once connected to the database I created a new user account for the web server to use. I granted this account all privileges short of making it a superuser, since Django will need to create tables, users, etc.
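A sketch of creating such an account in psql; the role name, password and database name here are placeholders, not values from my actual setup:

```sql
-- Hypothetical application role for Django
CREATE ROLE myapp_user WITH LOGIN PASSWORD 'change-me';
-- Give it full privileges on the application database, but not superuser
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
```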

Ubuntu 14.04 EC2

The next step was to set up a server instance to host Django. I did this by selecting EC2 from the Services menu and clicking ‘Launch Instance’. I then needed to select a virtual machine image to launch, in my case Ubuntu 14.04. Next I had to select the instance size; if you want to stay within the free tier, choose the smallest size, in this case t2.micro. I then configured the instance details such as the network (VPC), subnet, public IP assignment, etc., allocated storage for the instance, tagged it (to make it easy to find if you have a lot of instances) and set the Security Group to be the web group I created earlier. Finally I reviewed the instance details and clicked Launch. AWS then created and initialised my instance.

Once the instance was up and running I SSH’d into it using the default username for Ubuntu instances (which is ubuntu) and the key file I downloaded from AWS. The command looks like this:

> ssh -l ubuntu -i yourkeyfile.pem host.or.ip.address

You can use either the public IP address or the public DNS name AWS assigns to the instance. One thing to note, though, is that the public IP and DNS name change if the instance is stopped and restarted. To get around the IP changing you can assign a static IP to the instance using an Elastic IP.

Once logged into the server I installed some additional packages to complete the setup. The first thing I did was run:

> sudo apt-get update

Followed by

> sudo apt-get install python3 python3-pip nginx git libpq-dev

These two commands installed Python 3.4 and pip, Nginx, Git and the PostgreSQL development libraries, which are needed to install Psycopg2, Python’s PostgreSQL binding. The next step was to create a directory to hold the project and check out the code from my source repository using git, e.g.

> git clone http://url.of.repo

Finally I had to install the project’s Python dependencies by running:

> sudo pip3 install -r requirements.txt

from the project’s root folder. Now I was ready to configure Django for production.

Django

By default Django stores configurable settings in a file called settings.py. This is where you put things like your installed apps, database connection details, secret keys, middleware and generally anything you might want to be able to configure. This works great when you’re developing, but as soon as you need to deploy to more than one machine you have a problem. For example, if you use different databases for development and production (which you should) you will need two different configurations. There are several ways of working around this, but what I did was create three settings files:

base_settings.py which holds settings that are common to all environments

dev_settings.py which contains settings specific to my development environment

prod_settings.py which contains production specific settings

Both dev_settings.py and prod_settings.py import base_settings.py so the common settings are available in both environments. I then had to tell Django which settings file to use. That can be done via an environment variable, but the manage.py utility seems to be hard-coded to look for settings.py, and if that file doesn’t exist you can’t run migrations or any of the other commands it provides. The way I worked around this (which works as long as you’re running a Linux/UNIX-based OS) was to create a symlink called settings.py that points to the correct environment file. Once the symlink was in place I ran the database migrations.
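As a minimal sketch of the split (the file names come from the list above, but the individual settings are illustrative, not taken from my actual project):

```python
# prod_settings.py -- illustrative only
from base_settings import *            # pull in everything common to all environments

DEBUG = False                          # never run production with DEBUG enabled
ALLOWED_HOSTS = ["myapp.example.com"]  # hypothetical production domain
```

On the server, settings.py is then just a symlink, e.g. ln -s prod_settings.py settings.py, after which manage.py picks up the production configuration.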

The next step was to set up a web application server. Python offers several application servers that are WSGI (Web Server Gateway Interface) compliant; the one I chose was Gunicorn (Green Unicorn). Gunicorn is a lightweight, reasonably fast Python port of Ruby’s Unicorn server. I had already installed it through requirements.txt, so now I just needed to run it. The simplest way to run Gunicorn is to execute the following at the command line:

> gunicorn -w 2 myapp.wsgi:application

where -w specifies the number of worker processes. This will start the server listening at http://127.0.0.1:8000.

This is usually fine for a development environment, but for production you’ll probably want to customise the setup, such as setting the IP address or port binding, changing the number of workers or threads, etc. Fortunately Gunicorn allows you to make these customisations either on the command line or in a configuration file. The config file is just a Python file where you specify the parameters you want to set as variables, and you tell Gunicorn which file to use by passing the path via the -c command line option, e.g.
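Here’s a sketch of such a file; the file name and the particular values are illustrative:

```python
# gunicorn_config.py -- illustrative values
bind = "127.0.0.1:8000"  # IP address and port to listen on
workers = 2              # number of worker processes
threads = 2              # threads per worker process
```

which you would then start with gunicorn -c gunicorn_config.py myapp.wsgi:application.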

where we set the IP address and port to run the server on, the number of worker processes and the number of threads per process. Gunicorn has many other options that can be set and I recommend reviewing the documentation on the website for more details.

Running Gunicorn with these settings got my application server up and running, but there’s a problem: if the server fails or the instance is restarted, Gunicorn will not be restarted automatically. To get around this I needed to set Gunicorn up as a service. On Ubuntu 14.04 this is done using Upstart, so I created a config file that defines how I wanted Upstart to treat my Gunicorn service. Here’s an example config file:
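A sketch of such an Upstart job; the paths, user and project name are assumptions about the deployment layout, not my original file:

```
# /etc/init/gunicorn.conf -- illustrative paths and names
description "Gunicorn application server for myapp"

start on runlevel [2345]    # start once the system reaches multi-user runlevels
stop on runlevel [!2345]    # stop on shutdown or reboot
respawn                     # restart the process if it dies

setuid ubuntu               # run as the unprivileged ubuntu user
chdir /home/ubuntu/myapp    # project root
exec gunicorn -c gunicorn_config.py myapp.wsgi:application
```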

This tells Upstart when it should start the service, to restart the service if it isn’t running, and the command to execute (in this case Gunicorn). I created this file in /etc/init and then ran:

> sudo service gunicorn start

Now if the server crashes or the instance restarts Gunicorn will always be restarted.

Nginx

The final step was to set up a web server to serve static resources and act as a reverse proxy. For this I used Nginx, which was installed as part of setting up my instance. Nginx comes with a default site configured, so if you access your public IP address after installing it you should see a welcome page. I then replaced the default site with my application so that Nginx served the static resources, such as JavaScript, CSS, images, etc., and routed all other traffic to the web application. Fortunately Nginx is simple to configure: setting up my application involved creating a configuration file in Nginx’s ‘sites-available’ directory (usually /etc/nginx/sites-available) and then creating a symlink to it in the ‘sites-enabled’ directory (usually /etc/nginx/sites-enabled). By convention you name the configuration after the domain it will be hosting, e.g. myapp.com. Here’s an example configuration for connecting to Gunicorn:
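A sketch of such a configuration, assuming Gunicorn is bound to 127.0.0.1:8000; the domain and filesystem paths are placeholders. Since Gunicorn speaks plain HTTP, this uses Nginx’s proxy module:

```nginx
# /etc/nginx/sites-available/myapp.com -- illustrative paths and names
server {
    listen 80;
    server_name myapp.com;

    # Serve static resources directly; this directory can be anywhere on the machine
    location /static/ {
        alias /home/ubuntu/myapp/static/;
    }

    # Everything else goes to the application server
    location / {
        include proxy_params;              # standard proxy headers shipped with Nginx
        proxy_pass http://127.0.0.1:8000;  # must match Gunicorn's bind address
    }
}
```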

Here I’m setting the server to listen on port 80 and pointing any requests for static resources at my static files directory (note this can be anywhere on the machine). Finally I specify that any other requests should be passed to the Gunicorn server listening on port 8000 on localhost (obviously this has to match the IP and port Gunicorn is running on). Since Gunicorn speaks plain HTTP this goes through Nginx’s proxy module; the ‘include proxy_params’ line adds some pre-configured proxy header settings that ship with Nginx.

Finally, to get Nginx to apply the new configuration, I needed to execute

> sudo service nginx restart

My application was now up and running, and by entering the public IP address in a browser I was able to see my site.

Optional / Bonus Step – DNS Routing

One extra step you can take is to set up a domain name so you don’t have to enter the instance IP address all the time. To do this you first have to register a domain name, then go to your provider’s DNS control panel and set the @ record to point to your AWS Elastic IP. (This assumes you don’t need email or are hosting your own mail server; if you’re using a hosted email service you’ll need to set up your DNS records to route your mail correctly.) Once that’s done you need to modify your Nginx config to recognise the domain name. You do that by adding:
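a server_name directive to the server block; the domain here is a placeholder:

```nginx
server {
    listen 80;
    server_name myapp.com www.myapp.com;  # respond to requests for these host names
    # ...rest of the configuration unchanged...
}
```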

Recently TechCrunch published this article, On Secretly Terrible Engineers. I’ve been thinking about it and digesting it for several days because there are some things in it I agree with, but a lot that I don’t, and I’ve been trying to work out how to phrase my objections. David Harney does an excellent job of deconstructing and refuting the article on his blog and I recommend reading it. I largely agree with what David’s written so I won’t rehash it, but I do want to talk about some issues I have with the article at large.

Let me summarise what I took from the article, “Stop wasting time asking irrelevant technical questions in interviews, if the candidate has worked in the industry for more than a handful of years they must know what they’re doing.”

I’m going to address this in two parts. First, “Stop wasting time asking irrelevant technical questions in interviews.” OK, this I agree with. Too often when interviewing candidates we revert to asking them to implement FizzBuzz or Quicksort or some other algorithm that can easily be looked up and memorised before the interview. Now don’t get me wrong, if your company is developing a commercial FizzBuzz application or you require your engineers to repeatedly implement sorting routines then by all means ask these questions, as they’re key to your business, but if you’re not doing either of those things why not ask something more relevant to the problems your engineers face every day?

Let me tell you about the best technical interview I’ve taken part in. A few years back I interviewed at an ecommerce company. As part of the interview process I was required to write some code, but instead of one of the usual contrived exercises the company provided me with a small sample of their production code and asked me to implement a minor feature. I then had to defend my solution in a code review with two of their engineers. This was a great test. For me, it gave an insight into their code base and engineering team, how the code was structured and the sort of things I could be working on as an employee. For the company, the test showed them not only whether I could code, but also whether I understood the requirements, whether I asked questions to uncover the implicit requirements that weren’t in the spec, how I went about designing a solution and ultimately whether I would be a good fit with their team. And all that from a problem that wasn’t any more complex than FizzBuzz or Quicksort. I don’t know whether the root cause here is laziness or lack of imagination, but it isn’t helping anyone. If coming up with a standard test is proving hard, pick a feature from your backlog or issue tracker, or better yet use a feature you’ve just implemented…it’ll be something you’re familiar with and you’ll be better able to judge how someone else approached the problem.

Now the second part, “If a candidate has worked in the industry for more than a handful of years they must know what they’re doing.” This is the part I have real problems with. The assumption that someone must know what they’re doing because they’ve been in the industry for several years is just wrong. Why is it wrong? Well, for starters, assuming that time spent doing something equates directly with ability doesn’t stack up. True, you would expect that someone with 5 years’ experience would know more than someone with 1 year, but it depends on context. If the person with 5 years’ experience has been building web-based CRUD applications and the person with 1 year has been building desktop applications, but you’re building a mobile app, then who is the more competent? You don’t know; you have to test them. OK, you say, but what if you are building a web-based CRUD app, surely then the first person would be the more competent? Maybe, but you still have to look at what they’ve done and what they’re able to do, and you can only do that by testing them.

The second problem is how we define competence. In the article the author seems to equate competence with the ability to write code, the argument being that since the candidates have been in the industry for so long they must be able to write code and therefore they must be competent, but this argument is flawed. Writing code isn’t the end goal of what we do, it’s an artefact of how we do it. What I mean is that what we do is solve problems using computers, and sometimes the way we do that involves writing code. Equating knowing how to code with technical competence confuses the medium with the message; it’s equivalent to saying that knowing how to write English is sufficient to write great literature…it’s certainly a prerequisite, but there’s more to it than that. Similarly, the ability to code is table stakes in our industry, but it’s just a starting point.

So if coding is just a start, what is a fair way to judge someone’s ability? Frankly, it’s all the other stuff that goes on around the code: knowledge of design patterns and architectural patterns, being able to design a solution to a given problem and, most importantly, knowing the trade-offs you’re making in your solution, what the alternatives are, and being able to explain why those trade-offs are the right ones for your solution. What it really comes down to is that competence isn’t down to knowing any particular language or framework; that’s mostly just syntax which, if you have the right mental models of how everything works, you’ll pick up quickly anyway. It’s about demonstrating that you have that mental model and can apply it…and that’s what you need to determine in a technical interview.

NB: This isn’t really a tutorial on writing Clojure macros; it’s a description of a macro I wrote and how I went about it. If you’re looking for an introduction to writing Clojure macros there’s an excellent one at Clojure for the Brave and True.

I’ve been working on a library for managing users and I’ve found I’ve been writing a lot of code validating parameters. It looks like this:
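Reconstructing the shape of that code (the function and exception names here are illustrative, not the library’s actual API):

```clojure
;; The same nested checks, repeated in every function that takes credentials
(defn register-user [username password]
  (if (valid-username? username)
    (if (valid-password? password)
      (create-user username password)
      (throw (IllegalArgumentException. "Invalid password")))
    (throw (IllegalArgumentException. "Invalid username"))))
```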

What I really want is a way to wrap the code that depends on the username and password so that it only executes if the values are valid or else throws the relevant exception. This sounds like a job for a macro.
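The wrapped version might read something like this; the macro name when-valid and the helper names are hypothetical:

```clojure
(when-valid valid-credential? [username password]
            [(throw (IllegalArgumentException. "Invalid username"))
             (throw (IllegalArgumentException. "Invalid password"))]
  (create-user username password))
```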

Now that’s a big improvement: there’s less code, it’s easier to read, the intent is clearer and we can use anything we want as a validation function. OK, so now I know what I want, how do I get it? Well, my first thought was that this is a bit like cond.

The cond Macro

To quote the cond docstring:

Takes a set of test/expr pairs. It evaluates each test one at a time. If a test returns logical true, cond evaluates and returns the value of the corresponding expr and doesn’t evaluate any of the other tests or exprs. (cond) returns nil.
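Rewriting the validation directly with cond might look like this (names illustrative, as before):

```clojure
(defn register-user [username password]
  (cond
    (not (valid-username? username))
      (throw (IllegalArgumentException. "Invalid username"))
    (not (valid-password? password))
      (throw (IllegalArgumentException. "Invalid password"))
    :else (create-user username password)))
```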

That’s pretty good, and certainly an improvement on what I’d been doing previously, but I’m not sure it makes the intent of the code easy to understand. However, given that cond is a macro, looking at its source code might give us an idea of where to start. The source of the cond macro looks like this:

(defmacro cond
  "Takes a set of test/expr pairs. It evaluates each test one at a time.
  If a test returns logical true, cond evaluates and returns the value of
  the corresponding expr and doesn't evaluate any of the other tests or
  exprs. (cond) returns nil."
  {:added "1.0"}
  [& clauses]
  (when clauses
    (list 'if (first clauses)
          (if (next clauses)
            (second clauses)
            (throw (IllegalArgumentException.
                    "cond requires an even number of forms")))
          (cons 'clojure.core/cond (next (next clauses))))))

Here we can see that it first checks whether it has any clauses before creating an if form and then, if there are more clauses, recursively applies itself to the remaining clauses. The code it generates is not unlike what I’d been writing originally, so it definitely looks like the right approach. Whilst the cond source doesn’t give me a direct solution for my macro, it does make me think I should try rewriting mine using cond.
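Here is a reconstruction of the macro, consistent with the description that follows (the name when-valid is a placeholder):

```clojure
(defmacro when-valid
  "Evaluates body only if (pred s) is truthy for every string in s;
  otherwise evaluates the handler paired with the first failing string."
  [pred s handlers & body]
  `(cond
     ~@(interleave (map #(list not `(~pred ~%1)) s) handlers)
     :else (do ~@body)))
```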

First, (map #(list not `(~pred ~%1)) s) creates a sequence of cond terms by wrapping each string with the predicate term and negating the result. Then the sequence is interleaved with the sequence of handlers that need to be called if a particular string fails validation. Finally, it inserts the terms into a cond expression and adds the body of code to execute if all the strings are valid as the :else clause of the cond expression.

Some issues

It’s looking pretty good, but there are some issues. Firstly, the macro assumes the strings and handlers will be sequences…pass it a single value and it blows up. The easiest way to fix that is to ensure that we’re always dealing with sequences, for example:
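One way to do that is to coerce the arguments before building the cond form; a sketch:

```clojure
;; Wrap a bare value in a vector so the macro always sees a sequence
(defn ensure-seq [x]
  (if (sequential? x) x [x]))
```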

Secondly, the macro assumes the sequence of strings will be the same length as the sequence of handlers. In most circumstances they will be, so this seems a reasonable assumption, but what about the case where you have two or more strings to validate but only want to provide an error handler for the first one? Because we use interleave to combine the strings with their handlers, the macro will only interleave up to the length of the shorter sequence, and so in this case only the first string would be validated. One way around this is to manually ensure that the sequences are the same length by padding the handlers sequence with nil, but that’s messy and error-prone. A better way is to automatically pad the sequences so they’re the same length. To do that we first need a padding function, such as:
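A sketch of such a function:

```clojure
(defn pad
  "Pads sequence s to length n, calling f to generate each extra element."
  [s n f]
  (concat s (repeatedly (- n (count s)) f)))
```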

This function takes a sequence, the length we want to pad the sequence to and a function to generate the additional elements and returns a sequence containing the original values padded to the required length. So adding that to the macro we get:
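The result might look like this (again a reconstruction; when-valid is a placeholder name and pad is the padding function described above):

```clojure
(defmacro when-valid
  [pred s handlers & body]
  (let [s        (if (sequential? s) s [s])
        handlers (pad (if (sequential? handlers) handlers [handlers])
                      (count s)
                      (constantly nil))]   ; missing handlers become nil exprs
    `(cond
       ~@(interleave (map #(list not `(~pred ~%1)) s) handlers)
       :else (do ~@body))))
```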

The last issue with the macro is that in its current form it requires us to provide a vector of error handlers, even if the vector is empty. If we don’t provide the vector the macro blows up. So we need to be able to handle the case where no error handlers are provided.

The easiest way I can think of to do that is to create a multi-arity macro so we can handle the situation where no error handlers are passed as a special case. It turns out that if we don’t have to worry about the error handlers the macro becomes much simpler, since all we need to do is ensure the predicate holds for every element of the input sequence, and as it happens Clojure provides a function to do just that, every?. So adding in our special handling we get:
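A sketch of the multi-arity version; since Clojure allows only one variadic arity per function, this version takes the body as a single form (names remain placeholders):

```clojure
(defmacro when-valid
  ;; No handlers: just check every value before running the body
  ([pred s body]
   `(when (every? ~pred ~(if (sequential? s) s [s]))
      ~body))
  ;; With handlers: pad them out and build the cond form as before
  ([pred s handlers body]
   (let [s        (if (sequential? s) s [s])
         handlers (pad (if (sequential? handlers) handlers [handlers])
                       (count s)
                       (constantly nil))]
     `(cond
        ~@(interleave (map #(list not `(~pred ~%1)) s) handlers)
        :else ~body))))
```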

Conclusion

So there we have it: a macro we can use to ensure code is only executed if a sequence of values all pass validation, and as a side benefit we’ve also got a function to pad sequences to the same length. There are probably some things we could do to improve the macro, but for the moment it works as required and that’s good enough for me.

I put together this small library for doing Google Code Jam with Clojure and just uploaded it to Github in case anyone else might find it useful.

Code Jam

Code Jam is Google’s annual coding competition. It consists of a series of rounds, and in each round a series of problems that must be solved in a limited time. The problems usually consist of reading data from an input file and processing it in some way to get the required results, with points awarded for completing the problem for small and large datasets as well as for how quickly the problem is solved.

I wanted to try solving the problems using Clojure, and while working through the prior years’ problems I put together this library to handle the mundane stuff like reading the input and writing the output so I could focus on solving the actual problem.

Usage

So how do you use it?

First you need to include the library by adding the following to your project.clj:

[amanoras/clojam "0.1.0"]

Next you need to include the library in your code, for example:

(:use [clojam core cases utils])

Finally you should (but don’t have to) define a main function that can be called from the command line, e.g. using lein run, so that you can pass in the names of your input and output files. The core of the library is the jam function, which takes as arguments:

the path to the input file

a vector that describes how the data in the input file should be combined into cases
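For instance, a call might look like this; this is a hypothetical illustration based on the description above, not an excerpt from the library’s documentation:

```clojure
(jam "input.txt" [3])  ; group the input into cases of 3 lines each
```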

In this example the data in the input file is grouped into cases three lines at a time, but sometimes the input file contains several fixed lines before the cases start. In that case you can pass a nested vector as the last element of the vector describing the file structure. This is best illustrated with an example:
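Again a hypothetical illustration of the nested form, not the library’s documented API:

```clojure
;; One fixed line at the start of the file, then cases of 3 lines each
(jam "input.txt" [1 [3]])
```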

So I was playing around with an application recently and wanted to integrate Chas Emerick’s excellent Friend authentication library. I set up a simple User service to get users, roles, etc. in a format that could be consumed by Friend and wanted to test the code using Midje. The question was how to represent the user data repository?

One option was to set up a test database using H2 or something similar and test the code against it, but that would mean reloading the database each time the tests were run to ensure the data was consistent between runs. Another option was to store the data as a map in memory, but that seemed to have the same limitations as a test database. In the Java world we’d get around these sorts of limitations using a mocking framework like jMock or EasyMock, but how could I do this in Clojure?

Fortunately Midje provides some great support for doing this kind of thing through prerequisites and meta-constants. Prerequisites are great: they allow us to specify the return value of a function without having to specify its implementation. For example, let’s say I have a function get-user-roles that returns the roles for a given user as a set, which in turn calls a function retrieve-user-roles that gets the user data from a database, and we want to test get-user-roles without having to worry about the database. In this case we can use a prerequisite to mock retrieve-user-roles. Here’s what our test could look like:
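A sketch of such a test; the exact shape of the role data is an assumption:

```clojure
(fact "get-user-roles returns the user's roles as a set"
  (get-user-roles ..user-id..) => #{..user..}
  (provided
    (retrieve-user-roles ..user-id..) => [{:name ..user..}]))
```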

which basically says that if we call get-user-roles passing in ..user-id.. the result should be a set containing ..user.. . The important part is:

(provided (retrieve-user-roles ..user-id..) => [{:name ..user..}])

This is our prerequisite, specified by the (provided) form, and it basically says that when retrieve-user-roles is called with a parameter of ..user-id.. it will return a vector containing a map of {:name ..user..}. Another cool feature of prerequisites is that when we specify a (provided) form not only are we specifying the return value, but also that the code under test must call the function in the prerequisite with the stated parameters. If it doesn’t then the test will fail.

Now some of you may have noticed the odd parameters used in the test case, e.g. ..user-id.. . This is an example of a meta-constant, the second thing Midje provides to assist in mocking. Meta-constants allow us to defer decisions about what data we want to use in our tests; essentially they let us substitute a symbol for the data and then refer to the symbol rather than worrying about the actual value. For example, in the sample code above we pass a meta-constant, ..user-id.., to get-user-roles rather than passing in an actual user ID, since we don’t really care what value is passed to the function, only that when ..user-id.. is passed a specific result should be returned. True, in this instance we could hard-code a value in the test, but using a meta-constant gives us a couple of advantages. Firstly, it makes explicit that we aren’t concerned with the actual value passed to the function while still making it clear that the value under test is a user ID; secondly, it makes it easier to catch typos and errors, as the test will fail if the meta-constant name is wrong or used inconsistently within the test.

As a further example suppose we want now to test admin accounts. In this case we can easily write a second test passing in a new meta-constant ..admin-id.. to the get-user-roles function and add a new prerequisite to return a different set of data for admins. Our test can now expect a different set of roles to be returned without having to change any core code or worry about what data is in the database. Magic.
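Sketching that second test under the same assumptions:

```clojure
(fact "get-user-roles returns the admin roles for an admin account"
  (get-user-roles ..admin-id..) => #{..admin-role..}
  (provided
    (retrieve-user-roles ..admin-id..) => [{:name ..admin-role..}]))
```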

When it comes to web development with Clojure, everything pretty much revolves around Ring and Compojure for HTTP abstraction and routing, with a generous helping of Hiccup or Enlive for HTML templating, plus Friend or SQL Korma or whatever else you want or need to round out your stack. This works pretty well since it gives you a huge amount of control over how you put together your application, but as a newcomer to Clojure it can be really daunting, since not only are you trying to learn the nuances of a new language but also the intricacies of a whole bunch of libraries. When you’re new to something like this, what you really want is a one-stop shop where you can get all the parts without having to worry about how the pieces fit together.

One early attempt to do this for Clojure was Noir. Noir is essentially an abstraction over Ring/Compojure that includes Hiccup by default and adds some extra features like cookie handling, stateful sessions and some syntactic sugar for creating pages. However, the recent deprecation of Noir pretty much puts us back where we started. But as they say, when God closes a door he opens a window, and in this case the best bits of Noir were repackaged as lib-noir (a library that can be used by any Clojure code without the rest of the framework), which was taken up by a couple of new frameworks such as Luminus and Ganelon. Both of these projects build on top of Compojure by adding lib-noir features and smoothing out some of Noir’s rough edges, but whereas Luminus is closer to a straight replacement for Noir (although it uses a different templating library and adds database support), aiming to build a full-stack framework a la Rails for Clojure, Ganelon has taken the best bits of Compojure and Noir and added some AJAX sizzle.

Essentially Ganelon is Ring, Compojure and Noir (with better handling for custom middleware) plus some JavaScript and CSS (Bootstrap, anyone?), but it takes an unusual approach to AJAX. What it comes down to is that you can create widgets (snippets of HTML you want to perform some AJAX operation on) and actions (server-side functions that return JSON or JavaScript operations). They work by allowing part of your web page to call some code on the server; that code can then generate HTML fragments or other output that gets sent back to the browser and inserted into the DOM. Pretty simple, right? But all of this is written in Clojure: no JavaScript, no ClojureScript, just plain Clojure. What I really like is that the code you use to generate the initial rendering can be reused to generate the updates. This is great because it means you can have a single code base and don’t need one set of code to generate the initial page and another to do the same thing in JavaScript on the client side.

There’s obviously more to it than just that and I’ll try to work through a tutorial in the near future, but for the moment I recommend that anyone interested in Clojure web development check it out.

I’ve been following ClojureScript since it was released, but have only recently started using it. Just in case you don’t know what it is, ClojureScript is a compiler for Clojure that compiles to JavaScript so it can be executed in a browser. It can also generate highly optimised JavaScript by running it through the Google Closure (not to be confused with Clojure) compiler.

Since there are already some really good introductions to ClojureScript online, I thought I’d post links to those rather than writing a getting-started tutorial of my own, so here they are: