Tag Archives: programming

Yo dawg, I heard you like queries. So I put a query in your query so you can query while you query.

Now we are starting to get into some really cool and powerful SQL. Subqueries! Let’s jump right in, shall we?

Subqueries For Change

I want to count up all the times something changed across some set of data. You could pull all the data down and iterate over it in your code, and maybe that runs just fine. You can also let the database do the heavy lifting and pull down just the results that you need.
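The iterate-in-code version is simple enough. Here is a sketch in plain Ruby, assuming each reading is a hash with a minor value and the rows are already sorted by timestamp:

```ruby
# The "pull it all down and iterate" approach: count how many times
# the minor value changes across readings sorted by timestamp.
readings = [
  { id: 1, minor: 100 },
  { id: 2, minor: 100 },
  { id: 3, minor: 101 },
  { id: 4, minor: 101 },
  { id: 5, minor: 100 }
]

# each_cons(2) walks the rows pairwise, like looking ahead one record
changes = readings.each_cons(2).count { |a, b| a[:minor] != b[:minor] }
puts changes # => 2
```

That pairwise "look at the next row" comparison is exactly what the database can do for you instead.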

Using the LEAD windowing function we can look forward one record to see what happens next, and include that in the “current” record. We saw that in yesterday’s post. That’s not quite what we need, though. Fortunately we can query against that query, or use that as a subquery for the query we actually want to run.

Right, so what we have is a nested subquery. The outer query runs against the output of the inner query. If you run EXPLAIN against it you can see how many rows the inner query returns and at what cost. I tend to check the query plan on any query like this and then keep an eye on it as the data scales up.

Sorcery! You can see that I refer to the main query in the subquery. That’s pretty handy, but you don’t have to. You could just ask Postgres to give you the max value from this or some other table.

The reading ID may not be the most useful thing, so maybe instead you pull a timestamp. If you did that then you could do math and calculate the duration, which Postgres can also do for you. You can also wrap that up in another outer query. That’s when I start looking to DRY up the base query into a function.
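The timestamp math itself is nothing exotic. As a plain Ruby illustration of the same arithmetic (the timestamps here are made up):

```ruby
# Two made-up timestamps standing in for consecutive change events
started  = Time.utc(2014, 12, 20, 9, 15, 0)
finished = Time.utc(2014, 12, 20, 10, 45, 30)

seconds = finished - started  # Time subtraction yields seconds as a Float
minutes = (seconds / 60).round(1)
puts "#{seconds.to_i} seconds (#{minutes} minutes)"
```

In Postgres, subtracting one TIMESTAMP from another gives you an INTERVAL directly, so the database can hand back the duration without any client-side math.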

Common Table Expression (CTE)

You can see where all this subquerying can maybe get out of hand. I’ll confess that I have stopped myself when I’ve nested a few queries deep and asked if maybe there is a better way. I try to avoid Russian Dolls when I program. You can also JOIN on a subquery, which is powerful when you need it and also pretty ugly.

Enter the Common Table Expression, or CTE. This is also known as a WITH query. Let’s take a look at that first query and see how it would look as a CTE.

WITH amazing_subquery AS (
  SELECT id,
         major,
         minor,
         LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
  FROM readings
)
SELECT *
FROM amazing_subquery
WHERE minor != next_minor;

Here is the SQL Fiddle. This is a pretty simplified example, so maybe that doesn’t so much knock your socks off.

Let’s take this up to 11, shall we?

WITH interval_query AS (
  -- Start by making a series of 12 time intervals
  SELECT (ts || ' hour')::INTERVAL AS hour_interval
  FROM generate_series(0, 11) AS ts
), time_series AS (
  -- Then take each interval and convert it into a TIMESTAMP
  SELECT date_trunc('hour', now())
         + INTERVAL '60 min' * ROUND(date_part('minute', now()) / 60.0)
         - interval_query.hour_interval AS start_time
  FROM interval_query
), time_intervals AS (
  -- Then take that TIMESTAMP and add an hour to it to make an hour timebox
  SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
  FROM time_series
  ORDER BY start_time
)
-- RIGHT JOIN on those times s.t. we get all the times even when data is missing
SELECT f.start_time, f.end_time, r1.minor, COUNT(DISTINCT r1.phone) AS readings
FROM readings r1
RIGHT JOIN time_intervals f
  ON r1.reading_timestamp >= f.start_time AND r1.reading_timestamp < f.end_time
GROUP BY f.start_time, f.end_time, r1.minor
ORDER BY f.start_time, r1.minor;

Now it is entirely possible that there are better ways to do this. I am no DBA. I’m just a simple caveman programmer who likes to take data out for a spin and see what it can do for me.

When I had projects in Sybase and SQL Server I would build these complex queries up over a series of temp tables. You can do that with Postgres, too. I just tend to prefer CTE and subqueries. However, there is another technique that Postgres brings to the table that I do use from time to time. Tomorrow.

Yesterday I began to make the case that it’s OK to skirt your ORM and write some raw SQL every once in a while.

I’ll be honest with you. I have no idea how well the ActiveRecord postgres adapter handles the JSON datatype. I mentioned in that post that I always store entire JSON payloads. I also never select the entire column. I always pick what I want out, like I showed in the post.

Today, we have a brand spanking new version of Postgres! Version 9.4 shipped today, and it has an update that I had already planned to talk about. But we aren’t quite ready to talk about it yet.

I’m working through this blog series the same way I work through a refactoring process:

I have some data, and I want to query it.

Huh. That’s a giant pile of SQL. How can I clean that up?

Maybe there is a more performant way to access this gnarly data?

So today we will be in the first 2 bullets.

Windowing Functions

You have a lot of data. You can take that data and fold it on itself to see how one record is related to or changes from another.

Let’s say you want to look and see when someone has moved from one place to another. You can take a row and fold in data from the next row to make that comparison by using the LEAD windowing function. Like so:

SELECT major,
       minor,
       LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
ORDER BY reading_timestamp;

We’ve asked for the next record — the LEAD. We use the OVER keyword to tell Postgres how to construct the window. This is how we fold the data. So I’ve said that I want the data sorted by the reading timestamps, and given that same sort I want to pull in the value for the next record’s minor column.
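If the window concept feels abstract, here is the same fold sketched in plain Ruby: sort the rows, then pair each one with its successor (the sample data is made up):

```ruby
readings = [
  { major: 1, minor: 100, reading_timestamp: 1 },
  { major: 1, minor: 101, reading_timestamp: 2 },
  { major: 1, minor: 101, reading_timestamp: 3 }
]

# ORDER BY reading_timestamp
rows = readings.sort_by { |r| r[:reading_timestamp] }

# LEAD looks one row ahead in the ordered window; nil when there is no next row
with_lead = rows.each_with_index.map do |row, i|
  nxt = rows[i + 1]
  row.merge(next_minor: nxt && nxt[:minor])
end

with_lead.each { |r| p r }
```

Postgres does this in one pass over the sorted data, which is the whole appeal.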

Repetition Is Lame

Now imagine that query is the basis for a couple of different queries. You can store that query in the database in a function. You saw how to create a function in the previous post. This is something that I actually do every once in a while.

I will, at this point, take a step back and say that I am not advocating that you stuff all of your business logic in the database. I have fought in that stored procedure holy war on the side of keeping business logic in the application that really needs to know about it.

I see this in a different light, though. This is a code optimization. I’m in control of the code and the logic, and I’ve decided where it makes the most sense for what code to run. Yes, the SQL in the function is compiled and optimized and whatever in Postgres, but I don’t honestly know how important that is. This is just me DRYing up redundant queries and making joins less hairy.

Leveling Up Your Windowing

Now I want to take the data and start to piece together sequences of events. What I really want to do is put together specific events in sequence. And I can totally do that with a windowing function:

SELECT phone,
       major,
       minor,
       reading_timestamp,
       FIRST_VALUE(reading_timestamp) OVER (PARTITION BY phone, major ORDER BY phone, reading_timestamp)
FROM readings
ORDER BY 1, 4;

I’ve asked Postgres to give me the first reading_timestamp value across every row in a window. That window is defined as a grouping of like phone numbers and major values, and those partitions are also sorted.
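In plain Ruby terms, that windowing is a group_by followed by tagging each row with its group's earliest timestamp. A sketch, with made-up data:

```ruby
readings = [
  { phone: "555-0001", major: 1, reading_timestamp: 10 },
  { phone: "555-0001", major: 1, reading_timestamp: 12 },
  { phone: "555-0002", major: 1, reading_timestamp: 11 }
]

result = readings
  .group_by { |r| [r[:phone], r[:major]] }            # PARTITION BY phone, major
  .flat_map do |_, rows|
    first = rows.min_by { |r| r[:reading_timestamp] } # FIRST_VALUE in the window
    rows.map { |r| r.merge(first_value: first[:reading_timestamp]) }
  end
  .sort_by { |r| [r[:phone], r[:reading_timestamp]] } # ORDER BY 1, 4

result.each { |r| p r }
```

Every row keeps its own data, but each also carries the timestamp that kicked off its sequence, which is what makes it easy to piece events together.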

This is the kickoff of a series of posts on using Postgres and SQL to do some interesting things that you may not have known were possible. If you typically use an ORM, like ActiveRecord, you probably do not write a lot of raw SQL. I like Rails, and ActiveRecord has its place. This is not a rant where I shake my fist in the air and tell ActiveRecord to get off my lawn.

Instead I want to highlight a handful of techniques that I find really useful, and also perhaps suggest that it’s OK to venture off the golden path and do a little raw SQL where it makes sense. The fact of the matter is that you cannot simply switch databases and have the ORM just handle it for you. At least not once your application starts to get interesting. So use the right tool for the job, and I hope to highlight a few tools that Postgres brings to the table.

TL;DR: SQL is your friend.

JSON Datatype

You may already be familiar with the HSTORE datatype that came in Postgres 9.0. It’s nice, but does not handle nesting. In Postgres 9.2 we got something even awesomer: the JSON datatype. It’s pretty spectacular, and I have used it to store entire raw payloads now anytime I consume resources. You can also use this datatype in ActiveRecord migrations without requiring any additional gems.

What makes the JSON datatype especially nice is that you can reach into the data and pull out specific elements. You can also create custom functions and even operators to do even more powerful things with the data in the JSON column.

Here is an example that takes a column called “beacons” which contains a list of bluetooth beacons that a phone has ranged. We want to look into that list and pull out the minor number from the closest beacon, and the accuracy from the next closest beacon. We also cast values from the JSON datatype to a more meaningful datatype. Yes, the cast to TEXT and then to the actual datatype is a little awkward, but it works.

Wait. What even is that?! You create a function called json_pluck. Then you create a custom operator | that lets you invoke json_pluck with the pipe character, telling it which key’s values to return.
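If it helps to see the idea outside of SQL, here is a rough Ruby analogue of what json_pluck does, using a made-up beacons payload (the field names and values are assumptions for illustration):

```ruby
require 'json'

# Hypothetical payload: beacons ordered nearest-first
beacons_json = '[{"minor": 101, "accuracy": 0.5}, {"minor": 102, "accuracy": 1.8}]'
beacons = JSON.parse(beacons_json)

# What json_pluck does, roughly: collect one key's values from a JSON array
minors = beacons.map { |b| b["minor"] }

closest_minor = beacons[0]["minor"]    # minor from the closest beacon
next_accuracy = beacons[1]["accuracy"] # accuracy from the next closest

puts minors.inspect
```

The SQL version just pushes that same "reach into the array and pull out a key" operation down into the database, where it can run inside your queries.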

I’ve seen some discussion over the years on how to onboard new Ruby developers. What about re-onboarding “old” developers though?

I’ve been a Ruby developer for several years. I remember the bump to Rails 1.0. We used to have to hand-crank our app servers while walking uphill. You kids don’t know how good you have it. I digress while I shake my cane.

As I was saying, I’ve been in the community for a long time, but I have also been away from ruby for several months on other languages. I consider Ruby my “home” language, though, and the Ruby community is a special community that I enjoy feeling a connection with.

So how do I reorient myself? How do I stay connected with the language and the community that I am not always directly involved in?

Twitter

Local Ruby Brigade

Practice

Playing Around

Twitter

I follow a lot of developers on Twitter, as one does. Sometimes I binge add people, and sometimes I cull the list. I love the community, but I don’t love the drama. The drama sometimes eats at me, actually. The things that we get worked up about also eat at me, but sometimes we get a little caught up in the swirl.

I like seeing the evolution of the language and ideas. I like seeing other languages come into play, too. I don’t think Ruby is necessarily the best language solution for all problems, and I believe that it’s important to have a varied language toolkit at your disposal.

Local Ruby Brigade

User groups are a fantastic resource. The Dallas Ruby Brigade has always been very open and nurturing in my experience. There are monthly meetings where people give talks, and I’ve given a few over the years. There are also weekly meetups where we code and socialize. We also have an active mailing list where we discuss current events, upcoming meetings, and also ask and answer questions.

I felt safe as an inexperienced developer, and I am tremendously grateful. A local user group is a really great place to keep your chops up and get experience talking in a group setting (whether in the front or in the audience).

After a couple of years I felt like it was my turn to “give back” and get more involved, which has been very fulfilling.

Practice

Certainly you can practice the wrong things, but I find practice very valuable. Repetition helps me learn and cements ideas for me. I also learn by doing. I need a concept to have a practical application or else I just will not see it. For the past several months I have done Exercism exercises in several languages. For a while I was doing Ruby backend services, so I did JavaScript exercises to keep up those chops. Then we started doing a lot of native apps, so I focused on Ruby exercises. I’ve also looked at a few katas here and there. Sometimes they’re really hard and I don’t get them. Sometimes they’re too easy and I don’t get much value out of those either. Usually they’re just right though.

Scratch An Itch (Play Around)

I’ll occasionally try to cook up a little project to work on. I can’t tell you how many times I’ve redone my blog. Not each of those has seen the light of day, and that’s ok. It’s more about the process. I usually also have a new data model in mind, which means that the data needs to be transformed. I love data, and this is a really fun exercise for me. ActiveRecord is great, but it can also be pretty limiting. Sometimes raw SQL really is the best way to express what you need to do.

Other

I also try to make it to at least one Ruby conference a year, as well as submit talks on various things. Thinking about what I might be able to shed light on or what things of value I might be able to share helps me take a critical look at my own growth and evolution.

Addendum

I realized as I was walking around thinking about doing something in Xcode that this isn’t just Ruby for me. I bounce around languages a lot based on the projects I work on. One thing that is unique to Ruby, though, is the community.

When I’m working on a Rails app that relies on live data, like a reporting app for example, I find myself constantly wanting to pull live data to my local dev database. So I made a rake task to do this:

Perspective. Being a polyglot is good, and I think it’s the natural evolution as you progress from beginner to seasoned developer. I’m not here to discuss the benefits of a multi-lingual cross-platform architecture. This is about me, and this is about you. This is about how we spend our time honing our craft on our time.

The Sapir-Whorf hypothesis says (about natural languages) that the languages we know limit what we think. Continuing that thought, I believe that the programming languages you know, the communities you participate in, the things you feel passionate about, all become the lens through which you see the world. Sort of like Kant’s subject-object problem, where he suggests that we experience the world as conveyed by our senses. You don’t experience (or evaluate) anything directly, but instead through rose-colored glasses (as I remember the lesson going).

As you participate more, and as you learn more, you expand your field of vision. You recognize patterns and see where different problems are better solved in different ways, and perhaps a language is a good match for some problems and not as good for others.

A single language (or pattern) isn’t the silver bullet for all programs.

Varied experience also allows you to think about problems at a higher level rather than getting lost in the weeds of implementation specifics or community dogma.

Another benefit to the polyglot life is that it’s just fun to explore and play around. We are naturally curious. The Pragmatic Programmer suggests that we make ourselves uncomfortable and learn a new language every year. A different language with different constructs.

As a reminder, I’m talking about personal growth and how we spend our off time honing our craft. Some of these new languages and patterns will make their way into production code or glue processes, but that’s not really what I’m driving at here. This is more along the lines of what Steve Klabnik has called “Irresponsible Ruby.” You’re playing with language to explore and push the boundaries.

I have noticed some downsides to the polyglot life, though.

I am a long-time Rubyist. I remember the bump to Rails 1.0. I remember Chunky Bacon and Rails is a Ghetto. I’ve forgotten more Rails API changes than I remember now. I haven’t spent as much time doing Ruby lately, and I miss it. I miss the community. I think the community sometimes takes itself a little too seriously, but I miss my “home” community.

Context switching can also be expensive. Even just between different tasks in a single project, but especially between different projects in different languages.

I also joked with someone recently that I’m not “good” at anything anymore. I once had a boss suggest (when I was just getting into web development) that I should narrow my focus. Remember all the things you have to get a handle on to do web development? HTML, CSS, JavaScript, some app server backend, some web server to run the app server, linux to run the web server that runs the app server, ….

He was trying to look out for me. He was concerned that I would be a Jack of all trades and master of none. I feel that way now as I bounce around different projects in different languages.

Maybe it’s like that for everyone, and maybe that’s just where I am personally in my own growth. Perhaps I am back in that “beginner” stage, but on a different field. I’ll take it, though.

Yesterday I laid out a high-level overview of how I use Docker. I neglected to point out one thing — the environment variables. I exposed the POSTGRES_USER and POSTGRES_PASSWORD, but didn’t explicitly call out what to do with those in the context of the Rails app.

Docker exposes some variables for you based on the containers and their names. You can also specify your own with that -e switch.
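On the Rails side, the app can read those variables in config/database.yml. Here is a sketch using the link-style variables Docker injects when you link a Postgres container with the alias `postgres`; the exact variable names depend on your container and link names, so treat all of these as assumptions:

```yaml
# config/database.yml (sketch -- variable names depend on your link alias)
development:
  adapter: postgresql
  database: myapp_development
  host: <%= ENV['POSTGRES_PORT_5432_TCP_ADDR'] %>
  port: <%= ENV['POSTGRES_PORT_5432_TCP_PORT'] %>
  username: <%= ENV['POSTGRES_ENV_POSTGRES_USER'] %>
  password: <%= ENV['POSTGRES_ENV_POSTGRES_PASSWORD'] %>
```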

Alright. With that little bit cleared up, let’s talk about how I actually use all this now.

How I Actually Use All This Now

Remember, the intent of this exercise was to come up with a way to easily share an app with its dependencies in some nice tidy package, such that nobody else needs to understand how to line up all those dependencies and deal with the maintenance.

Docker is great, but one of the things that I liked about Vagrant was how easily provisioning and reprovisioning happens. But I don’t really need all the overhead of some big complicated thing that requires 3 separate licenses and all that. We have boot2docker, which handles the virtualization on OSX with VirtualBox. It’s maybe not the most amazing software, but it hasn’t left a bad taste in my mouth yet.

I like the simplicity of Docker, and I like the simplicity of just using the tooling that Docker provides us.

So to simplify the workflow (using Docker and Boot2Docker) I wrote a bunch of rake tasks. Feel free to comment or submit updates.

So what does this buy us, this giant pile of rake tasks? This gives us a thin wrapper around some pretty basic functionality. Rake also has the ability to chain things together, so you can ask to clean up all the old instances and it will make sure they’re all stopped and that boot2docker is even running.

It makes it so that we can pass this around and have a simplified workflow.

Sometimes boot2docker loses its mind and quits on you, and you don’t even have to care about that anymore because the rake tasks will make sure boot2docker is running for you.

Here are all the rake tasks that I currently have in place:

$ rake -T
rake boot2docker:install_nsenter # Install nsenter
rake boot2docker:shellinit # Setup shell
rake boot2docker:ssh[container] # SSH into the given container
rake boot2docker:start # Start VM from any state
rake boot2docker:stop # Gracefully shutdown the VM
rake docker:fig:build # Use fig to build the Dockerfile image
rake docker:fig:up # Use fig to run the app as described in fig.yml
rake docker:info:hostname[container,port] # Give the hostname and port for a container
rake docker:info:images # List all the images on localhost (docker images)
rake docker:info:ps # List all the containers (docker ps -a)
rake docker:maintenance:rm[container] # Remove one or more containers
rake docker:maintenance:rmi[container] # Remove one or more images
rake docker:maintenance:stop[container] # Stop running containers
rake git:pull[remote,branch] # Pull the freshest code
rake git:stash # Stash files

Generalized Yet Specific

So we have Dockerfiles and environment variables that we can pass into containers. We have boot2docker to handle the virtualization for us on OSX. We have rake to manage docker and boot2docker. But what about apps and their dependencies?

Fig

We need a DSL that is a little bit generalized, but not as generalized as Chef or Puppet. We just need to be able to tell Docker how to do all those things that were in my little container startup script. Enter Fig.
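A fig.yml for the Rails-plus-Postgres setup described in the previous post might look like this sketch (image names, credentials, and service layout are assumptions):

```yaml
# fig.yml (sketch)
db:
  image: postgres:9.3
  environment:
    POSTGRES_USER: rails
    POSTGRES_PASSWORD: secret
web:
  build: .
  command: /start.sh
  ports:
    - "3000"
  links:
    - db
  volumes:
    - .:/app
```

With that in place, `rake docker:fig:up` (a thin wrapper around `fig up`) builds and starts both containers in one shot.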

Then you can ask docker about the instance:

$ rake docker:info:hostname[railsbasic_web_1,3000]
http://192.168.59.103:49154

Even cooler, you can open that up in your browser with:

open $(rake docker:info:hostname[railsbasic_web_1,3000])

And using a little bit of Linux magic (nsenter) you can SSH into the running container:

$ rake boot2docker:ssh[railsbasic_web_1]
SSH into container: f52a7f422ec2
root@f52a7f422ec2:/#

Summary

So that’s where my head is at the moment. Docker + boot2docker + rake + fig. That gives you a flexible system that is still specific enough. You can run multiple apps, and you can have multiple apps that have dependencies on each other. I’ve had a little more trouble getting this running with a Scala stack, so YMMV.

Again, I’m not saying that any of this is particularly correct. It’s just what has evolved for me in my particular team’s use case. I would love to hear feedback on what you think works best for you and your needs. I would also like to thank Clifton King, who gave me a couple of docker-related tips along the way.

I have spent the past several months playing with Docker off and on. I have to say that I think I really like it.

I love the idea of really small, specific, containers. I also love the idea of building up a toolkit of things that you use.

Some Background

I work at Sabre Labs, and we get to explore and experiment with trends and technologies that we think could make a difference in the travel world. This means that we spin up a lot of projects in a lot of different technologies and languages. So it is really nice to have a toolkit of things that we’ve used and done. That also helps with knowledge sharing.

About a year ago we set upon a path to attempt to make DevOps easier in a cross-functional team. We have a designer, a front-end developer, and a couple of back-end or full-stack developers. DevOps can be very time consuming when you’re working in a Java stack one day and a Rails stack the next.

Vagrant

First we looked at Vagrant, using Chef to setup the dependencies. It worked, generally. It’s expensive. It’s messy.

The licenses are frustrating. You pay for Vagrant, which is fine. If you want to use VMWare, which we did, you pay for that. Also fine. If you want to use Vagrant with VMWare you pay for a third license. That feels weird.

Chef. I’ve used Chef before in previous engagements, and it’s worked ok. It’s better than nothing (or shell scripts) for sure. My frustration with Chef is how messy the recipes are. The dependencies in particular. We didn’t use Chef Server, so we would just copy in all the recipes that we needed, and then their dependencies. Look at the dependency chain for git (on Ubuntu). Before you’re done you’ve brought in Windows dependencies. And also the Yum package manager. Why? That makes me crazy.

I know that Vagrant supports Puppet, and I just never got around to looking at Puppet. I know that Vagrant also supports Docker now, and I can’t get it to dance. But I don’t really want to. I think that’s overly complicated.

Docker

What’s so great about Docker then? It’s complicated in different ways. Or can be complicated.

Like I said at the top, I love how simple and concise the containers are. I also like the DSL. It’s specific, and it lets me do what feels natural on that particular environment. I don’t need to overly generalize. I don’t need a DSL that can work on all *nix distros. I just want to say what I need and make it so.

Teach Me How To Docker

Look around at blog posts on how to Docker. I’ve noticed that a great many of them fall into the “install ALLTHETHINGS” camp. You build a single container that includes the database and the app server. Bleh. Why bother with containerization if you’re just going to stick everything in a single container?

Small. Simple. Concise. That’s how I Docker.

So, with all of that as the background, here is a simple Docker setup for a simple Rails app.

You see here that I have a base container image called barrettclark/ruby-2.1.2. Docker containers can inherit from other containers. This is where you can end up with a bit of a Russian Doll situation because that container takes a base image and puts Ruby 2.1.2 on it. You see it then runs some basic apt-get commands. We are using Postgres for this Rails app, so we need those header files for the pg gem, hence libpq-dev.

Next we add some files. Docker caches each step to see if it needs to be rebuilt. That’s why we don’t just copy in the whole project yet. Once we’ve handled the gem dependencies, then we can copy over the app. Rails will run on port 3000, and because we know it’s the only thing running in that container, and also in this machine for that matter, we can just expose that port and call it good.

CMD /start.sh tells the container to run that file when the container is started (unless you tell it to do something else when you start it). You can see that it does some housekeeping, database setup, and fires up the app.
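The Dockerfile described above would look roughly like this sketch. The base image name comes from the post; the exact packages and paths are assumptions:

```dockerfile
FROM barrettclark/ruby-2.1.2

# System dependencies, including the Postgres headers the pg gem needs
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev

# Handle gems first so this layer stays cached until the Gemfile changes
WORKDIR /app
ADD Gemfile /app/Gemfile
ADD Gemfile.lock /app/Gemfile.lock
RUN bundle install

# Now copy over the rest of the application
ADD . /app

# Rails is the only thing running here, so just expose its port
EXPOSE 3000

# Housekeeping, database setup, and app startup live in this script
CMD /start.sh
```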

What’s all that? The first docker run spins up a Postgres container named ‘rails-basic-postgres’ and sets some environment variables. If you don’t have the given image on your localhost, Docker will go try to fetch it from Dockerhub. The ‘-d’ switch tells Docker to daemonize the container — run it in the background. Next we run the Rails container and link that Postgres container to it.

Remember how I said the whole point of all this was to be able to give this service (or whatever) to everyone else in a cross-functional team? I don’t want them to have to know how to line up all these dependencies. This allows me to handle that for them. The real kicker here is that you can mount a directory on the localhost inside the container. Yes! The caveat is that the localhost directory MUST live in /Users. A symlink is not sufficient. Now this allows a designer to play with the javascript or markup on his or her computer in his or her preferred editor, and nobody has to know where the code is actually running.

Well, that last piece isn’t entirely true. Remember how I sort of waved my hands at just exposing port 3000 and calling it good? The -P Docker option allows us to expose ports, and Docker will map them to a port on the localhost. If you want to be specific you can do that as well with a different switch.

Run docker ps -a to see all the instances running and what ports they’ve been mapped to.

I’m not saying that any of this is particularly correct. It’s just what has evolved and makes the most sense to me. The next blog post will go into a little more detail on how I actually use all of this, and some tooling that I’ve written to make it a little easier.

I had the privilege of attending DCamp this past weekend. It was awesome. While doing the Game Of Life pairing sessions, one of my pair partners wanted to explore the exercise using Minitest. Neither of us had ever used it, so we moved on with RSpec and I made a note to come back to it.

I try to be discerning about bringing additional dependencies (including gems) into a project. I’ve always just used test/unit, and I prefer that syntax over the RSpec syntax. Minitest is baked into the stdlib (starting with Ruby 1.9.3), so you’ve already got everything you need. You can also write tests using either the test/unit or spec style. Win-win right? If you want more there are also additional gems that can supplement functionality. Everybody wins!!!

require 'minitest/autorun'
require_relative "../lib/meme"

describe Meme do
  before do
    @meme = Meme.new
  end

  describe "when asked about cheeseburgers" do
    it "must respond positively" do
      @meme.i_can_has_cheezburger?.must_equal "OHAI!"
    end
  end

  describe "when asked about blending possibilities" do
    it "won't say no" do
      @meme.will_it_blend?.wont_match /^no/i
    end
  end
end

I’ve been excited for the past year or so about RubyMotion, and I was even more excited when they announced support for Android.

Then I went to WWDC.

I already felt like PhoneGap was too much of a disconnect. Mobile web and PhoneGap are good in a pinch, and there are things that are more easily done in a web form. However, I’ve had a growing feeling that if you have the ability to make a native app then that is the right answer. Anything short of making a native app is a shortcut that you’ll eventually regret.

You already don’t control the ecosystem, but if you’re depending on someone’s abstraction of the native implementation then you’re one step removed from the metal. More importantly, being as close as possible to the metal gives you a more intimate feel for the nuances of that platform.

iOS 8 doesn’t look all that different compared to the bold new look of iOS 7. But everything is different. Just slightly enhanced or extended. Apps can work together seamlessly. User interfaces are adaptive. Everything changes, but ever so slightly.

If you’re not right up against that cutting edge, then how can you fully take advantage of everything the operating system gives you?