Using seed to explore APIs

An overview of what we’re working on and how to explore APIs

I’ve been working to update seed, HappyFunCorp’s app generator that makes it easy to kick off MVPs. Check out the website for more information. One of the things I’ve started to do is separate out the dependencies more and begin writing tutorials on how to use each of the different features. After I link to that stuff, let’s walk through a way to combine the different techniques we’ve discussed.

Using Seed, OAuth, and Rake to explore github data

What I want to do is look at the Gemfiles for all of our projects and see which gems are the most popular, which versions we are using, and whether we can develop some more expertise around them. But first, I want to get the data.

I could go to each of the repos in github, but there are well over a hundred so that doesn’t work. Github has an API, which is great, but I need oauth2 to access it. Let’s get going.

Generate the app

If you haven’t installed happy_seed yet, do so now by typing gem install happy_seed.

Then generate the app:

$ happy_seed rails project_stats

Now it’s going to install things. Initially, just say no to everything. Once it’s done, install the github generator:

$ cd project_stats
$ rails g happy_seed:github

That will do its thing. Then we need to create the database and get going:

$ rake db:migrate

Ask for more scope

When the github generator is run, it configures the oauth scope that it requests in config/initializers/devise.rb. We need to ask for a few more permissions, so open up that file and change the requested scope to "user,repo,read:org".
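As a rough sketch, the relevant line in config/initializers/devise.rb might end up looking something like this. The environment variable names are placeholders I made up for illustration (keep whatever the generator wrote into your file); only the scope string comes from the text above.

```ruby
# config/initializers/devise.rb (excerpt, not a complete initializer)
Devise.setup do |config|
  # GITHUB_APP_ID / GITHUB_APP_SECRET are assumed names for your OAuth
  # app credentials; keep whatever your generated file already uses.
  config.omniauth :github, ENV["GITHUB_APP_ID"], ENV["GITHUB_APP_SECRET"],
                  scope: "user,repo,read:org"
end
```

Restart the server after changing the initializer, and reconnect the account so the new scope is actually granted.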

Ok, now we can start figuring out what we need to do to get access to the data. We have an authenticated user account, and we can start hitting the API. I know for a fact that I have way more than 30 repos – I mean, seriously – so first thing is to figure out why that is and how to get more. It’s probably related to pagination.
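GitHub’s list endpoints hand back 30 items per page unless you ask for more, which would explain the count. Octokit can follow the pages for you if you set auto_paginate = true on the client. The sketch below shows the underlying idea without touching the network; FAKE_REPOS, FAKE_API, and fetch_all_pages are made-up names standing in for the real API and for what auto-pagination does on your behalf.

```ruby
# Collect every page from a paginated source.
# fetch_page is a hypothetical callable: given a 1-based page number it
# returns an array of records, or an empty array when nothing is left.
def fetch_all_pages(fetch_page)
  results = []
  page = 1
  loop do
    batch = fetch_page.call(page)
    break if batch.empty?
    results.concat(batch)
    page += 1
  end
  results
end

# Stand-in for the API: 70 fake repos served 30 at a time.
FAKE_REPOS = (1..70).map { |i| "repo-#{i}" }
FAKE_API = ->(page) { FAKE_REPOS.each_slice(30).to_a[page - 1] || [] }

puts fetch_all_pages(FAKE_API).size  # prints 70

# With a real Octokit client the equivalent is roughly:
#   gc.auto_paginate = true
#   gc.repos.size
```

Asking only for the first page is what leaves you stuck at 30.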

When we ask the contents API for a file, we see that github returns the contents of the file base64 encoded. I guess that makes sense, so if we want to print it out:

2.2.1 :009 > Base64.decode64 content.content
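If you want to try the decoding step without hitting the API, here is a tiny offline sketch. FakeContent and decode_content are made-up names; the Struct only imitates the one thing we rely on, an object with a content method holding the base64 payload.

```ruby
require "base64"

# Stand-in for the object the contents call returns. We only rely on it
# responding to `content` with a base64 string.
FakeContent = Struct.new(:content)

# Decode the payload of a contents-style response.
def decode_content(resource)
  Base64.decode64(resource.content)
end

lockfile = "GEM\n  specs:\n    rack (1.6.0)\n"
# encode64 wraps its output with newlines; decode64 ignores them.
response = FakeContent.new(Base64.encode64(lockfile))

puts decode_content(response)
```

The decoded string comes back with a binary encoding, which is fine for writing straight to a file.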

Using Rake to pull the data down

Get access to the API.

Open up a console and start playing with commands.

Start storing the data locally once we need to get more.

Play around with the data.

Design a UI around it.

Refactor the test “scripts” into ruby classes that fit that UI.

Let’s use rake to start managing the data. We’re going to be using some of the techniques that were outlined in the using rake for dataflow programming and data science post. The first step is to create a lib/tasks/github.rake file to hold our tasks.

Be sure to change the HappyFunCorp to your organization, or use the repos call instead of the organization one.

Now let’s run rake data/projects.json. If you run it a second time, notice that rake returns immediately and doesn’t hit the remote server.

The file task only runs if the file doesn’t exist.

Rake::Task["environment"].invoke is a way to ensure that a task has been run without forcing it to run again.

The API calls are from our console experiments.

Just save it to a file.
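Putting those notes together, a minimal sketch of such a task might look like the following. This is my reconstruction under assumptions, not a definitive listing: save_org_repos is a made-up helper, User.first.github_client is the helper used elsewhere in this post, and I’m assuming the objects Octokit returns respond to to_h.

```ruby
# lib/tasks/github.rake (sketch)
require "rake"
require "json"
require "fileutils"

# Write an organization's repositories to `path` as a JSON array.
# `client` is anything that responds to org_repos(org) and returns
# objects that respond to to_h (assumed true for Octokit's resources).
def save_org_repos(client, org, path)
  FileUtils.mkdir_p File.dirname(path)
  repos = client.org_repos(org).map(&:to_h)
  File.write path, JSON.pretty_generate(repos)
end

# The body of a file task only runs when data/projects.json is missing.
file "data/projects.json" do |t|
  Rake::Task["environment"].invoke  # make sure Rails is loaded
  gc = User.first.github_client     # helper assumed from the generator
  gc.auto_paginate = true           # fetch every page, not just the first 30
  save_org_repos gc, "HappyFunCorp", t.name
end
```

Keeping the write step in a plain method makes it easy to poke at from the console with a stub client before pointing it at the real API.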

OK, now let’s loop over everything to load the files that we want. First we define a method that lets us define a task to loop over all the entries in a JSON array, and then we’ll call it with our block, which loads up the contents. (Add this to the end of the github.rake file.)

def for_each_elem(name, file)
  task name => file do
    JSON.parse(File.read(file)).each do |record|
      yield record
    end
  end
end

for_each_elem "load_gemfiles", "data/projects.json" do |repo|
  outfile = "data/gemfiles/#{repo['name']}.Gemfile.lock"
  FileUtils.mkdir_p "data/gemfiles"
  file outfile do
    Rake::Task["environment"].invoke
    gc = User.first.github_client
    begin
      content = gc.contents repo['full_name'], path: 'Gemfile.lock'
      File.open outfile, "w" do |out|
        out.puts Base64.decode64 content.content
      end
    rescue Octokit::NotFound
      puts "No Gemfile.lock found for #{repo['full_name']}"
    end
  end
  Rake::Task[outfile].invoke
end

Then run it with rake load_gemfiles. Depending upon how many repos you have, this could take a few seconds. (Also make sure you’ve updated the organization!)

Define a file task for each output file, that we will invoke at the very end.

Inside the task, make sure that the environment is loaded.

Pull down the contents of the Gemfile.lock from the API.

If you run this a second time, notice that it only attempts to load from the files that weren’t loaded before.

For fun, delete the data directory and run the rake task again. BOOM!
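If you want to see the “only runs when the file is missing” behavior in isolation, here is a small self-contained sketch. count_file_task_runs is a made-up helper; Rake::FileTask.define_task is what the file method in a Rakefile calls under the hood, and I use it directly so the snippet doesn’t depend on being inside a Rakefile.

```ruby
require "rake"
require "tmpdir"

# Invoke a file task twice and report how many times its body ran.
def count_file_task_runs(path)
  runs = 0
  task = Rake::FileTask.define_task(path) do |t|
    runs += 1
    File.write(t.name, "[]")  # pretend we fetched something
  end
  task.invoke    # file is missing, so the body runs
  task.reenable  # forget that this task was already invoked
  task.invoke    # file exists now, so the body is skipped
  runs
end

Dir.mktmpdir do |dir|
  puts count_file_task_runs(File.join(dir, "projects.json"))  # prints 1
end
```

Deleting the file is what makes the task “needed” again, which is why wiping the data directory triggers a full reload.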

Massaging the data into something usable

OK, now that we have all the data, let’s figure out how to slice and dice it. Let’s just wire together some standard UNIX tools to filter and get some info.

Running rake filter_gemfiles will go through and only show the specific gems that were locked in the Gemfile.locks. Obviously, filtering the file based on the fact that a line has exactly 4 spaces of indentation isn’t robust, but it works.
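To make the four-space trick concrete, here is a pure-Ruby stand-in for that kind of filter (not the UNIX pipeline itself). locked_gems, gem_counts, and the two sample lockfiles are made up for illustration; in the real task you would feed in the files under data/gemfiles instead.

```ruby
# A locked gem line sits under "specs:" with exactly four leading spaces,
# e.g. "    rack (1.6.0)". Its own dependencies are indented six spaces.
LOCKED_GEM = /\A {4}(\S+) \(([^)]+)\)\z/

# Names of the gems locked in one Gemfile.lock (given as a string).
def locked_gems(lockfile_text)
  lockfile_text.each_line.map { |line| line.chomp[LOCKED_GEM, 1] }.compact
end

# Count how many lockfiles mention each gem, most common first.
def gem_counts(lockfile_texts)
  counts = Hash.new(0)
  lockfile_texts.each do |text|
    locked_gems(text).uniq.each { |name| counts[name] += 1 }
  end
  counts.sort_by { |name, n| [-n, name] }
end

SAMPLE_A = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      json (1.8.2)
      rack (1.6.0)
      rack-test (0.6.3)
        rack (>= 1.0)

  DEPENDENCIES
    json
    rack-test
LOCK

SAMPLE_B = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      json (1.8.2)
      tilt (1.4.1)
LOCK

gem_counts([SAMPLE_A, SAMPLE_B]).each { |name, n| puts "(#{n}) #{name}" }
```

Against the downloaded files, something like Dir["data/gemfiles/*.Gemfile.lock"].map { |f| File.read(f) } would supply the input.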

I’m going to stop here, but in case you are wondering, the top gems that we use are:

(82) json

(81) tzinfo

(81) i18n

(81) activesupport

(79) rack

(79) multi_json

(78) sass

(77) rack-test

(76) tilt

(76) mime-types

Repeatable data in 10 minutes

There’s lots of stuff you can do from there, the most likely one being “sending an email and forgetting about it.” But let’s look at what we have.

The access key isn’t hard coded anywhere. When you come back to this, if it expires, you just reconnect on the website.

Getting access keys is way easier this way; only a few oauth providers make this simple. (Twitter does, for example; github doesn’t.)

There’s a direct process transitioning from ‘playing around’ to automated.

Loading the data from the remote API is automated and repeatable. If you’ve set up the dependencies correctly, you can run the rake tasks and things magically get up to date.

If you do want to build a UI around this, you already have a webapp up and running…

Importantly, this is something that you can get up and going with in under 10 minutes, at least if you know how the API works. It takes less than 1 minute to get to the point where you have an authenticated client to the remote service and you can spend time exploring.

That’s one of the reasons I like having seed around to help prototype and explore ideas!

See also

Adding social login to your sites really makes it easier to get users onboard. Devise is great to help get an authentication system up and running, but there are a few tricky things to get right. The first challenge is that you don’t always get the user’s email address when they first connect. The second challenge is that we want to request the minimum permissions first so that the user is more likely to sign up, and gradually ask for more as the need arises.

The goal is to get features out fast, and iterate on them quickly. Does anyone care about it? What do they care about? How do we make it better?
As projects get bigger, both in terms of people using the site as well as people working on the site, testing and quality become relatively more important. Adding tests introduces drag, and the theory is that you invest now for payoffs later.

I’ve been using Rake more and more for data collection and processing tasks. Rake is pretty powerful. Most people know it as a way to add external tasks to a Rails app, but it’s actually a very capable build system. We’re going to take advantage of that to build out a framework that will make it easy to collect, process, and interpret data while keeping it all in sync.
In fact, if you just want to start playing with stuff now, head over to the rake-data site to go through some walkthroughs.