Adventures in web development

beanstalkd, PHP workers, fires

99designs pushes four million background jobs through beanstalkd each day.
beanstalkd is a fantastic job queue which we’ve used for more than five
years, via the pheanstalk client which I wrote in 2008.

Each beanstalkd job has a TTR (time to run): a
timer which counts down during job processing. If TTR seconds elapse before
the worker finishes the job, beanstalkd assumes the worker is dead and releases
the job. Another of our workers picks up the job, despite the original worker
still churning away. Each iteration of this results in greater load and less
chance of this or any other job finishing. Eventually all the worker processes
are stuck, and everything catches fire.

That’s what happened when we began pushing ImageMagick and GhostScript jobs
to rasterize graphics. Some pathological EPS files took longer than the 600
second TTR, causing worker resource starvation.

Increasing the TTR would mitigate the issue, but these EPS files seem subject
to the halting problem. That leaves workers vulnerable to slow job
saturation.

Interrupting the image operation when the job hits its TTR would be a better
solution. But workers need concurrency to watch the TTR during the job. PHP
doesn’t do threads, except via an extension that I’m disinclined
to use. Using fork() would introduce IPC / signal handling
complexity, and prevent processes sharing the beanstalkd connection. PHP feels
like the wrong language to attack the problem.

cmdstalk

Lachlan and I decided we could kill N birds with a single stone. One:
solve the queue fires. Two: move another piece of our production
infrastructure to Go. Three: provide a beanstalkd layer which our PHP, Ruby
and Go apps could all use.

cmdstalk set out to harness the beanstalkd semantics we like on one end, and
talk standard unix processes on the other. This allows us to write workers in
any language. Here’s the basic model:

Connect to a beanstalkd server, watch one
or more tubes.

Pipe each job payload to a command specified by cmdstalk’s --cmd=… argument.

If the subprocess exits 0, delete the job; done.

If the subprocess exits non-zero, release the job for retry (with backoff).

If TTR elapses, kill the subprocess and bury the bad job.

Anything that can read stdin and exit(int) can be a cmdstalk worker — no
need for beanstalkd knowledge.
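To make that concrete, here’s a minimal sketch of what a cmdstalk worker could look like in Python. The payload handling is hypothetical, but the contract (read the job body from stdin, signal the result via the exit status) is exactly as described above.

```python
import sys

def handle_job(payload: str) -> int:
    """Return the exit status for a cmdstalk worker: 0 = delete, non-zero = release."""
    if not payload.strip():
        return 1  # non-zero exit: cmdstalk releases the job for retry (with backoff)
    # ... real processing would happen here, e.g. rasterizing the referenced file ...
    return 0      # zero exit: cmdstalk deletes the job

def main() -> None:
    # A real worker reads the payload from stdin and exits with the status.
    sys.exit(handle_job(sys.stdin.read()))
```

The worker needs no beanstalkd client and no knowledge of tubes or TTRs; cmdstalk handles all of that.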

Go

Go has become the go-to language at 99designs for
infrastructure components. My only previous Go experience comes from
writing go6502, an 8-bit computer emulator. Fascinating, but
different to writing concurrent network applications. Despite that, building
cmdstalk with Go was a pleasure.

Starting from the cmdstalk entrypoint you’ll see broker and
cli packages loaded. cli/options.go demonstrates Go’s flag library for
argument parsing. broker_dispatcher.go
coordinates broker concurrency across tubes, and broker.go
is where the action happens. Broker.Run() is a clear candidate for
refactoring, but when workers are burning, software’s better shipped than
perfect.

Commit ade6f6b0 introduces a simple -all flag to watch all
tubes at start-up. 431ac5fc evolves it to poll for new tubes as
they’re created. The latter illustrates how well timers and concurrency come
together in Go. Together they show that it’s simple to add functionality that
would be complex in other languages.

Tests live alongside the code they’re testing, such as
broker_test.go alongside
broker.go. They’re regular Go code using if to make
assertions, but richer assertion libraries do exist.

Conclusion

cmdstalk applies a unix-process abstraction layer to beanstalkd job processing.
Like any abstraction it needs to make itself worthwhile.

If you’re running Rails, you might want to look at Sidekiq or
Resque, or maybe even delayed_job. If you’re 100% Python, you
could wire together some solid libraries for job processing.

But if you need to process background jobs using several languages, some of
them poorly suited to long-running daemons and concurrency, cmdstalk may be for
you. Give it a try; feedback and pull requests are welcome.

In this series of guest blog posts, 99designs intern Daniel Williams takes us through how he has applied his knowledge of Machine Learning to the challenge of classifying Swiftly tasks based on what the customer requests.

The challenge

Swiftly is an online service from 99designs that lets customers get small graphic design jobs done quickly and affordably. It’s powered by a global network of professional designers who tackle things like business card updates and photo retouching in 30 minutes or less – an amazing turnaround time for a service with real people in the loop!

With time so vital to the service’s value, any delay in allocating a task to a designer experienced in its specific requirements could have a detrimental impact on the customer’s experience.

The ultimate aim is complete and accurate matching of jobs to designers, with the customer simply saying in their own terms what they need. Towards that goal, we decided to apply machine learning to further develop Swiftly’s “Intelligent Matching System”.

This is part two of a three-part blog series. In part one we tried to determine the types of tasks. In this post, we use machine learning to classify tasks into these task categories. A future post will discuss using our predictions for task allocation.

Categories to predict

To set up a machine learning problem, we need to first decide on what we want the answers to be.
After the last post’s experimentation, I decided to split the classification into two parts: what type of document is to be edited or created, and what type of work is needed on the document.

This gives us 7 document types:

Logo

Business Card

Icon

Template (ppt / pdf / word etc)

Header / Banner / Ad / Poster

Social Media

Other Image

and 9 types of graphic design work appropriate for small tasks:

Vectorisation

Transparency

Holidays edit

Creative Update

Resize

Reformat

General Edit

Colour Change

Text Change

For example, one task might be Vectorisation on a Logo, another might be Text Change on a Business Card. In total, 63 different combinations of document and work type exist. This is what we’re trying to predict.

Obtaining training data

In my last post, I used unsupervised techniques that don’t need training data. Now that we have a specific outcome we’d like to predict, supervised methods are more appropriate. They use training data to find patterns associated with each category, patterns that might be hard for humans to spot. For us, that training data will be a bunch of historical tasks and the correct categories for them.

However, obtaining good training data is a large problem in itself, especially given how many combinations of categories there are!

Mechanical Turk

Knowing how much work was involved, my first instinct was to outsource it to Amazon’s Mechanical Turk service. Mechanical Turk is named after an elaborate 18th century hoax that was exhibited across Europe, in which an automaton could play a strong game of chess against a human opponent. It was a hoax because it was not an automaton at all: there was a human chess player concealed inside the machine, secretly operating it.

Amazon calls its service Artificial Artificial Intelligence, and it is a form of ‘fake’ machine learning. We use software to submit tasks for classification, but real people all over the world get paid a little money to do the categorising for us.

Manual Classification

Unfortunately, the results I achieved from Mechanical Turk were poor. Even humans incorrectly classified many tasks, and this data, if fed into my machine learning classifier, would lead it to poor conclusions and low accuracy. The Turkers may have lacked some specialised knowledge about graphic design, or I may not have set up the Mechanical Turk task sufficiently well. (I wish I had read this post before diving into Mechanical Turk!)

Ultimately, having an accurate training set is perhaps the most important part of developing a good classifier. I rolled up my sleeves, and manually inspected and classified approximately 1200 Swiftly design briefs myself. This was slow and monotonous, but it meant that I knew I had an excellent quality training set.

Pre-processing Pipeline

Our classifier doesn’t accept raw text, but instead we must turn design briefs into features it can make decisions on. Human language is complicated, so there are many steps to go from text to features. Any good natural language system has such a pipeline. In ours, we:

Tokenise: split the text up into individual ‘words’

Remove punctuation and casing

Remove stop words (common words with no predictive power such as ‘a’, ‘the’)

Stem: reduce words to their root form (e.g. ‘creative’ becomes ‘creativ’)

Lemmatise: replace variable tokens such as URLs with common placeholders

Extract bigrams: record pairs of adjacent words alongside single words

The first four steps we covered in the last post; let’s go over steps 5 and 6 here.
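As a rough sketch, the early steps of the pipeline might look like this in Python. The stop-word list here is illustrative only; the real list is much longer.

```python
import re

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "on", "to", "and", "please", "this"}

def preprocess(text: str) -> list[str]:
    # Tokenise and lowercase, stripping punctuation in the process.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words: common words with no predictive power.
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, preprocess("Please change the logo!") returns ["change", "logo"].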

Lemmatisation

Lemmatisation is similar to stemming. It’s the process of grouping related words together by replacing several variations with a common shared symbol. For example, Swiftly task descriptions often contain URLs. Lemmatisation of URLs would mean replacing every URL with a common placeholder (for example “$URL”). So the following brief:

On this business card, please change “www.coolguynumber1.com” to “www.greatestdude.org”

becomes:

On this business card, please change “$URL” to “$URL”

We do this because pre-processing generates a list of all the words that appear in the training dataset, and words that appear only once are removed because they add noise. The machine learner can only say something useful about words shared between different tasks, and nearly every URL in a brief is unique, so without lemmatisation we lose all information gained from the presence of URLs. With lemmatisation, we instead see the symbol “$URL” many times; if the presence of a URL in a task description turns out to be a discriminating feature, this should increase classification accuracy. As a longer example, lemmatisation transforms this brief:

Please change the email on this business card from coolguy99@99designs.com to koolguy99@99designs.com. Can you also include a link to my website www.coolestguyuknow.net on the bottom? Please also change all the fonts to #CC3399 and the circle to #4C3F99. I want a few different business card sizes, namely: 400 x 400, 30 x 45 and 5600 by 3320. Thanks!

to:

Please change the email on this business card from $EMAIL to $EMAIL. Can you also include a link to my website $URL on the bottom? Please also change all the fonts to $CHEX and the circle to $CHEX. I want a few different business card sizes, namely: $DIM, $DIM and $DIM. Thanks!

Now, URLs, email addresses, dimensions and so on can all take many different forms. The easiest way to match as many as possible is to use regular expressions. I used these patterns (for Python’s re module) to perform my lemmatisation; you might find them useful too.
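I haven’t reproduced the exact patterns here, but simplified versions give the flavour of the approach:

```python
import re

# Simplified, illustrative patterns; the real ones handle more edge cases.
LEMMA_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "$EMAIL"),   # email addresses
    (re.compile(r"\b(?:https?://|www\.)\S+\b"), "$URL"),       # URLs
    (re.compile(r"#[0-9a-fA-F]{6}\b"), "$CHEX"),               # hex colour codes
    (re.compile(r"\b\d+\s*(?:x|by)\s*\d+\b"), "$DIM"),         # dimensions
]

def lemmatise(text: str) -> str:
    # Replace each variable token with its common placeholder.
    for pattern, placeholder in LEMMA_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Note the email pattern runs before the URL pattern, so an address is never half-consumed as a URL.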

Bigrams

Previously I had worked with each word in the text individually (“unigrams”), but this often means words have no context. So, for example, “business card” was broken into “business” and “card”, and the importance of those words appearing together was lost. Bigrams are simply pairs of words that appear next to each other. So, if we include both unigrams and bigrams, the text “business card” would provide us the features “business”, “card” and “business card”. This captures more of the context of certain phrases. In our data, the top bigrams after stemming were:

bigram                 frequency
would like             72
logo add               49
take exist             47
fun creativ            47
add fun                44
exist logo             44
busi card              33
transpar background    28
$URL $URL              24
creativ festiv         22
festiv element         22
bat pumpkin            21
spooki element         21
pumpkin skeleton       21
busi name              20
name logo              20
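Extracting these features is straightforward; a minimal sketch:

```python
def unigrams_and_bigrams(tokens: list[str]) -> list[str]:
    # Bigrams are pairs of adjacent tokens; we keep the unigrams too.
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams
```

So unigrams_and_bigrams(["busi", "card"]) yields ["busi", "card", "busi card"].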

The pipeline in action

Let’s do a worked example using the sentence below:

Please change the email on this business card from coolguy99@gmail.com to koolguy99@gmail.com. Thanks!

Our pipeline first tokenises the sentence into words. Follow each word from left to right in the table below to see how it gets transformed by the pipeline.

Vectorisation

As discussed in the last post, we need to convert text into a numerical format. I used a simple model known as the bag-of-words vector space model, which represents each document as a vector counting how many times each word occurred in it. The vector will have n dimensions, where n is the total number of terms in the whole collection of documents; in the training dataset, there are 9186 tokens. Each brief’s vector is sparse: the vast majority of terms will have a count of 0.

Once the data set has been converted into vectors, it can be used to train a supervised learning algorithm.

Supervised Learning: Training the Classifier

Now that our data’s in the desired format, we can finally develop a system that learns to tell the difference between the various categories. This is called building a classifier model. Once the model has been built, new briefs can be fed into it and it will predict their category (called their label).

What we’ve discussed so far is getting labels and extracting features using our pipeline. But what algorithm should we use?

Multinomial Naive Bayes

I chose to use the Multinomial Naive Bayes (“MNB”) classifier for this task. The Naive Bayes Wikipedia page does a good job of explaining the mathematics behind the classifier in detail. Suffice it to say that it is simple, computationally efficient and has been shown to work surprisingly well in the field of document classification.

A (simplified) worked Example

A simplified way of thinking about how the algorithm works in the context of document classification is:

For each token in the total training dataset, what is the probability of that token being associated with each class?

For each token in a particular brief, add up those per-class probabilities

Pick the class with the highest total.

So, say we have the following probabilities (after laplacian smoothing and normalisation) for the tokens from our earlier example occurring in each category type:

Token Name   Other Image   Header/Banner/Ad/Poster/Flier   Logo      Business Card   Template (ppt/pdf/word etc)   Icon      Social Media
card         0.00019       0.00322                         0.00257   0.0155          0.00021                       0.00055   9e-05
busi         0.00038       0.00154                         0.00325   0.00915         0.00021                       0.00048   0.00037
busi card    6e-05         0.00055                         0.00174   0.00904         0.00021                       0.00048   9e-05
chang        0.00275       0.00445                         0.00416   0.00525         0.00064                       0.00159   0.00028
file         0.00596       0.00395                         0.00649   0.00525         0.00245                       0.00408   0.00241
logo         0.00096       0.0054                          0.0266    0.00513         0.00075                       0.00512   0.00408
need         0.00832       0.00672                         0.00717   0.00478         0.00139                       0.00623   0.00232
attach       0.00467       0.00622                         0.00364   0.00414         0.0017                        0.00484   0.0012
updat        0.00013       0.00104                         0.00079   0.00391         0.00032                       0.00042   9e-05
$EMAIL       0.00019       0.00073                         0.0002    0.00373         0.00032                       0.00014   0.00028

Given the brief:

update the logo on my business card

We would match up each token with its probabilities in the table above, giving us the following table. Adding up each column would then give us a score for that class.

Token name   Other Image   Header/Banner/Ad/Poster/Flier   Logo      Business Card   Template (ppt/pdf/word etc)   Icon      Social Media
card         0.00019       0.00322                         0.00257   0.0155          0.00021                       0.00055   9e-05
busi         0.00038       0.00154                         0.00325   0.00915         0.00021                       0.00048   0.00037
busi card    6e-05         0.00055                         0.00174   0.00904         0.00021                       0.00048   9e-05
logo         0.00096       0.0054                          0.0266    0.00513         0.00075                       0.00512   0.00408
updat        0.00013       0.00104                         0.00079   0.00391         0.00032                       0.00042   9e-05
sum:         0.00172       0.0118                          0.035     0.0427          0.0017                        0.00705   0.00472

Business card has the highest score, and so that is our prediction. Simple! The mathematics is a little more sophisticated than this, but the intuition behind it is the same.
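The simplified sum-and-pick procedure can be written out directly. This sketch uses the probabilities from the table above, restricted to two of the seven classes for brevity; remember that real MNB multiplies probabilities (equivalently, sums their logarithms) rather than summing them.

```python
# Per-token, per-class probabilities, taken from the worked example above
# (only the Logo and Business Card columns, for brevity).
PROBS = {
    "updat":     {"Logo": 0.00079, "Business Card": 0.00391},
    "logo":      {"Logo": 0.0266,  "Business Card": 0.00513},
    "busi":      {"Logo": 0.00325, "Business Card": 0.00915},
    "card":      {"Logo": 0.00257, "Business Card": 0.0155},
    "busi card": {"Logo": 0.00174, "Business Card": 0.00904},
}

def classify(tokens: list[str]) -> str:
    # Sum each class's probability over the brief's tokens, then pick the max.
    scores: dict[str, float] = {}
    for token in tokens:
        for cls, p in PROBS.get(token, {}).items():
            scores[cls] = scores.get(cls, 0.0) + p
    return max(scores, key=scores.get)
```

Running classify on the tokens of “update the logo on my business card” reproduces the table: Business Card scores 0.0427 against Logo’s 0.035.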

Classifier Structure

Now, we have two types of classes to predict: document type and task type. I decided to build the classifier structure to reflect this. A top-level classifier predicts the document type (logo, business card, etc), trained using the full dataset. Then a separate specialised classifier for each document type predicts the task category. So we have, for example, a classifier just for working out the task type of business card cases, trained only on those cases.
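In outline, the two-tier structure looks like the hypothetical sketch below, with the actual classifiers stubbed out as plain callables:

```python
class TwoTierClassifier:
    """Top-level document-type classifier plus one task-type classifier per document type."""

    def __init__(self, doc_clf, task_clfs):
        self.doc_clf = doc_clf      # features -> document type, trained on the full dataset
        self.task_clfs = task_clfs  # document type -> (features -> task type), trained per type

    def predict(self, features):
        doc_type = self.doc_clf(features)
        task_type = self.task_clfs[doc_type](features)
        return doc_type, task_type
```

Any classifier implementation (MNB included) can be slotted into either tier.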

The training and classification is summarised in these handy diagrams.

Classifier Training

Classification

Results

Are we getting good predictions?

To see whether our algorithm is, in fact, learning with experience, we can plot a learning curve. This tells us both how the classifier is doing, and how helpful more data would be. To test this, I plotted the 10-fold cross-validated accuracy of the top-layer classifier as the training set size is increased:

It looks like our machine is learning! The more data it sees, the better it gets at picking out the correct category. It looks as though accuracy may flatten off at about 80%. This suggests that to do better, we’d need to find new features instead of just collecting more cases. The sub-classifiers, as a result of the classifier structure, have less data to work with in the training set. However, they appeared to follow a similar learning curve.

Accuracy of various implementations

Over the course of my experiments, I tested the accuracy of a variety of implementations and algorithms. For those interested in the details, accuracy figures are below.

Classifier Type / Algorithm Type   MNB       NB        Baseline

Specialised Sub-Classifier:
  Top Level Classifier             78.62 %   60.17 %   36.33 %
  Sub-Classifier                   69.46 %   61.54 %   32.97 %
  Combined accuracy                54.61 %   37.03 %   11.97 %

Generalised Sub-Classifier:
  Top Level Classifier             78.62 %   60.17 %   36.33 %
  Sub-Classifier                   59.97 %   50.95 %   24.13 %
  Combined accuracy                47.15 %   30.66 %    8.77 %

Single Classifier:
  Accuracy                         45.58 %   39.12 %   11.43 %

The “Specialised Sub-Classifier” is the implementation we discussed above, whereas the “Generalised Sub-Classifier” used a single classifier for task type, rather than one per document type. The “Single Classifier” tries to hit both targets at once, classifying against the full set of 63 category combinations. I also compared Multinomial Naive Bayes against plain Naive Bayes (NB) and a simple Zero-R baseline.

Wrapping up

The two-tier classifier approach worked the best, picking the document type correctly nearly 80% of the time, but getting both document and task type right only 55% of the time. The Multinomial Naive Bayes also did better than Naive Bayes on this task, as expected.

Next Time

Next time, I will discuss how this system can be applied to assist with the next stage of the customer to designer matching process. How do we figure out which categories a particular designer may be good at? And how do we make sure that designer gets those tasks?

About Daniel

Daniel Williams is a Bachelor of Science (Computing and Software Science) student at the University of Melbourne and Research Assistant at the Centre for Neural Engineering where he applies Machine Learning techniques to the search for genetic indicators of Schizophrenia. He also serves as a tutor at the Department of Computing and Information Systems. Daniel was one of four students selected to take part in the inaugural round of Tin Alley Beta summer internships and he now works part-time at 99designs. Daniel is an avid eurogamer, follower of “the cricket”, and hearty enjoyer of the pub.

Here at 99designs we’re what you’d call a polyglot shop – we’ve got a mix of PHP, Ruby, Python, and Go in production. When we say production, we mean at serious scale. Our mission is to connect the world with great graphic designers wherever they are, something which we do quite a bit of.

Right now we’re on a hunt for a developer who can Help Us Out™. Usually we advertise for a generalist “web developer” and then find the right place for them internally based on their strengths. This time we’re trying to hire a very specific skill set for a very specific project. The skills are Ruby and Rails, and the project is building out our new payments service.

Company wide we’re transitioning to having small, decentralised teams with their own product lines and the attendant SOA/Platform to support that goal. Last year we had great success with creating our single sign-on system in Go, and this year we’re rounding out the platform with a shared payments system in Rails*.

This new service will enable us to spin up new product lines or move into new international markets quickly. Between the iterative approach we’re taking to replacing our old payments system, and the UX for both the customers using the service and the developers integrating it, there are some exciting and interesting problems to solve on this project.

The existing team on the project are very strong developers with good knowledge of the problem space, but not a lot of Rails experience. We need a mid to senior developer to come in and help “set the tone” of the codebase. That role had been filled within the team by me (John Barton, internet famous as “angry webscale ruby guy”), but I’ve since been promoted to manage the engineering team as a whole, and between all the meetings and spreadsheets it’s hard to keep up the pace of contribution that this project deserves.

You’ll need to be the diesel engine of the team: churn through the backlog turning features into idiomatic and reliable Rails code at a steady cadence. There are opportunities to coach within the team, but even just creating a sizeable body of code to be an example of “this is how we do it” (cue Montell Jordan playing https://www.youtube.com/watch?v=0hiUuL5uTKc) will keep this project on track.

The quality of the codebase after 3 months of progress is high. We don’t believe in magic make-believe numbers here, but right now we’re sitting on a code climate GPA of 4.0. If you’re a fan of Sandi Metz’s Practical Object Oriented Design in Ruby or Avdi Grimm’s Objects on Rails you will feel right at home in this codebase.

If this is something you’re interested in and think you can help us out with, check out the job ad.

*You may be wondering “why not Go?” for this system. The short answer is that there’s enough complexity in the business rules that the expressiveness of Ruby is very useful; and since this is a financial project, correctly moving numbers around in a database is very important, and ActiveRecord is more mature than any of the ORMs available in Go right now. I’m happy to elaborate on our line of thinking during your interview ;-)

At 99designs we heavily (ab)use Varnish to make our app super fast, but also to
do common, simple tasks without having to invoke our heavy-by-contrast PHP
stack. As a result, our Varnish config is pretty involved, containing more than
1000 lines of VCL, and a non-trivial amount of embedded C.

When we started seeing regular segfaults, it was a pretty safe assumption that one of
us had goofed writing C code. So how do you track down a transient segfault in a system like Varnish? Join us down the rabbit hole…

Get a core dump

The first step is to modify your production environment to provide you with
useful core dumps. There are a few parts to this:

First of all, configure the kernel to provide core dumps by setting a few sysctls:

Create a place to store cores on AWS’s ephemeral storage (if, like us, you’re on EC2)

Tell the kernel to write core files out there

With this done, and no known way to trigger the bug, play the waiting game.

When varnish explodes, it’s show time. Copy the core file, along with the shared
object that varnish emits from compiling the VCL (located in
/var/lib/varnish/$HOSTNAME), over to a development instance and let the debugging begin.

Locate the crash point

If you have access to the excellent LLDB from the LLVM project, use that. In our case, getting it to
work on Ubuntu 12.04 involved upgrading half the system, resulting in an
environment too dissimilar to production, so we stuck with gdb.

If you spend a lot of time in a debugger, you’ll probably want to use a helper
like fG!’s gdbinit or voltron to make your life
easier. I use voltron, but because of some of the clumsiness in gdb’s API,
immediately ran into some bugs.

Finally, debugging environment working, it’s time to dig into the crash. Your situation is going to be different to ours, but here’s how we went about debugging a problem like this recently:

Debugging the core dump with voltron

As you can see in the [code] pane, the faulting instruction is mov
0x0(%rbp),%r14, trying to load the value pointed to by RBP into r14.
Looking in the register view we see that RBP is NULL.

Inspecting the source, we see that the faulting routine is inlined, and that
the compiler has hijacked RBP (the base pointer for the current stack frame) to
use as argument storage for the inlined routine.

We now know that the fault is caused by a mapelm struct with a bits member
set to zero; but why are we getting passed this broken struct with garbage in
it?

Digging in deeper

Since this function is declared inline, it’s actually folded into the calling
frame. The only reason it appears in the backtrace at all is because the
callsite is present in the DWARF debugging data.

We can poke at the value by inferring its location from the upstream assembly,
but it’s easier to jump into the next upstream frame and inspect that:

The code is trying to generate a pointer to this arena run structure, using the
number of bits in the mapelm struct, AND against the inverse pagesize_mask to
locate the start of a page. Because bits is zero, this is the start of the
zero page; a NULL pointer.

This is enough to see how it’s crashing, but doesn’t give us much insight for why. Let’s go digging.

Looking back at the code snippet, we see an assertion that the arena_run_t
structure’s magic member is correct, so with that known we can go looking for
other structures in memory. A quick grep turns up:

./lib/libjemalloc/malloc.c:#define ARENA_RUN_MAGIC 0x384adf93

pagesize_mask is just the page size minus one, meaning that any address
bitwise-ANDed against the inverse of the pagesize_mask will give you the
address at the beginning of that page.
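The arithmetic is easy to demonstrate (in Python for brevity, assuming 4 KiB pages):

```python
PAGE_SIZE = 4096               # assumed page size
pagesize_mask = PAGE_SIZE - 1  # 0x0fff

def page_start(addr: int) -> int:
    # ANDing with the inverse mask clears the low bits, giving the page base.
    return addr & ~pagesize_mask
```

page_start(0x12345) gives 0x12000; and with bits == 0, page_start(0) is 0, the zero page, which is exactly the NULL pointer we crashed on.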

We can therefore just search every writable page in memory for the magic number
at the correct offset.

The magic number and magic member of the struct (conveniently located as the
first 4 bytes of each page) only exist if we’ve got a debug build.

Aside: can we abuse LD_PRELOAD for profit?

At this point all signs point to either a double free in varnish’s thread pool implementation, leading to an empty bucket (bits == 0), or a bug in its memory allocation library jemalloc.

In theory, it should be pretty easy to rule out jemalloc, by swapping in another malloc library implementation. We could do that by putting, say tcmalloc, in front of its symbol resolution path using LD_PRELOAD:

We’ll add:

export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0

to /etc/varnish/default and bounce varnish. Then move all the old core files out of the way, wait (and benchmark!)

However, there’s a flaw in our plan. Older versions of varnish
(remember that we’re on an LTS distribution of Ubuntu) vendor in a copy of
jemalloc and statically link it, meaning that the symbols free and malloc
are resolved at compile time, not runtime. This means no easy preload hacks for us.

Rebuilding Varnish

The easy solution won’t work, so let’s do the awkward one: rebuild varnish!

apt-get source varnish

Grab a copy of the varnish source and link it against tcmalloc. Before
building, I deleted lib/libjemalloc and used grep to remove every reference to
jemalloc from the codebase (which amounted to a few changes to the configure
script and makefiles), then added -ltcmalloc_minimal to CFLAGS. As an aside,
the Ubuntu packages for tcmalloc ship /usr/lib/libtcmalloc_minimal.so.0 but not
/usr/lib/libtcmalloc_minimal.so, which means the linker can’t find the library;
I had to create the symlink manually.

With this new varnish in production, we haven’t yet seen the same crash, so it
appears that it was a bug in jemalloc, probably a nasty interaction between
libpthread and libjemalloc (The crash was consistently inside thread
initialization).

Try it yourself?

Let’s hope not. But if you do a lot of Varnish hacking with custom extensions, occasional C bugs are to be expected. This post walked you through a tricky Varnish bug, giving you an idea of the tools and tricks around debugging similar hairy segfaults.

If you’re messing around with voltron, you might find my voltron config and the tmux script I use to setup my environment a useful starting point.

In this series of guest blog posts, 99designs intern Daniel Williams takes us through how he has applied his knowledge of Machine Learning to the problem of classifying Swiftly tasks.

Introduction

Swiftly is an online service from 99designs that lets customers get small graphic design jobs done quickly and affordably. It’s powered by a global network of professional designers who tackle things like business card updates and photo retouching in 30 minutes or less – an amazing turnaround time for a service with real people in the loop!

Given that we have a pool of designers waiting for customer work, how can we best allocate them tasks? Currently we take a naive but fair approach: assign each new task to the designer that has been waiting in the queue the longest. But there’s room for improvement: designers excel at different types of tasks, so ideally we’d match tasks to designers based on expertise. To do this we need to be able to categorise tasks by the skills they require.

In today’s approach, we’ll try to solve the problem with machine learning. The first step is to find a way to automatically categorise a design brief, with categories forming our “areas of expertise”. The next will be figuring out what categories a particular designer is good at. If we can build solid methods for both of these steps, we can begin matching designers to tasks.

In this post, I’ll introduce the problem and walk through some attempts at applying unsupervised techniques for discovering task categories. Follow along, and you may recognise a similar situation of your own that you can apply these methods to.

Swiftly tasks

Swiftly tasks are meant to be quick to fire off and highly flexible. The customer fills in a short text box saying what they want done, uploads an image or two, and then waits for the result. This type of description, plain text and raw images, is highly unstructured. Since image recognition and indexing is its own hard problem, we’ll skip the images for now and focus on the text.

Here’s a couple of examples:

Task A

Remove the man’s glasses.

Make the man’s face MORE HANDSOME.

Task B

In my logo, there is a “virtual” flight path of an airplane. I have had comments that the virtual flight path goes into the middle of the Pacific Ocean for no reason - not a logical graphic. I want you to “straighten” out the flight path - as shown on the Blue lines in the attached PDF titled “Modified_Logo.PDF.” I still want the flight path lines to be in white, with black triangles separating the segments. I just want the segments to be straighter and not go over the ocean as in the original. Please contact me for any clarification. I am uploading the EPS and AI files as well to make the change. Thank you!

How might a human classify these tasks? I would probably classify the first as “image manipulation” and the second as “logo refresh,” although the second could just as easily be “image manipulation” as well. Already you can see that classifying these sorts of tasks into concrete categories is perhaps going to be more art than science.

Figuring out the categories

The first major problem is deciding on a sensible set of categories. This has turned out to be more difficult than I first imagined. Customers use Swiftly for a wide range of tasks. Plus, there’s quite a bit of overlap — one Swiftly task is sometimes a combination of multiple small tasks. My initial approach, just to get a feel for the data, was to eyeball 100 task briefs and attempt to invent categories and classify them manually. The result of this process:

Category                   Number of Tasks
Logo Refresh (Holidays)    34
Logo Refresh               11
Copy Change                11
Vectorise                  13
Resize/Reformat            17
Transparency               1
Image Manipulation         10
Too hard to classify       3

A number of the instances were hard to classify, even for a human! I was not 100% happy with the categories that I came up with, and many tasks did not fit comfortably in the buckets. I decided to apply some unsupervised machine learning techniques in an attempt to cluster design briefs into logical groups. Can a machine do better?

Unsupervised clustering

I explored software called gensim, a natural language processing and topic modelling library for Python. Gensim comes equipped with various powerful topic modelling algorithms, which can extract a pre-specified number of topics and associate words with those topics. It also helps with converting a corpus of documents into various formats (e.g. the vector space model). The main algorithm I made use of is called Latent Dirichlet Allocation. The first step is converting the text corpus into a model that allows the application of mathematical operations.

The vector space model

To apply mathematical algorithms to natural language, we need to convert language into a mathematical format. I used a simple model known as the bag-of-words vector space model. This model represents each document as a vector, where each dimension of the vector corresponds to a different word. The value of a word in a particular document is simply the number of times it appears in that document. The vector has n dimensions, where n is the total number of unique terms in the whole collection of documents. Let’s try an example.

Say we have the following collection of documents:

The monster looked like a very large bird.

The large bird laid very large eggs.

The monster’s name was “eggs.”

After finding all the unique words (“the,” “monster”, etc.) and assigning them an index in the vector, we can count those words in each document to turn each document into a word frequency vector:

(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0)

(1, 0, 0, 0, 0, 1, 2, 1, 1, 1, 0, 0)

(1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
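The three vectors above can be reproduced in a few lines of Python. This is a sketch: the vocabulary ordering is one arbitrary choice, and the tokeniser (lowercasing, stripping punctuation and possessives) is an assumption rather than something taken from the post.

```python
import re

docs = [
    "The monster looked like a very large bird.",
    "The large bird laid very large eggs.",
    "The monster’s name was “eggs.”",
]

# one arbitrary but fixed vocabulary ordering for the 12 unique words
vocab = ["the", "monster", "looked", "like", "a", "very",
         "large", "bird", "laid", "eggs", "name", "was"]

def tokenize(text):
    # lowercase, drop possessive ’s, then keep only runs of letters
    text = text.lower().replace("’s", " ").replace("'s", " ")
    return re.findall(r"[a-z]+", text)

def bow(doc):
    # word-frequency vector: count each vocabulary word in the document
    tokens = tokenize(doc)
    return tuple(tokens.count(w) for w in vocab)

for d in docs:
    print(bow(d))
```

Running this prints the same three frequency vectors listed above.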

Corpus pre-processing

If you just split your text into words on whitespace and apply this naively, the results can be messy. On the one hand text contains punctuation we want to ignore. On the other, this is going to work best when we have lots of words in common between the documents. Do we really want to treat “Egg”, “egg” and “eggs” as different words? To get the best results, you deal with these kinds of problems in a pre-processing step.

Stemming is the process where words are reduced to their “stem” or root format, basically chopping any variation off their end. For example, the words “stemmer,” “stemming” and “stemmed” would all be reduced to just “stem”. I used the nltk implementation of the snowball stemmer to perform this step. All of these steps can be performed very easily in Python:

This process reduces the noise in the vector space model: words that mean the same thing are mapped to the same token (through stemming and punctuation and caps normalisation), and words that probably do not add any meaning are removed (through stop word removal). I expect the pre-processing steps will eventually become much more involved, but for now this should get us started.

Latent Dirichlet Allocation (LDA)

LDA is an algorithm developed to automatically discover topics contained within a text corpus. Gensim uses an “online” implementation of LDA, which means that it breaks the documents into chunks and regularly updates the LDA model (as opposed to batch processing, which handles the whole corpus at once). It is a generative probabilistic model that uses Bayesian inference to estimate the probability that each document in the corpus belongs to each topic. Importantly, the number of topics must be supplied in advance. Since I did not know how many topics might exist, I decided to apply LDA with varying numbers of topics. For example, if we ran LDA with 6 topics, the result for a single document might look like this:

[(0, 0.0208), (1, 0.549), (2, 0.0208), (3, 0.366), (4, 0.0208), (5, 0.0208)]

Which means LDA places that document 2% in topic 0, 55% in topic 1, 2% in topic 2 and so on. For the simple analysis I am doing, I just want the best guess topic. We can convert the result from probabilistic to deterministic by just picking the highest-probability topic.

max(x, key=lambda lda_result: lda_result[1])
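Applied to the example distribution from above, this looks like:

```python
# the example topic distribution from earlier
x = [(0, 0.0208), (1, 0.549), (2, 0.0208), (3, 0.366), (4, 0.0208), (5, 0.0208)]

# pick the (topic, probability) pair with the highest probability
best_topic, best_prob = max(x, key=lambda lda_result: lda_result[1])
print(best_topic)  # -> 1
```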

Much of my approach in the following segments is based on the Gensim author’s LDA guides.

Pre-processing for LDA

I extracted ~4400 job descriptions from the Swiftly database. I stripped the formatting from each, and applied the pre-processing steps described above (tokenisation, stemming, stop word removal etc.). The result was a plain text file, with each pre-processed Swiftly job on a new line.

I then used the gensim tools to create the vector model required for LDA. On the recommendation of the gensim authors, I also removed all tokens that only appeared once. The doc2bow function used in the MyCorpus class below converts the document into the vector space format discussed above.

from gensim import corpora, models, similarities

# pre-processed swiftly jobs, each job on a newline
CORPUS = "StemmedStoppedCorpus.txt"

class MyCorpus(object):
    # stream the corpus one document at a time, as bag-of-words vectors
    def __iter__(self):
        for line in open(CORPUS):
            yield dictionary.doc2bow(line.split())

# create dictionary mapping between text and ids
dictionary = corpora.Dictionary(line.split() for line in open(CORPUS))

# find words that only appear once in the entire doc set
once_ids = [tokenid for tokenid, docfreq in dictionary.dfs.iteritems() if docfreq == 1]

# remove once words
dictionary.filter_tokens(once_ids)

# "compactify" - removes gaps in ID mapping created by removing the once words
dictionary.compactify()

# save dictionary to file for future use
dictionary.save("swiftly_corpus.dict")

# create a corpus object
swiftly_corpus = MyCorpus()

# store to disk, for later use
corpora.MmCorpus.serialize("swiftly_corpus.mm", swiftly_corpus)

Regarding the above code, the MM file is a file format known as Matrix Market format, which represents a matrix of sparse vectors. The dictionary file above simply maps the word_id integers that are used in the MM format to the actual word each id represents.

Applying LDA

Now that the corpus has been stored as a matrix of vectors, we can apply the LDA model and start clustering the Swiftly jobs. This is done with the following lines of code. We can generate different models by changing the num_topics argument in the ldamodel.LdaModel() function.

The number before each token represents how discriminating that token is for the category. Ideally, by eyeballing the discriminating tokens for a topic we could understand and identify it, giving it a useful name. As you can see, this proved to be difficult. I suspected that there were probably more than six unique categories of tasks on Swiftly, so I ran LDA with N_TOPICS set to different numbers. With 15 (this time just the top 10 words, without numbers, formatted into a table for easier comprehension), the results are:

Topic 1:  imag, file, pictur, like, line, high, resolut, photoshop, layer, hand
Topic 2:  need, imag, attach, size, file, word, 2, make, logo, 1
Topic 3:  element, exist, etc, logo, icon, halloween, app, add, like, theme
Topic 4:  tree, snow, santa, thanksgiv, leav, gold, outlin, make, fall, turkey
Topic 5:  yellow, use, view, new, servic, replac, team, super, feel, color
Topic 6:  creativ, take, add, fun, logo, pumpkin, spooki, bat, skeleton, offer
Topic 7:  celebr, logo, decor, make, etc, word, possibl, add, text, bit
Topic 8:  file, background, logo, holiday, need, vector, transpar, white, png, ai
Topic 9:  like, look, snowflak, logo, would, color, want, make, someth, font
Topic 10: chang, color, blue, code, red, font, dark, green, match, panton
Topic 11: festiv, pdf, send, file, need, back, page, digit, psd, version
Topic 12: logo, card, christma, use, font, attach, like, creat, file, busi
Topic 13: need, attach, page, imag, text, websit, px, pictur, use, photo
Topic 14: name, follow, busi, logo, chang, incorpor, compani, card, line, replac
Topic 15: x, cover, photo, like, would, look, suppli, 73, websit, templat

At this point, I realised that more pre-processing would be required to get this right. For instance, it seemed strange that in topic 15 the most discriminating word is ‘x’. Looking closer, I realised that this is because topic 15 represents a resize/reformatting job brief. The ‘x’ gets picked out because a large number of customers specify dimensions (e.g. 200px x 500px). I was also surprised to find that ‘73’ was so discriminating, but a little digging revealed that a Twitter profile picture is 73x73 pixels. To address this problem, I plan to use a pre-processing step called lemmatisation.

Lemmatisation is useful for grouping things like numbers, colours, URLs, email addresses and image dimensions together so that different values are treated equally. For example, if there is a specific colour mentioned in a brief, we don’t really care what the specific colour is—we just care that the brief mentions a colour. In our case, we believe that a brief containing a colour (e.g. #FF00FF) or image dimensions (e.g. 400x300) might give us clues about what type of task it is so we convert anything that looks like these to the tokens $COLOUR and $DIM.
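A sketch of this normalisation with regular expressions; the patterns are illustrative guesses, with $COLOUR and $DIM being the tokens described above:

```python
import re

def lemmatise(text):
    # collapse hex colours like #FF00FF into a generic $COLOUR token
    text = re.sub(r"#[0-9a-fA-F]{3,8}\b", "$COLOUR", text)
    # collapse dimensions like 400x300 or 200px x 500px into $DIM
    text = re.sub(r"\b\d+\s*(?:px)?\s*x\s*\d+\s*(?:px)?\b", "$DIM", text)
    return text

print(lemmatise("Make it #FF00FF and resize to 400x300 (Twitter needs 73 x 73)"))
# -> Make it $COLOUR and resize to $DIM (Twitter needs $DIM)
```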

Despite the shortcomings of my pre-processing, this clustering task has picked out some interesting topics! Some, as is probably inevitable, are “junk topics”. Further, seasonal words seem to appear in lots of topics, which is a strange result. Despite this, many of the topics are classifiable. Topic 5 was interesting, where ‘yellow’ was such a discriminating term. A very quick (and non-scientific) review of the data suggests that people often do not like the colour yellow (I agree with them!) and want it changed. An attempt to name the topics from the table above:

Topic 1: Change an image so it’s in higher resolution

Topic 3: Change or create a logo or icon, perhaps for a smartphone app

Topic 4: Edits of a seasonal nature (Christmas, Thanksgiving)

Topic 5: Replace yellow (?!)

Topic 6: Halloween edits

Topic 8: Vectorisation task, e.g. “take this png file, turn it into a vector on a transparent background”

Topic 10: Change a colour in some way, often a font. “Panton” is a stemmed form of “pantone”, a popular colour chart

Topic 14: Change copy or update information on a business card

Topic 15: Resize or reformat a photo, often for social media purposes

Having to provide the number of topics to LDA, before you even know what’s reasonable, feels like a chicken-and-egg problem. It’s possible to try different numbers of topics and eyeball the results, but at times it felt a bit too much like guesswork. Nevertheless, I view these results as a decent “proof of concept”. It’s reassuring that a computer can find categories like this, and suggests that with more tweaking and a nicely labelled dataset, the job of automatically classifying Swiftly task briefs is entirely possible!

Next time…

That wraps up my experiments with unsupervised classification for this post. Next time, I plan to discuss my efforts after I settle on the Swiftly categories. I’d like to develop a nice labelled training data set (most likely using Amazon’s Mechanical Turk service), and then experiment with supervised machine learning techniques. I will also detail my efforts at developing a more sophisticated pre-processing procedure. Tune in!

About Daniel

Daniel Williams is a Bachelor of Science (Computing and Software Science) student at the University of Melbourne and Research Assistant at the Centre for Neural Engineering where he applies Machine Learning techniques to the search for genetic indicators of Schizophrenia. He also serves as a tutor at the Department of Computing and Information Systems. Daniel was one of four students selected to take part in the inaugural round of Tin Alley Beta summer internships. Daniel is an avid eurogamer, follower of “the cricket”, and hearty enjoyer of the pub.

We recently replaced most of our image resizing code with Thumbor, an
open-source thumbnailing server. This post describes how and why we migrated to
a standalone thumbnailing architecture, and addresses some of the challenges we
faced along the way.

Background

Historically, 99designs has largely been powered by a monolithic PHP
application. Maintaining this application has become increasingly difficult as
our team and codebase grow. One cause of this difficulty is that the application
contains a lot of incidental functionality—supporting code that isn’t the
core purpose of the application, but which is necessary for its operation.

As such, we set ourselves a technical goal in 2013 to migrate to a more
service-oriented architecture. This means breaking big masses of functionality
into discrete services and libraries that do one thing well. Such a design tends
to yield smaller, more cohesive services, and provides natural lines along which
our team can subdivide.

Image thumbnailing is a generic function required by many graphics-intensive
websites, and a prime candidate for extraction into a standalone service.

Thumbnails at 99designs

Our 230,000+ strong designer community uploads a new image to 99designs every ~6
seconds. We serve several thumbnail variations of these images across the site.

Our thumbnailing solution needs to scale to serve our production traffic load.
The approach we’ve used until recently has been to generate thumbnails
ahead-of-time using asynchronous task queues. Every time a designer
uploads an image, we kick off a task that generates thumbnails of that image and
stores them in S3.

If a thumbnail request arrives while the task is generating the thumbnail, we
serve a placeholder image.

Once the thumbnailing task finishes, we can serve the resized images.

This architecture has served us pretty well. It keeps response times low and
scales nicely, but it has a few shortcomings:

We’ve intertwined the image resizing logic with our PHP application. Other
apps in our stack have to implement their own resizing.

It’s not the simplest solution. There’s quite a bit of complexity: deduping
resize tasks, using client-side polling to check if a resize operation has
completed, etc.

We can only serve thumbnails at predefined sizes. If we decided to introduce
a new thumbnail size, we’d have to generate that thumbnail for tens of
millions of existing images.

A better solution is to create a separate, simpler thumbnailing service that
any application in our stack can use.

Thumbor overview

Enter Thumbor. Thumbor is an open-source thumbnail server developed
by the clever people behind globo.com. Thumbor resizes
images on-demand using specially constructed URLs that contain the URL of the
original image and the desired thumbnail dimensions, e.g. (here in Thumbor’s
unsigned “unsafe” URL form):

http://thumbor.example.com/unsafe/320x240/images.example.com/llamas.jpg

In this example, the Thumbor server at thumbor.example.com fetches
llamas.jpg from images.example.com over HTTP, resizes it to 320x240 pixels,
and streams the thumbnail image data directly to the client.

At face value this seems less scalable than our previous task-based solution,
but some careful use of caching ensures we only do the resize work once per
thumbnail.

New architecture

The high-level thumbnailing architecture now looks like this:

Our applications generate URLs that point to a Thumbor server (via a CDN). The
first request for a particular thumbnail blocks while Thumbor fetches the
original image and produces the resized version. We set long cache expiry times
on the resulting images, so they’re effectively cached forever. The CDN serves
all subsequent thumbnail requests.

We put a cluster of Thumbor servers behind an elastic load balancer to cope with
production traffic. This also gives redundancy when one of the servers dies.

The resulting architecture is very simple, and our image-resizing capability is
neatly encapsulated as a standalone service. This means we avoid the need to
re-implement thumbnailing in each of our applications—all that’s needed is
a small client library to produce Thumbor URLs.

Usage example

We created Phumbor to generate Thumbor URLs in PHP applications.
Here’s how you might implement a Thumbor view helper:
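The helper code is missing from this copy. As a language-neutral sketch of what Phumbor produces, here is the underlying URL construction in Python; Thumbor signs the operation path with an HMAC-SHA1 of a shared secret (the server name, key and image URL below are placeholders):

```python
import base64
import hashlib
import hmac

def thumbor_url(server, security_key, width, height, image_url):
    # the operation path: dimensions followed by the original image URL
    path = "%dx%d/%s" % (width, height, image_url)
    # Thumbor's signature: urlsafe-base64 of an HMAC-SHA1 over the path
    signature = base64.urlsafe_b64encode(
        hmac.new(security_key.encode(), path.encode(), hashlib.sha1).digest()
    ).decode()
    return "%s/%s/%s" % (server.rstrip("/"), signature, path)

print(thumbor_url("http://thumbor.example.com", "MY_SECRET", 320, 240,
                  "images.example.com/llamas.jpg"))
```

A view helper then just wraps this function, with the server and key taken from application config.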

Implementation strategy

We used a couple of complementary techniques to test Thumbor’s capabilities
before committing to its use in production.

Firstly, we used feature-flipping to selectively enable Thumbor URLs
for certain users. Initially we used this to let developers click around the
site and check that Thumbor was generating thumbnails correctly.

Secondly, we used asynchronous tasks to simulate a production traffic load on
the Thumbor service. Every time an app server handled a thumbnail request, we
enqueued a task that requested that same thumbnail from the new Thumbor service.
This allowed us to check performance of the service without risking a disruption
to our users.

Finally, we used our feature-flipping system to incrementally roll out Thumbor
thumbnails to all our users. This worked better than immediately pointing all
traffic at the Thumbor service, which tended to cause a spike in response times.

Thumbor configuration

Some of our Thumbor configuration settings differ from the recommended defaults.
We tweaked our configuration in response to our performance measurements.

Thumbor ships with a number of imaging backends; the default and recommended
backend is PIL. Our testing shows that the OpenCV backend is significantly
faster (3-4x) than PIL. Unfortunately, OpenCV can’t resize GIFs or images with
alpha transparency. As a result, we implemented a simple multiplexing backend
that delegates to OpenCV wherever possible and falls back to PIL in the
degenerate case.
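The delegation idea can be sketched as follows; this is a simplified illustration of the choice logic only, not Thumbor’s actual engine interface:

```python
class MultiplexEngine(object):
    """Delegate to a fast engine where possible, falling back otherwise."""

    def __init__(self, fast_engine, fallback_engine):
        self.fast = fast_engine          # e.g. OpenCV
        self.fallback = fallback_engine  # e.g. PIL

    def choose(self, mimetype, has_alpha):
        # OpenCV can't resize GIFs or images with alpha transparency
        if mimetype == "image/gif" or has_alpha:
            return self.fallback
        return self.fast

mux = MultiplexEngine("opencv", "pil")
print(mux.choose("image/jpeg", False))
```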

Generally we’ve found that Thumbor is quite stable, and expect it to further
mature as more people use it and make improvements.

Conclusion

Our Thumbor service now serves all design entry thumbnails for our main PHP
application. The resulting architecture is much simpler and the service is
usable by other applications in our stack. We’ll continue to use Thumbor
in future apps we develop, and look for more opportunities to simplify our
codebase by progressively adopting a more service-oriented architecture.

Two years ago, 99designs had localized sites for a handful of English speaking countries, and our dev team had little experience in multilingual web development. But we felt that translating our site was an important step, removing yet another barrier for designers and customers all over the world to work together. Today we serve localized content to customers in 18 countries, across six languages. Here’s how we got there, and some of the road blocks we ran into.

Starting local

The most difficult aspect to internationalizing is language, so we started with localization: everything but language. In particular, this means region-appropriate content and currency. A six-month development effort saw us refactor our core PHP codebase to support local domains for a large number of countries (e.g. 99designs.de), where customers could see local content and users could pay and receive payments in local currencies. At the end of this process, each time we launched a regional domain we began redirecting users to that domain from our Varnish layer, based on GeoIP lookups. The process has changed little since then, and continued to serve us well in our recent launch in Singapore.

Languages and translation

With localization working, it was time to make hard decisions about how we would go about removing the language barrier for non-English speakers (i.e. the majority of the world). There were a lot of questions for us to answer.

What languages will we offer users in a given region?

How will users choose their language?

How will we present translated strings to users?

How will strings be queued for translation?

Who will do the translation?

What languages to offer?

Rather than making region, language and currency all user selectable, we chose to restrict language and currency availability to a user’s region. This was a trade-off which made working with local content easier: if our German region doesn’t support Spanish, we avoid having to write Spanish marketing copy for it. Our one caveat was for all regions to support English as a valid language. As an international language of trade, this lessens any negative impact of region pinning.

Translating strings

There were two main approaches we considered for translation: use a traditional GNU gettext approach and begin escaping strings, or else try a translation proxy such as Smartling. gettext had several advantages: it has a long history, and is well supported by web frameworks; it’s easily embedded; and translations just become additional artifacts which can be easily version controlled. However, it would require a decent refactoring of our existing PHP codebase, and left open issues of how to source translations.

In Smartling’s approach, a user’s request is proxied through Smartling’s servers, which in turn request the English version of our site and apply translations to the response before the user receives it. When a translation is missing, the English version is served and the string is added to a queue to be translated. Pulling this off would mean substantially reducing the amount of code to be changed, a great win. However, it risked making us reliant on a third party for our uptime and performance.

In the end, we went with Smartling for several reasons. They provided a source of translators, and expertise in internationalization which we were lacking. Uptime and performance risks were mitigated somewhat by two factors. Firstly, Smartling’s proxy would be served out of the US-East AWS region, the same region our entire stack is served from, increasing the likelihood that their stack and ours would sink or swim together. Secondly, since our English language domains would continue to be served normally, the bulk of our traffic would still bypass the proxy and be under our direct control.

Preparing our site

We set our course and got to work. There was substantially more to do than we first realized, mostly spread over three areas.

Escaping user-generated content

Strings on our site which contained user content quickly filled our translation queue (think “Logo design for Greg” vs “Logo design for Sarah”). Contest titles, descriptions, usernames, comments, you name it, anything sourced from a user had to be found and wrapped in a <span class="sl_notranslate"> tag. This amounted to a significant ongoing audit of the pages on our site, fixing them as we went.

Preparing Javascript for translation

Our Javascript similarly needed to be prepared for translation, with rich client-side pages the worst hit. All strings needed to be hoisted to a part of the JS file which could be marked up for translation. String concatenation was no longer ok, since it made flawed assumptions about the grammar of other languages. Strings served through a JSON API were likewise hidden from translation, meaning we had to find other ways to serve the same data.

Making our design more flexible

In our design and layout, we could no longer be pixel-perfect, since translated strings for common navigation elements were often much longer in the target language. Instead, it forced us to develop a more robust design which could accommodate the variation in string width. We stopped using CSS transforms to vary the case of text stylistically, since other languages are more sensitive to case changes than English.

The wins snowball

After 9 months of hard work, we were proud to launch a German language version of our site, a huge milestone for us. With the hardest work now done, the following 9 months saw us launch French, Italian, Spanish and Dutch-language sites. Over time, the amount of new engineering work reduced with each launch, so that the non-technical aspects of marketing to, supporting and translating a new region now dominate the time to launch a new language.

The challenges

We also encountered several unexpected challenges.

Client-side templating

We mentioned earlier that the richer the client-side JS, the more work required to ensure smooth translation. The biggest barrier for us was our use of Mustache templates, which were initially untranslatable on the fly. To their credit, Smartling vastly improved their support for Mustache during our development, allowing us to clear this hurdle.

Translating non-web artifacts

It should be no surprise: translation by proxy is a strategy for web pages, but not a strong one for other non-web artifacts. In particular, for a long time translating emails was a pain, and in the worst case consisted of engineers and country managers basically emailing templates for translation back and forward. After some time, we worked around this issue by using Smartling’s API in combination with gettext for email translation.

Exponential growth of translation strings

Over time, we repeatedly found our translation queue clogged with huge numbers of strings awaiting translation. Many of these cases were bugs where we hadn’t appropriately marked up user-generated content, but the most stubborn were due to our long-tail marketing efforts. Having a page for each combination of industry, product category and city led to an explosion of strings to translate. Tackling these properly would require a natural language generation engine with some understanding of each language’s grammar. For now we’ve simply excluded these pages from our translation efforts.

The future

This has been an overview of the engineering work involved in localizing and translating a site like ours to other languages. Ultimately, we feel that the translation proxy approach we took cut down our time to market significantly; we’d recommend it to other companies who are similarly expanding. Now that several sites are up and running, we’ll continue to use a mix of the proxy and gettext approaches, where each is most appropriate.

We’re proud to be able to ship our site in multiple languages, and keen to keep breaking down barriers between businesses and designers wherever they may be, enabling them to work together in the languages in which they’re most comfortable.

I recently found myself wanting the features of the rails asset pipeline in my golang project at work. Since there isn’t much in the way of asset pipelining for golang yet, I built it. Turns out, sprockets is really easy to integrate. Here’s how you can go about setting it up for your project.

Assets in development

First things first: let’s get it to the ‘it works on my machine’ stage. I’ve put together a sample repo using the asset pipeline, which you can use as a guide.

The setup for your app will be similar:

The assets folder contains your stylesheets, javascript, etc (this directory name is set in sprockets/environment.rb).

You’ll need a similar Rakefile to build assets (and maybe launch the server)

When your app starts up (in development), it should make a request to http://localhost:11111/assets/manifest.json, which provides a JSON hash linking asset names (e.g. “application.css”) to the relative URLs the compiled assets can be fetched from. To generate a link to an asset in your app, use the JSON hash you fetched to look up the URL. For example, the URL for “application.css” might look like http://localhost:11111/application-8e5bf6909b33895a72899ee43f5a9d53.css.
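That lookup can be sketched like so; the manifest contents here are illustrative, shaped as described above:

```python
import json

# a manifest of the shape served at /assets/manifest.json:
# logical asset names mapped to fingerprinted relative URLs
manifest = json.loads("""
{"application.css": "/application-8e5bf6909b33895a72899ee43f5a9d53.css",
 "application.js": "/application-0123456789abcdef0123456789abcdef.js"}
""")

def asset_url(host, name):
    # resolve a logical asset name to its fingerprinted URL
    return host + manifest[name]

print(asset_url("http://localhost:11111", "application.css"))
# -> http://localhost:11111/application-8e5bf6909b33895a72899ee43f5a9d53.css
```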

That should be all you need for development - you should be able to see SASS/Coffeescript assets compiled and loading normally. Hooray!

Assets in production

For production we want to pre-compile assets rather than regenerating them each time they change.

rake assets will create a ‘public’ folder containing ‘manifest.json’ (same format as before). Get this directory onto your production servers. git add -Af public/ will add it to source control if you deploy via git.

When generating a link to an asset, simply look up manifest.json from the filesystem rather than from HTTP.

Fin

If you’ve followed these steps, you’ll have a fully functioning asset pipeline for your golang project. The whole thing, including deployment, took me well under a day to add to our app. The resulting assets are minified, concatenated, and gzipped (for size). They are also fingerprinted, so you can serve them with an unlimited cache lifetime and reap the benefits.

Although I set this up for golang, there’s nothing go-specific about it. The same technique works just as well for any language or framework without a mature asset pipeline. If you find yourself in need, just use this pattern and you can be up and running in no time.

At 99designs, we try to make sure we’re always fixing bugs as well as writing
code. It can be easy to neglect bugs when you’re busy churning out new features.

We use GitHub issues to track bugs in our various applications. GitHub issues
integrate well with our codebase, commits and pull requests, but the reporting
facilities are a bit limited.

As our team grows, it’s become increasingly important for us to be able to
answer key questions about bugs, including:

How many bugs are currently open?

Have we each remembered to spend time working on bug fixes this sprint?

Are we closing more bugs than we’re opening?

To help answer these questions, a few of our team spent a number of
hack days
implementing a bug dashboard named GitHub Survivor.

Unlike the similarly-named reality TV show, GitHub Survivor doesn’t feature
eliminations, gruelling physical challenges, or Jeff Probst. However, it does
pit developers against one another — in a light-hearted way.

We display GitHub Survivor on a big screen in the office, where all the team can
see it. We’ve found it helps keep our minds on bugs — it reminds us to
make a small effort every sprint, gradually bringing the bug count closer to
zero.

A bug leaderboard occupies the bulk of the screen. It shows who’s closed the
most bugs this sprint (may they be laden with Praise and Whisky!) and who’s
forgotten to spend some time fixing bugs (may they toil in the maintenance of a
thousand Malbolge programs!).

There are charts showing the number of bugs opened and closed in recent sprints,
the open bug count over time, and a big indicator showing the current open bug
count.

The source is available for you
to inspect and adapt to your needs. Please try it out, make improvements and
contribute them back! We hope you find it useful.

We’re passionate about building high-quality software at 99designs, and this
is just one way we measure whether we’re doing a good job of that. If you’re
similarly interested in building cool things in an awesome environment,
check out our open positions!

Cool, so what else can we do with this? It’s trivial to define a method with a space in its name, and calling it isn’t terribly difficult:

1.9.3-p286 :005 > self.class.send(:define_method, :"i have a space") do
1.9.3-p286 :006 >   puts "I has a space"
1.9.3-p286 :007?> end
 => #<Proc:0x007ff89c1e0b58@(irb):5 (lambda)>
1.9.3-p286 :008 > send(:"i have a space")
I has a space
 => nil
1.9.3-p286 :009 >

But having created such a monstrosity, how do you call it from the repl? Or for that matter, from an actual Ruby program? This is obviously something you should be doing in production…

self.instance_exec do
  def method_missing(sym, *args)
    # Splat args if passed in from a parent call
    if args.length == 1 && args[0].is_a?(Array) && args[0][0].class == NameError
      args = args[0]
    end
    method_names, arguments = args.partition { |a| a.class == NameError }
    method([sym.to_s, *method_names.map(&:name)].join(" ")).call(*arguments)
  rescue NameError => e
    return [e, *arguments]
  end
end

Bam. You may be looking at this baffled (or if you’re reasonably tight with metaprogramming in Ruby, sharpening/setting fire to something with a view to causing me significant bodily harm).

Walking through this, we first of all act on whatever self is; in most cases this will be the local scope. If we didn’t do this, we’d be defining the method on Object, which can cause all kinds of headaches when you’re trying to debug.
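As a quick sanity check (a sketch of my own, not from the original code), a `def` inside `instance_exec` lands on the receiver's singleton class, so other objects stay untouched:

```ruby
# def inside instance_exec defines a singleton method on the receiver
# alone; Object and other instances are unaffected. Names are illustrative.
obj = Object.new

obj.instance_exec do
  def shout
    "defined on obj alone"
  end
end

puts obj.shout                       # => "defined on obj alone"
puts Object.new.respond_to?(:shout)  # => false
```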

Immediately after this, we unpack arguments if they look like they were created by an earlier instance of this method. This is unwieldy, but unfortunately Ruby’s single return values and the recursion we’re employing here make it necessary. We could definitely define a subclass of Array to make the test cleaner and the implementation more robust, but I preferred to keep this as short as possible and use the bare minimum number of Ruby primitives.

Once we’ve unpacked our arguments, we do the real magic. First off, we split our arguments into NameErrors, the container we’re using for our missing method names, and everything else (the legitimate arguments we were called with).
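The partition step can be seen in isolation. Here's a small sketch of it; the NameError and argument values are made up for illustration:

```ruby
# Build a fake "missing method name" container plus two real arguments,
# then split them the same way method_missing does.
err  = NameError.new("demo", :"bar baz")  # NameError#name => :"bar baz"
args = [err, "richo", "Hello"]

method_names, arguments = args.partition { |a| a.class == NameError }

puts method_names.map(&:name).inspect  # => [:"bar baz"]
puts arguments.inspect                 # => ["richo", "Hello"]
```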

We try to find a method with the current name (as we’ll be building our method name right to left with recursive calls to method_missing), and failing that we pack up our current attempt with our arguments, and return it for the next pass.
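To make the right-to-left rebuilding concrete, here's a hedged sketch of the name-joining step, with a comment showing how Ruby parses the bare words (the values are illustrative):

```ruby
# Ruby parses  i has a space "richo", "Hello"  as nested calls:
#   i(has(a(space("richo", "Hello"))))
# so the innermost miss happens first, and each outer method_missing
# prepends its own name. The final join looks like this:
sym          = :i
method_names = [NameError.new("demo", :"has a space")]

full_name = [sym.to_s, *method_names.map(&:name)].join(" ")
puts full_name  # => "i has a space"
```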

There are enough issues with this (if you defined the methods foo bar baz and bar baz, a call to foo bar baz would call foo with bar baz's return value) to make it unwieldy. On the other hand, if those bugs are the only thing stopping you from putting this into production, you've probably got larger issues.

If this large scale abuse of the language excites you, you might be interested to know that we’re hiring.

At this point you’re probably eager to know: does it work?

1.9.3-p286 :001 > load "bare_words.rb"
1.9.3-p286 :002 > self.class.send(:define_method, :"i has a space") do |name, greeting|
1.9.3-p286 :003 >   puts "#{greeting}, #{name}!"
1.9.3-p286 :004?> end
 => #<Proc:0x007fc6b41872c0@(irb):2 (lambda)>
1.9.3-p286 :005 > i has a space "richo", "Hello"
Hello, richo!
 => nil
1.9.3-p286 :006 >