From a Frat Social Network to a GPU Compiler for Hadoop: A Story of How We Discovered What To Build

After 2.5 years working on 4 different ventures, a handful of customers, 2 college graduations, life in 3 different cities, 2 YC interviews, and getting shut down by Facebook once, here’s our story.

The only reason we’ve developed a GPU compiler for Hadoop MapReduce is because we built Greekdex, a social network for fraternities and sororities at our universities 2.5 years ago.

Now you’re probably asking, “How the hell is a frat social network related to a GPU compiler for Hadoop?”

Many inventions have unintended consequences. The microprocessor is one example. It was originally designed for traffic lights and vending machines, but it ended up fueling the computer revolution. Like the microprocessor, the clustering algorithm we developed for Greekdex had unintended consequences.

Greekdex

The Greekdex landing page

Our goal was to populate an entire fraternity or sorority’s directory as soon as a few users from the same organization signed up. My co-founder, Chuck Moyes, discovered a clustering algorithm that identified friend groups in a given social graph. However, the algorithm took an hour to run when parallelized across 6 computers.

Then Chuck had the clever idea of coding the algorithm on a single GPU. The run time? 0.2 seconds! WHOA.

After a few months of accumulating a few thousand users, we realized that Greekdex was a fun project, but not a business.

Lesson learned: Maintaining focus and passion is difficult after the first launch or two, especially for your first “college” venture.

GraphMuse

The GraphMuse invitation API on RockYou’s Bingo Game

GraphMuse was an invitation widget for Facebook apps that increased the number of friends that users invited. Facebook’s invitation widget was alphabetically sorted, so we knew we could do better.

We paired the Greekdex GPU clustering algorithm with a mutual friend counter to determine the “close friend groups” that were likely to sign up for the given app.
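To make the mutual friend counter concrete, here is a minimal sketch in Python. It is not the GraphMuse code: the function name `rank_by_mutual_friends`, the dictionary-of-sets graph format, and the sample names are all assumptions made for illustration, and the GPU clustering step is left out entirely. It only shows the basic idea of ordering a user's friends by how many friends they share with that user, instead of alphabetically.

```python
def rank_by_mutual_friends(graph, user):
    """Order a user's friends by mutual-friend count, highest first.

    `graph` is a hypothetical adjacency map: name -> set of friend names.
    """
    friends = graph.get(user, set())
    # For each friend, count the people who are friends with both of them.
    mutual = {f: len(friends & graph.get(f, set())) for f in friends}
    # Sort by descending mutual count; break ties alphabetically.
    return sorted(friends, key=lambda f: (-mutual[f], f))


# Made-up sample data: bob, cat, and dan form a tight group around ann,
# while abe knows only ann.
graph = {
    "ann": {"abe", "bob", "cat", "dan"},
    "abe": {"ann"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"ann", "bob", "cat"},
}

print(rank_by_mutual_friends(graph, "ann"))
```

An alphabetical widget would list abe first. This ordering pushes abe to the end because abe shares no friends with ann.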

We launched our product with RockYou, Fitocracy, and a few smaller companies, which together were issuing 1.5 million queries per month to our API. We quadrupled invitation rates for some of these customers. Some of them were paying customers. What an awesome feeling!

Then we received this email from Facebook’s Platform Policy Team:

I used the Wayback Machine to see when the clause was added… 3 days prior. We had to shut GraphMuse down.

Lesson learned: Don’t be dependent on anyone but yourself, especially Facebook. They will copy you and shut you down without hesitation.

Codentical

We went back to the drawing board... What should we build?

As I was using Google Translate to do my German homework, I proposed we build a tool that would identify whether a sentence had been translated using Google Translate. We thought it was a decent idea, and I knew it was an issue in most foreign language classes.

This got us thinking on the “anti-cheating” track… “What about computer science homework?” We knew plagiarism was an issue in some of our CS classes, and Chuck’s senior project had been plagiarized by dozens of students around the world.

So, after some research, we started Codentical. However, this time around, I began by emailing 800+ professors to see if this was something they’d use. I knocked on every Penn CS professor’s door without scheduling a meeting. I even met with a group of 6 Drexel professors before we wrote our first line of code.

The Codentical landing page

After a month or two, we realized that Codentical is a decent business, and some universities and high schools are willing to pay for it. Our technology is incredibly advanced, and once again, we chose to use the GPU.

Then one night, after a few beers, a spark went off. We quit Codentical.

Lesson learned: Getting feedback early is paramount. Validate before you build, and listen to your users.

ParallelX

Since Greekdex, we’ve been trying to figure out how to leverage the GPU’s power for other companies. However, there are some issues associated with GPUs:

Programming CUDA or OpenCL on the GPU is very difficult, and most programmers have never programmed GPUs.

GPUs are only suited for parallel workloads.

A GPU cloud (IaaS) would never be feasible considering AWS dominates the market.

That one night we realized that we could parallelize Hadoop MapReduce jobs on the GPU. Yet how?

Here’s how! By creating a GPU compiler that translates the code you’ve written in Java to OpenCL, and executing it on our AWS GPU cloud.
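To show why MapReduce is a natural fit for this, here is a minimal sketch of the MapReduce model itself, written in plain Python rather than Java or OpenCL. It is not the ParallelX compiler and it runs on the CPU. The names `map_fn`, `reduce_fn`, and `run_mapreduce`, and the word-count job, are assumptions chosen for illustration. The point is that each call to the map function depends only on its own input record, which is the kind of independent work that can be spread across many GPU threads.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records):
    # Map phase: every record is processed independently of the others,
    # so these calls could in principle run in parallel.
    mapped = [pair for record in records for pair in map_fn(record)]

    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: each key is also reduced independently.
    return dict(reduce_fn(key, values) for key, values in groups.items())


print(run_mapreduce(["the GPU is fast", "the CPU is slower"]))
```

In a real Hadoop job the map and reduce functions would be Java classes, and the shuffle would be handled by the framework.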

After comparing some Hadoop jobs and OpenCL implementations, we discovered that we can run MapReduce jobs much faster.

After 2.5 years of using GPUs, we want to share the GPU’s incredible power with the average developer. After 3 weeks of contacting every potential user we could, we’ve got some great feedback to work with. Hell, we even pulled a YC interview for a product that we hadn’t built yet.

After building a few startups over the course of a few years, you start to listen to your gut feeling as opposed to naive excitement. Even more importantly, you start to listen to your users.

Could we have discovered ParallelX without Greekdex?

Absolutely not.

Lesson learned: Leverage your X-factor, yet be patient, because it might take years before you realize it.