Kickstarter has become the Internet’s prime vector for Cinderella stories, catapulting pet projects to fame and burying would-be entrepreneurs in more logistics and minutiae than they were ready to handle. There are many degrees of success on Kickstarter, but when the outcome is broken down to a binary yes/no score, a group of scientists has found that it can predict with reasonable confidence whether a project will succeed or fail within the first four hours of its launch. The method is based in part on a project’s social media reception, according to a paper presented in early October.

Three researchers at the École Polytechnique Fédérale de Lausanne created statistical models fed with both funding data and discussions on Twitter. The data set was pulled from over 16,000 Kickstarter campaigns that had raised a collective total of $158 million; approximately half of them failed.

The scientists collected Twitter data by searching for the word “kickstarter,” then matching tweets to Kickstarter projects using URLs included in the tweets. They also culled information from each project’s “Backers” page to get a list of which users had pledged money and how much they had pledged collectively. The second step was time-intensive, so the authors only completed it every two days.
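
For readers curious how that URL matching might work in practice, here is a minimal sketch. The regex, the tweet data structure, and the function name are all illustrative assumptions; the paper’s pipeline is not described at this level of detail, and as noted below, shortened links would first need to be expanded before a pattern like this could match them.

```python
import re

# Illustrative pattern for canonical Kickstarter project URLs. Shortened
# links (bit.ly, t.co, etc.) would have to be expanded before a pattern
# like this could match them.
PROJECT_URL = re.compile(r"https?://www\.kickstarter\.com/projects/[\w-]+/[\w-]+")

def match_tweets_to_projects(tweets):
    """Group tweets by the Kickstarter project URL they link to.

    `tweets` is assumed to be an iterable of dicts with a 'text' field,
    e.g. the results of a Twitter search for the word "kickstarter".
    """
    by_project = {}
    for tweet in tweets:
        for url in PROJECT_URL.findall(tweet["text"]):
            by_project.setdefault(url, []).append(tweet)
    return by_project
```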

At first, they fed only the backers data into two models: a k-nearest-neighbor classifier and a Markov chain model. For a control, the authors used a baseline static model that took into account factors like whether a project had a video, its category, and its financial goal.
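
To make the k-nearest-neighbor idea concrete, here is a minimal sketch of one plausible framing: represent each past campaign by the fraction of its goal raised at a few early checkpoints, label it by its eventual outcome, and classify a new campaign by its most similar neighbors. The checkpoint features, the toy numbers, and the choice of k are assumptions for illustration, not details from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row is one past campaign: fraction of its goal raised at a few
# early checkpoints (say 4h, 8h, 12h, 24h). Labels: 1 = funded, 0 = failed.
X_train = np.array([
    [0.02, 0.05, 0.08, 0.15],  # eventually funded
    [0.00, 0.01, 0.01, 0.02],  # eventually failed
    [0.10, 0.18, 0.25, 0.40],  # eventually funded
    [0.01, 0.01, 0.02, 0.03],  # eventually failed
])
y_train = np.array([1, 0, 1, 0])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Classify a new campaign from its first-day trajectory alone.
new_campaign = np.array([[0.08, 0.15, 0.22, 0.35]])
print(model.predict(new_campaign))        # -> [1]: two of its three nearest neighbors were funded
print(model.predict_proba(new_campaign))  # share of neighbors in each class
```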

The baseline model was able to predict the success of campaigns with a flat 68 percent accuracy. Both the nearest-neighbor and Markov chain models fared far better, even in the early hours of a project’s lifetime.

At only 10 percent of the way through a Kickstarter’s life, or about three days in, both models could predict the ultimate result with about 85 percent accuracy. The nearest-neighbor classifier starts off as a slightly better predictor and edges above 80 percent accuracy more quickly, though the researchers note that model is significantly more computationally expensive than the Markov model.
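
The article doesn’t spell out the Markov model’s internals, but the general mechanics look something like the sketch below: discretize a campaign’s funding progress into states, estimate how campaigns move between those states over time, then propagate a campaign’s current state forward to its deadline and read off the probability of landing in the “funded” state. The states and transition probabilities here are invented purely to show the machinery.

```python
import numpy as np

# Toy state space: 0 = under 25% of goal, 1 = 25-50%, 2 = 50-100%,
# 3 = funded (absorbing). A real model would estimate the per-step
# transition probabilities by counting state changes across thousands
# of past campaigns; these numbers are made up to show the mechanics.
P = np.array([
    [0.85, 0.12, 0.02, 0.01],
    [0.00, 0.70, 0.25, 0.05],
    [0.00, 0.00, 0.60, 0.40],
    [0.00, 0.00, 0.00, 1.00],
])

def success_probability(current_state, steps_remaining):
    """Chance of ending in the funded state, starting from current_state."""
    dist = np.zeros(len(P))
    dist[current_state] = 1.0
    for _ in range(steps_remaining):
        dist = dist @ P  # advance the state distribution one time step
    return dist[3]

# A campaign currently at 25-50% of its goal, with 20 time steps left:
print(success_probability(1, 20))
```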

Next, they added the tweets. To process the Twitter data they’d collected, the authors built support vector machine models that used tweets’ timestamps as well as the number of replies and retweets. The authors found that a prediction model using tweets alone did not fare much better in the early stages of a Kickstarter campaign than the static control model did.
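
As a rough illustration of the tweets-only classifier, here is a minimal sketch using one plausible featurization: per-campaign counts of matched tweets, retweets, and replies. The features, kernel, and toy numbers are assumptions; the paper’s exact setup (which also uses timestamps) is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

# One plausible per-campaign featurization at prediction time: number of
# matched tweets so far, total retweets, and total replies.
X_train = np.array([
    [120, 340, 45],  # heavily discussed, eventually funded
    [  3,   1,  0],  # barely mentioned, eventually failed
    [ 80, 150, 20],  # funded
    [ 10,   4,  2],  # failed
])
y_train = np.array([1, 0, 1, 0])

model = SVC(kernel="rbf", gamma="scale")
model.fit(X_train, y_train)

new_campaign = np.array([[60, 90, 12]])
print(model.predict(new_campaign))            # predicted outcome
print(model.decision_function(new_campaign))  # signed distance from the boundary
```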

In a more granular view of the first fifth of a Kickstarter's lifetime, the prediction model gains significant accuracy very quickly, in a matter of hours.

However, when combined with financial data, the models performed best in the early days, shooting up to around 84 percent within the first day and a half and climbing to 87 percent by the end of the first six days.

The final model, which combined the financial and social information, was able to crack a 76 percent prediction rate within four hours of a campaign’s launch, four percentage points better than the next best model, the researchers said. The highest prediction accuracy, though, came at around 15 percent of the way through a campaign, or about four and a half days in, peaking at 85 percent using only financial data.
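
One straightforward way to build such a combined model, sketched below under the same assumptions as the earlier examples, is simply to concatenate the financial and social feature blocks and train a single classifier on the result. Whether the paper fuses its features this way is not stated in the article.

```python
import numpy as np
from sklearn.svm import SVC

# Financial block: early funding trajectory. Social block: the tweet
# counts from the sketch above. Concatenating the two and training one
# classifier is one simple fusion strategy.
financial = np.array([
    [0.02, 0.05, 0.15],
    [0.00, 0.01, 0.02],
    [0.10, 0.25, 0.40],
    [0.01, 0.02, 0.03],
])
social = np.array([
    [120, 340, 45],
    [  3,   1,  0],
    [ 80, 150, 20],
    [ 10,   4,  2],
])
y = np.array([1, 0, 1, 0])

X = np.hstack([financial, social])  # one row per campaign, six features
combined_model = SVC(kernel="rbf").fit(X, y)

new_campaign = np.hstack([[[0.03, 0.10, 0.20]], [[60, 90, 12]]])
print(combined_model.predict(new_campaign))
```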

Social reach is a tricky thing to measure. There are plenty of tweets that the authors’ parameters might not have included—people tweeting about a Kickstarter who neglected to mention that it’s a Kickstarter, for one. Likewise, URLs linking to that Kickstarter can be masked by different URL-shortening services, making it hard to track all tweets that are directing people to the same place.

Because the authors’ prediction models are complex, they don’t identify trends that any user could pick out to determine whether a particular project will crack its goal. There is a good answer inside Twitter’s black box—or at least three-fourths of one—it’s just not visible to the naked eye.

Casey Johnston
Casey Johnston is the former Culture Editor at Ars Technica, and now does the occasional freelance story. She graduated from Columbia University with a degree in Applied Physics. Twitter: @caseyjohnston

If the model can give a 75%-confidence prediction with 4 hours' data, it could give sponsors an early heads-up that the project (probably) needs more promotion.

People are a finicky bunch. I've seen some great looking and well-run campaigns peter out in a week and I've seen some weird stuff take off within hours. I really do believe having a vocal following even before you start a KS campaign is the way to succeed.

I think it's the KISS principle in practice.

Simple Kickstarters with concise details but lots of press seem to do very well, while others that give tons and tons of details, making it hard to figure out what they're going to do with your money, don't get much money.

I think it's because the BS meter in our heads knows that lengthier arguments shouldn't be trusted as much.

I have a close relative who is currently running a kickstarter campaign. As a campaign owner he has access to lots of statistics about the backers, including how they discovered the project.

In the first couple of days almost all the backers came to the project via Kickstarter's own discovery pages. These were enough to get the project funded, but they started to tail off fairly quickly. It turns out there are large numbers of "serial backers," many of whom have backed hundreds of projects. These people presumably actively monitor Kickstarter; many of the pledges came within minutes or hours.

Once these "on-site" backers started to tail off he promoted the project to blogs, news sites, media etc etc.

Now he is seeing a second spike of backers, and it looks like this spike is going to be broader than the initial one. It is coming on more slowly (as some blogs post immediately, others take several days, traditional media even longer), but looks like it will peak higher. Hopefully it will also tail off more slowly as these links hang around for longer whereas the kickstarter "recently launched" pages (etc) are fleeting.

I strongly suspect that most campaigns fail to capture this second peak, and that is why the initial hours/days are such strong indicators of eventual success.

Getting that second peak / buzz is very hard work, and a very different skill-set. You can't just spam out an email or use some crappy press-release service. You have to work long hours making personal approaches. You have to be enthusiastic and savvy. Most kickstarter founders are designers, or engineers, or dreamers etc... not that many seem to have self promotion skills.

There is a lot of selection bias in the way people see kickstarter, because mostly you only hear about the projects which generate interest. Go and read the "recently launched" page and you'll see countless projects sat at (or near) $0. The ones that "go viral" have most often been marketed tirelessly.

I wonder if they checked the mood or content of the tweets. They should have done a better search and checked blog links and Facebook. And TV mentions. They seem to have left some data out of the prediction for no reason.

There is a reason: collecting that kind of data is non-trivial. Putting little taps in the Twitter firehose is do-able for someone who'd like to offer this kind of analytics to Kickstarter sponsors.

I study videogame kickstarters religiously, write updates about them on a forum, etc., and an 85% successful prediction rate in the first three days isn't really that impressive, at least for video games. There are a couple of reasons for this.

Most successful game kickstarters make slightly less than a third of their total funding in the first three days. This broad rule of thumb can predict the final funding of recent successes like Mighty No. 9 and Sunless Sea with 95% accuracy, and Hot Tin Roof to 82%. Mind you, that's a percent of their final total, not their success rate.
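
In code, that rule of thumb amounts to a one-liner; the 1/3.2 fraction below is just one reading of "slightly less than a third," not an exact figure:

```python
def predict_final_funding(first_three_days, early_fraction=1 / 3.2):
    """Project a game Kickstarter's final total from its first three days,
    assuming it raises slightly less than a third of its total early on.
    The 1/3.2 fraction is a guessed reading of that rule of thumb.
    """
    return first_three_days / early_fraction

# A campaign that raised $100,000 in its first three days would be
# projected to finish around $320,000.
print(predict_final_funding(100_000))  # -> roughly 320000.0
```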

The second reason is that there's normally a huge disparity between successful projects and failed projects. There aren't too many nailbiters that don't make it; according to the founder, 90% of kickstarters that get a third of the way there make it.

The third reason is that some kickstarters get funded within 24 hours, and many of the successful ones already have an outside fanbase waiting to back them (Project Eternity, Sunless Sea, Wasteland 2). It's not too hard to figure out that they'll be successful when they're several thousand dollars up in the first four hours.

And a fourth reason is that this isn't that useful. Different classes and sizes of products behave wildly differently. Small kickstarters are by their nature more volatile because one devoted backer can substantially change things. An extra $150 would not affect the $4m Project Eternity very much at any time in the campaign, but it provides almost 8% of the $1,794 Soul Power's funding. Basically, the very successful know they're getting funded, the unsuccessful can be pretty sure they won't, and the marginal games are the unpredictable 15%.

Here's where I think concerted research could be far more helpful:

- The effect of announcing stretchgoals. I have a strong hunch that announcing stretchgoals during the campaign increases total pledges while having them listed at the start does not. This information could greatly help creators.
- The effect of stretchgoals on the ability of a creator to deliver the game.
- The effects of different sites. I have repeatedly read that specialty press is far more important than social media for kickstarter.
- A rough classification system based on size and type of project, with a brief overview of behavior in that class.

Kickstarter is a fascinating market with a lot of interesting data, and I hope they keep analyzing it.

I'm kind of surprised K-nearest-neighbor was used. I'm in an AI class right now; good to know I'm learning something useful!

So, it would be very wise of you to postpone your current online reading activities until class is off, haha.

AI, neural networks, and data mining were the coolest courses I had when getting my Comp Eng bachelor's degree. Now that I'm doing a data science master's, I think the same of all the statistics-related courses.

The Long Dark looked like it was going to fail up to the last couple of days; it squeaked through only two days before the end, but it had a ton of social support and an explosion of donations in those last few days.

I think the Long Dark is a great example of how marginal kickstarters (ones that barely make it) frequently behave differently than very successful ones or utter failures. Frequently funding per individual will go up near the end (which could be a sign of fraud, although I don't believe it normally is). It's a shame they frequently lose the nice parabolic shape, but I have a hunch that an analysis of the failure of this prediction model would show campaigns like these smack dab in the middle.

You can see that it peaks and gets its highest donations five days before the end, which is a rare occurrence. There are four days that come close to or beat the final day, which is also fairly rare. Hot Tin Roof, Tex Murphy, and Clang, some other marginal kickstarters, display similar behavior, although it's not quite as dramatic.

It's frustrating because these are the campaigns where a prediction model would be useful. I've been eagerly watching Knite and the Ghost Lights for weeks (link below, spoilered because I backed it), and really have no idea whether it will succeed or not (it's looking less likely now).

Regarding stretchgoals, I think you're right, judging purely from my own reactions. It really turns me off when a developer is looking for a bunch of money from the get-go and starts listing more and more unrealistic(?) goals before they even know if they'll get the money needed to make the base game. On the other hand, I think goals such as console support for Mighty No. 9, announced later on, really help to drive interest after the initial peak.

I'm also convinced that a mention on Kotaku or Penny Arcade leads to at least a small spike on that same day. It can be seen plain as day on some projects even. But social media (Facebook, Twitter)? Not so sure.

I know I am late to the party here, but I can't say that I'm all that impressed. I created a simple model that can predict the success or failure of nearly 70% of projects (a year of projects, from June 2012 to May 2013) BEFORE a project even launches. I'd have figured that adding 4 hours of "live" exposure would improve the model by more than 5%.

EDIT: I think it would be pretty interesting to team up with a computer scientist and combine methods, as I have no way of including the live data. The statistics I can handle, but mining ongoing data is tricky.