Highlights

Apdex is so effective as a trend indicator that it’s now part of weekly KPIs

SoundCloud Achieves High Performance, Exponential Growth with Help from New Relic

Launched in 2008 by Alexander Ljung and Eric Wahlforss, SoundCloud is a social sound platform that gives users unprecedented access to the world’s largest community of music and audio creators. Committed to 'unmuting' the web, SoundCloud allows everyone to discover original music and audio, connect with one another, and share their sounds with the world. Sound creators can use the platform to instantly record, upload and share sounds across the internet, as well as receive detailed stats and feedback from the SoundCloud community.

Challenges

When Tobias Schmidt joined SoundCloud in 2011, he was struck by the sheer size of the company’s code base. “At that time, we had 15 to 20 coders, all developing different parts of the code base,” says Schmidt, who serves as one of the company’s Site Reliability Engineers. “It was extremely difficult for anyone, let alone a new employee, to understand that much code, particularly how it might behave in a production environment.” Usage on the site was growing exponentially month over month, and Schmidt’s team was constantly pushing out new features to keep pace with user demand. In many cases, that meant creating code to meet an urgent need, then leaving it untouched for months or even years.

SoundCloud’s early features worked perfectly well for a small number of users. But in many cases, those same features wouldn’t scale as users became more numerous and more active. “As a social sound platform company, SoundCloud embraces a rapid development methodology,” says Schmidt. “While users may have had only hundreds of followers in the early days, today leading-edge users like Snoop Dogg attract 490,000 followers, and those numbers continue to grow very rapidly. Our original code couldn’t even begin to accommodate that level of engagement. If we’re going to keep pace with the site’s evolving needs, we need to continually up-level our coding efforts.”

All of those coding changes, often made under intense pressure and in response to increased demand, made it difficult for the SoundCloud team to identify which lines of code might be causing poor site performance. Prior to using New Relic, team members would often email each other to diagnose a problem, relying on limited internal data to piece together a slow and incomplete picture of the issue at hand. “Without real-time application monitoring, diagnosis can be a bit of a guessing game,” says Jacob Maizel, Director of Systems Engineering at SoundCloud. “If you don’t know exactly what you’re looking for, you can spend far too much time hunting for the problem instead of focusing on the solution.”

“Product teams are coming to us all the time asking for new features. It’s easy to get overwhelmed. You can’t possibly do it all, so you must make intelligent choices. Trend data from New Relic helps us make informed decisions about what we absolutely must work on, right now.”

Tobias Schmidt
Site Reliability Engineer, SoundCloud

Solution

In 2009, SoundCloud’s web-serving HTML application was built entirely on Ruby. “We’d identified an urgent need to monitor site performance in what was, at the time, an exclusively Ruby environment,” says Schmidt. “New Relic was the only legitimate Ruby monitoring solution on the market, and we simply didn’t have the resources to develop our own solution. So we thought we’d give it a try — and I’m very, very glad we did.”

Implementation was easy from the start, and continues to be a cinch even as the company’s development environment grows increasingly complex. “Just today, I showed New Relic to a Java developer who’d never used it before,” says Schmidt. “He added it to his service almost immediately. It’s just that easy.”

SoundCloud relies on a number of New Relic features to maintain the best possible service for millions of active users of the social sound platform. These features help Schmidt and his team locate problems and identify trends as they emerge, enabling SoundCloud engineers to prioritize their work with an appropriate sense of urgency.

With Transaction Traces, Schmidt can quickly understand the code written by other developers and how that code behaves in a production environment. “New Relic sorts problematic code by response rate in near real time, which pulls me directly to the lines that require my immediate attention,” he says. “In one situation, we were able to locate the source of an issue, cache the problematic areas, and achieve resolution in less than an hour. That simply wouldn’t have been possible before New Relic.”

The Performance Breakdown feature shows the top five paths of any given action, giving a summary of performance that helps developers identify the slowest transactions that require immediate attention.

The Apdex feature plays a key role in SoundCloud’s internal alerting system, sounding the alarm when numbers fall below acceptable thresholds. “Product teams are coming to us all the time asking for new features,” says Schmidt. “It’s easy to get overwhelmed. You can’t possibly do it all, so you must make intelligent choices. Trend data from New Relic helps us make informed decisions about what we absolutely must work on, right now. And Apdex is so effective as a bellwether that we now include that data in our weekly KPI reports.”
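For readers unfamiliar with the metric, the following is a minimal sketch of how an Apdex score is commonly computed and compared against an alert floor. It follows the standard Apdex definition (samples at or under a target time T count as satisfied, samples up to 4T count as tolerating at half weight). The function names, the sample response times, the 0.5-second target, and the 0.85 alert floor are all illustrative assumptions, not SoundCloud’s or New Relic’s actual configuration.

```python
from typing import Iterable


def apdex(response_times: Iterable[float], t: float) -> float:
    """Return the Apdex score (0.0 to 1.0) for response times in seconds.

    Standard definition: (satisfied + tolerating / 2) / total, where
    satisfied means rt <= t and tolerating means t < rt <= 4 * t.
    """
    satisfied = tolerating = total = 0
    for rt in response_times:
        total += 1
        if rt <= t:
            satisfied += 1
        elif rt <= 4 * t:
            tolerating += 1
        # Anything slower than 4 * t is "frustrated" and adds nothing.
    if total == 0:
        raise ValueError("cannot compute Apdex without samples")
    return (satisfied + tolerating / 2) / total


def should_alert(score: float, floor: float) -> bool:
    """Hypothetical alert rule: fire when the score drops below the floor."""
    return score < floor


# Illustrative data only: six made-up response times, in seconds.
samples = [0.2, 0.4, 0.5, 1.1, 1.9, 2.5]
score = apdex(samples, t=0.5)          # 3 satisfied, 2 tolerating, 1 frustrated
alert = should_alert(score, floor=0.85)
print(f"Apdex: {score:.2f}, alert: {alert}")
```

Because the score is a single number between 0 and 1, it is easy to track week over week, which is one reason it suits a KPI report.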

Even as SoundCloud makes significant changes to its development environment, New Relic will continue to take the lead in diagnosing performance issues. “We’re in the process of migrating our core web-serving application from Ruby to JVM-based languages,” says Maizel. “We’ve had this huge Ruby prototype for years, but now we’re splitting that application into smaller, nimbler services and taking Ruby out of the equation. Our goal is to have all of our user-facing applications served by our API. And we will continue to monitor all of that activity through New Relic.”

“Performance is paramount for any web service. If users encounter any performance issues at all, they’ll quickly move to a more responsive service, even if that service offers fewer features than SoundCloud. New Relic plays a major role in keeping our performance as fast and as consistent as possible.”

Jacob Maizel
Director of Systems Engineering, SoundCloud

Results

New Relic has contributed significantly to major UX improvements on the SoundCloud website by helping the team identify and speed up slow transactions, leading to improved code performance for even the most active profiles. This is especially crucial for users like 50 Cent, top contributors with hundreds of thousands of followers. “50 Cent drives a lot of traffic,” says Schmidt. “Users can comment on his sounds, and if a lot of comments appear, that attracts more attention exponentially. Our code needs to be able to support that level of activity, which is especially important because his profile is so popular.”

“Performance is paramount for any web service,” adds Maizel. “If users encounter any performance issues at all, they’ll quickly move to a more responsive service, even if that service offers fewer features than SoundCloud. New Relic plays a major role in keeping our performance as fast and as consistent as possible. For that reason alone, we believe it’s more than worth the investment.”

For engineers working on SoundCloud web applications, New Relic is simply a core part of the daily workflow. “If a new developer is doing any work on the public-facing site, I immediately show them how to deploy New Relic,” says Schmidt. “It’s what we use to identify current or future problems or pain points. And all of us keep a close eye on New Relic data during deploys.”

“This software is part of the fabric here at SoundCloud,” continues Maizel. “For many of us on the development team for the public-facing site, it’s ingrained in what we do. It’s kind of difficult to imagine our day-to-day work without it.”