Highlights

Maintain and improve performance across the stack with New Relic at the core of 99designs' performance strategy

New Relic helps 99designs run at peak performance

99designs is a crowdsourcing marketplace for graphic design, where small businesses and individuals can post graphic design work that they need done. Founded in 2008 in Melbourne, Australia, by Mark Harbottle and colleagues Lachlan Donald and Paul Annesley, the firm was started by designers for designers. 99designs connects passionate designers from around the globe with customers, mostly small businesses, seeking quality, affordable design services. The company provides a friendly, professional and secure environment where designers from all walks of life can find opportunity and compete on a level playing field, where they can show off their work, improve their skills, communicate with peers and win new clients. Over 60% of all design contests held online are run through 99designs, which has paid out more than $30 million to the design community in the last four years.

Today the 99designs team is 50 strong and is split evenly between its Melbourne and San Francisco offices.

Environment

99designs has a multi-tier web stack, with a PHP application layer at its heart. The entire stack runs on the Amazon Web Services cloud. The company relies heavily on open source software and works to contribute back to the open source community regularly. A thorough overview of their architecture is available on their blog.

Challenges

With a global customer base of small businesses looking for design services, and graphic designers looking for opportunities to sell their designs and to improve their skills, both groups are accessing the 99designs site 24x7. Business is lost for both groups whenever the site is down, so there is a strong imperative to run a site that is quite literally never in maintenance mode. Being global, the team also monitors page load performance from a variety of countries to measure how successfully they are serving the international community.

Each month, 99designs infrastructure supports:
• Hundreds of thousands of unique visitors
• Tens of millions of page-views
• Some 40x again as many HTTP requests

As the size of their codebase, stack and development team grew, 99designs found it challenging to maintain their speed in bringing new features to market, since that speed was determined in large part by how quickly they could find and resolve problems and performance issues which arose.

Although they periodically performed manual performance analysis across the stack, including log analysis, benchmarking, and profiling of frequently hit pages, they still lacked ongoing visibility into performance issues. Getting to the level of detail the team needed required so much time that the team would do deep dives on performance only a few times a year. This meant that by the time issues were identified, they were much more expensive to correct than they would have been at the time they were introduced. Performance problems were slipping under the radar of everyday development.

“Today our team is more pro-active than ever in addressing these [performance] issues, which lets us spend more time doing what we love: making our product better.”

Lars Yencken
Dev Ops Engineer

Solution

In mid-2011 the development team discovered that New Relic offered a level of transparency into their stack that they had not been able to achieve using their current methods. It provided a window into server and in-browser behavior, and provided profiling information even to the level of slow database queries. They found the New Relic interface intuitive and easy to use, and found the consolidated view on a single console to be much easier and faster to use than the multiple consoles from their old environment.

The 99designs development team installed the New Relic PHP and RUM monitoring agents without difficulty. The PHP agent covered the application layer of their site, responsible for serving content correctly. Although static requests are largely handled by the caching layer, the most media-intensive pages on the site go through the PHP app, and are monitored and measured via New Relic. These pages are the heaviest in terms of media, the most dynamic in terms of information being displayed, and also the most frequently accessed pages on the site.

The team found that JavaScript RUM instrumentation allowed them to shift their focus from basic server response time to “real-world” performance issues. This supported a shift in development focus back towards changes most likely to improve the user experience. RUM also lets the 99designs team measure the impact of the static assets delivered across its site, which helps them identify and prioritize the work needed to improve end-user performance.

One such example involved the browser load time at peak traffic. The New Relic data indicated that average browser load time was fastest precisely at times when the team expected it to be slowest. After additional research, the team found that the changes in average browser load time spoke as much to their user demographics and global location as they did to server load.
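The effect described above can be illustrated with a small sketch. It assumes, purely for illustration, two visitor regions with fixed load times; the region names, timings and page-view counts are hypothetical and are not 99designs data. It shows how the site-wide average can fall at peak traffic simply because the mix of visitors shifts towards the faster region.

```python
# Hypothetical per-region browser load times in seconds (not 99designs data).
REGION_LOAD_TIME = {"near": 2.0, "far": 6.0}


def average_load_time(page_views):
    """Return the traffic-weighted mean load time for a {region: page_views} mix."""
    total_views = sum(page_views.values())
    weighted = sum(REGION_LOAD_TIME[region] * views
                   for region, views in page_views.items())
    return weighted / total_views


# Off-peak: fewer visitors, mostly from the far (slower) region.
off_peak = average_load_time({"near": 100, "far": 300})   # 5.0 seconds

# Peak: three times the traffic, but mostly from the near (faster) region.
peak = average_load_time({"near": 900, "far": 300})       # 3.0 seconds

print(f"off-peak average: {off_peak:.1f}s, peak average: {peak:.1f}s")
```

In this sketch no region gets faster or slower; only the audience mix changes, which is why a per-country breakdown is more informative than a single average.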

After a few such discoveries, the team has grown to consistently trust the data New Relic provides. They now use embedded graphs from New Relic to form an ‘always-on monitoring’ screen, where they can see at a glance whether any aspect of site performance has changed and needs further investigation. New Relic continues to uncover subtle issues that the team might otherwise have missed, and allows 99designs to rapidly identify the code causing the performance problem, and often the actual deployment that caused the issue. Today, they view New Relic as essential to their ability to continually debug and tune their applications.

Results

99designs uses New Relic monitoring as the core of their strategy around maintaining and improving performance across their stack. New Relic provides transparency into server and browser behavior, as well as profiling that provides insight into where time is spent during requests. As their experience with New Relic grows, the team is finding that using the product helps them prioritize work, improves their productivity, and helps them resolve thorny performance challenges.

The development team views app server response time as the key indicator for overall server load and health. New Relic’s browser load time graph is now the key source of ground truth on how users are experiencing their site. That same graph plotted for various countries gives 99designs a realistic understanding of how well they are meeting the needs of their international user base.

“Transaction traces are extremely useful,” says Lars Yencken, Dev Ops Engineer. “It’s as if we were running the site with a profiler enabled the whole time.”

Lars recalled a particularly challenging performance issue they encountered. “Our app server response time dropped suddenly one day, as if our site had doubled in speed. At the same time, the site was taking longer to load in-browser, and had basically become less responsive. This was puzzling, but by drilling down into the distribution of responses, we were able to determine that a recent change had resulted in the site being ‘peppered’ by huge amounts of very fast requests. New Relic helped us identify and work through this difficult problem and find its root cause much more quickly than we would otherwise have been able to do.”
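A minimal sketch of why the distribution matters more than the average in a case like this. The request counts and timings below are invented for illustration and do not come from 99designs; the sketch uses only the Python standard library, not any New Relic API.

```python
from statistics import mean


def summarize(response_times_ms):
    """Return the mean, an approximate 95th percentile, and the request count."""
    ordered = sorted(response_times_ms)
    p95_index = int(0.95 * (len(ordered) - 1))
    return {
        "mean": mean(ordered),
        "p95": ordered[p95_index],
        "count": len(ordered),
    }


# Hypothetical traffic: real page requests take 200 ms each.
page_requests = [200] * 1000
# Hypothetical flood of trivial requests that each finish in 5 ms.
fast_flood = [5] * 1000

before = summarize(page_requests)
after = summarize(page_requests + fast_flood)

print("before:", before)
print("after: ", after)
```

Under these assumptions the mean roughly halves (from 200 ms to 102.5 ms) while the 95th percentile stays at 200 ms and the request count doubles. The real pages are no faster; the average is simply diluted by the extra requests.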

In another example, New Relic helped 99designs detect a subtle bug that caused a transaction normally occurring 200 times per minute to occur 10,000 times per minute. Being able to find that bug and others like it has improved 99designs’ productivity and has allowed them to limit their infrastructure costs.
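To put those figures in perspective, the jump is a fifty-fold increase in throughput. The sketch below shows one simple way such a change could be flagged by comparing an observed rate against a baseline; the function names and the threshold of 5 are hypothetical choices for illustration, not a description of how New Relic or 99designs detected the bug.

```python
def throughput_ratio(baseline_per_min, observed_per_min):
    """Return how many times higher the observed rate is than the baseline."""
    return observed_per_min / baseline_per_min


def is_anomalous(baseline_per_min, observed_per_min, threshold=5.0):
    """Flag the transaction when its rate is at least `threshold` times the baseline.

    The default threshold is an arbitrary illustrative value.
    """
    return throughput_ratio(baseline_per_min, observed_per_min) >= threshold


ratio = throughput_ratio(200, 10_000)   # 50.0
flagged = is_anomalous(200, 10_000)     # True

print(f"throughput is {ratio:.0f}x the baseline; anomalous: {flagged}")
```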

From a development point of view, finding problems in their early stages makes fixing them much easier and more efficient for the team. Now when a new feature or a change is released, the impact is quickly seen and measured. The result is that issues are often discovered and corrected before they escalate into larger problems, and while the feature in question is still fresh in the developer’s mind.

Lars summed it up: “We’re firmly focused on providing a good user experience, and this means it can be hard to justify time spent chasing difficult performance problems for uncertain gain. The best feature of New Relic is that it brings many of these difficult problems into the ‘doable now’ range, making them easy enough to tackle so we can now simply get them done. In the seven months we’ve been running New Relic, we’ve certainly saved weeks of time debugging performance issues. Today our team is more pro-active than ever in addressing these issues, which lets us spend more time doing what we love: making our product better.”