CanvasPop Cuts Application Response Time to Microseconds with Help from New Relic

Since 2009, CanvasPop has delivered more than 280,000 high-quality, custom-made canvas photo prints to 100,000+ customers across North America. With an "obsessed" approach to customer service – backed by an extraordinary lifetime guarantee – the company matches every customer with a personal in-house designer to help ensure total satisfaction.

CanvasPop operates two facilities in North America. One is the company's headquarters in Ottawa, and the other is a 20,000-square-foot facility in Las Vegas built to support manufacturing and order fulfillment across the continent.

Environment

The vast majority of CanvasPop’s code-based infrastructure is hosted on Amazon Web Services, and PHP is the language of choice for most applications. The primary customer-facing application instance is a single Multi-AZ load balancer with a two-node cache cluster and a Multi-AZ instance on Amazon RDS. Other components of CanvasPop’s predominantly open-source environment include:
• Linux
• Apache HTTP Server
• Chef systems integration framework
• Git source control
• Gearman job queue server
• jQuery
• Backbone.js
• Zend Framework
• ImageMagick

Challenges

Like any online retail business, CanvasPop experiences major spikes in traffic — some more predictable than others. Most of the site’s traffic appears during the fourth-quarter holiday season, when average pageviews rise from about 25,000 per day to as many as 70,000 per day. At other times of the year, heavy media coverage may cause similar jumps in demand. “When we were on Good Morning America, our traffic jumped to 10 times the daily average,” says Paul Brohman, Lead Software Developer at CanvasPop. “We love the exposure. But the exposure won’t do us any good if our site can’t handle those unexpected visitors.”

Another challenge arises in the CanvasPop manufacturing facility, where order fulfillment requires a complex interchange between digital and physical processes. When a customer creates an order, he or she will upload a digital image and select from a variety of canvas sizes, frame designs, and image effects. Then the CanvasPop system delivers the image to an in-house graphic artist who can work one-on-one with the customer to recommend adjustments or enhancements. After printing on canvas, the customer’s digital image becomes a physical product, with CanvasPop employees laminating and then manually framing each piece of artwork prior to shipping. “The interplay between digital image and physical product can make order fulfillment very tricky,” says Patrick Leckey, Senior Developer and Lead Systems Architect at CanvasPop. “If we’re not careful in navigating those transitions, an order can simply disappear from the system. That’s why the performance of our backend manufacturing application is so important — without it, we essentially don’t have a business.”

As it turns out, business is booming. But success brings its own IT challenges, with customer demand threatening to exceed the scale of the CanvasPop infrastructure. By late 2011, the IT team determined that unless they performed a major site overhaul, the customer experience would begin to suffer. “Until very recently, CanvasPop was largely an ASP application,” says Leckey. “It was a hodgepodge of different systems. Our applications weren’t scalable. And we had very little access to real-time information about system performance, because we only received reports from our application monitoring vendor once a day, and even those reports were very limited in scope.”

In order to build a more scalable infrastructure and keep a closer eye on system performance, the CanvasPop IT team decided to migrate their entire system from ASP to PHP. “We literally redesigned our whole system in a matter of months,” says Leckey. “It was a huge undertaking. Fortunately for us, the day we began migrating was also the day we deployed New Relic.”

Solution

After a six-month process of building and testing, CanvasPop went live with its new PHP-based system on August 1, 2012. The New Relic PHP agent was with them every step of the way. “Deploying New Relic was a five-minute job at most,” says Brohman. “We downloaded the Chef-specific recipe for the PHP agent, plugged it into Chef, and that was it. With our old monitoring solution, we would’ve needed to wait until the end of the day to start seeing numbers. But New Relic started giving us actionable information within minutes.”

It wasn’t long before the CanvasPop IT team realized that they could use New Relic to monitor not just the new PHP system, but legacy systems as well. “It can be tough to troubleshoot problems in code developed by someone else, so our older applications rarely got the attention they needed,” says Brohman. “We’re decommissioning a lot of those systems now, which gives us even less incentive to significantly redo them. In the meantime, New Relic performs a lot of monitoring and debugging to keep that legacy environment performing at higher levels than ever.”

New Relic plays a crucial role in CanvasPop’s continuous deployment strategy, providing the real-time data necessary for engineers to deploy changes to production an average of three or four times every day. “New Relic is so effective because it identifies problems so early in the deployment process,” says Leckey. “This is a single, unified solution that gives us the same interface and reports whether we’re in dev, test, or prod. And that helps us anticipate many production issues that otherwise wouldn’t show up in testing.”

Whether monitoring the dot-com front-end or the manufacturing backend, New Relic gives CanvasPop the tools and features necessary to keep all systems on track. For the customer-facing eCommerce application, the following features prove especially helpful:

Real User Monitoring. “When we anticipate a spike in traffic, we use RUM to see what’s going on in real time with every single process in our system,” says Brohman. “It tells us whether or not our current configuration can handle the load. Then we can deploy new instances with one command line or the click of a button.”

Transaction Traces. “Google Analytics can be helpful, but it’s really more of a marketing dashboard,” says Leckey. “The web transactions in New Relic are more engineering-focused, allowing us to see the slow points in terms of database, memcache, or rendering.”

App Map. “We do a lot of linking with Facebook, Instagram, PayPal, and other external services,” says Leckey. “The App Map is invaluable because it helps us identify if a pain point is coming from us or from one of our third-party partners.”

Meanwhile, the manufacturing and order fulfillment application relies on the following New Relic features:

Slow Query Log. “With our backend app, queries tend to be more complex than on the front-end — each query is hitting a lot more tables,” says Leckey. “Within minutes of deploying New Relic, we were able to spot slow queries right off the bat.”

Error Reporting. “We obviously don’t want customers to see errors on the front-end,” says Brohman. “But if they do, it’s usually a quick fix. If an error happens in the backend and we lose an item in the manufacturing process, that’s far worse. We simply can’t have errors in our order fulfillment system, and New Relic helps us keep those to a minimum.”

Results

With help from New Relic, CanvasPop’s application performance isn’t just faster — it’s more consistent, too. “These days, we don’t see any big spikes or dips in how fast our servers respond,” says Leckey. “We have instant access to performance metrics, so we can keep pace with sudden changes in demand. And because the alerting process in New Relic isn’t a binary switch, we can easily see degradation of performance before there’s a real problem. We can set the thresholds that make sense for our system.”
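The idea of alerting that is not a binary switch can be illustrated with a minimal Python sketch. This is not New Relic's API or CanvasPop's configuration: the `Thresholds` class, the `classify` function, and the millisecond values are all hypothetical, chosen only to show how a warning tier can surface degradation before a critical limit is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical response-time limits, in milliseconds."""
    warning_ms: float
    critical_ms: float


def classify(response_ms: float, limits: Thresholds) -> str:
    """Map a response time to a graded status instead of a pass/fail flag."""
    if response_ms >= limits.critical_ms:
        return "critical"
    if response_ms >= limits.warning_ms:
        return "warning"
    return "ok"


# Assumed values for illustration only; each team would pick its own.
limits = Thresholds(warning_ms=400, critical_ms=800)
statuses = [classify(ms, limits) for ms in (120, 450, 900)]
print(statuses)
```

The middle reading lands in the warning tier, which is the kind of early signal a simple up/down check would miss.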

When Amazon Web Services experienced a major outage on the East Coast of the United States in October 2012, CanvasPop was able to stay one step ahead and avoid a significant service interruption. “New Relic alerted us to the Amazon outage before Amazon did,” says Leckey. “We were able to bring up another copy of our database, along with some application instances in AZs that weren’t affected, and get back online. In fact, our infrastructure was back up and taking orders faster than portions of Foursquare, Pinterest, or Twitter.”

In addition to providing up-to-the-minute metrics for proactive problem-solving, New Relic also alerted the CanvasPop IT team to longstanding issues that had simply gone unnoticed. “There’s a query that doesn’t get hit very often in development and staging; it’s very specific to individual customers,” says Brohman. “With New Relic, we were able to see that as soon as the query went into production, it would slow down into the milliseconds. By simply changing from left to inner joins, we dropped that query from milliseconds to microseconds, which saved us huge amounts of compute resources on our database.”
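The difference between a left join and an inner join can be sketched with Python's built-in sqlite3 module. The source does not show CanvasPop's actual query or schema, so the `customers` and `orders` tables and their contents below are hypothetical. The sketch shows only the semantic difference: a left join keeps customers with no matching order, while an inner join returns matched rows only. That also means the swap is safe only when the unmatched rows are not needed.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            canvas_size TEXT
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES
            (10, 1, '16x20'), (11, 1, '8x10'), (12, 3, '24x36');
        """
    )
    return conn


QUERY = """
    SELECT c.name, o.canvas_size
    FROM customers AS c
    {join} JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
"""

conn = build_demo_db()
# LEFT JOIN keeps every customer, filling missing order columns with NULL.
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()
# INNER JOIN returns only customers that have at least one order.
inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()

print("left: ", left_rows)
print("inner:", inner_rows)
```

In this toy data, Grace has no orders, so she appears in the left join result with a NULL canvas size and is absent from the inner join result. Any speedup from such a change depends on the database engine, indexes, and data, none of which the source describes.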

With faster, easier access to performance metrics, and with all metrics generated by a single solution, New Relic contributes to a major boost in productivity at CanvasPop. “By using one solution instead of four or five different solutions, we’re shaving off hours per week for every developer,” says Leckey. “And of course the developers appreciate no longer needing to spend so much time hunting for problems. With this data, they can go straight to the source of the trouble immediately.” All of this will be critical to CanvasPop’s 2013 plans to expand into the European market.

New Relic also contributes to knowledge-sharing across the CanvasPop organization. “We can easily embed any chart from New Relic into another application,” says Brohman. “That’s a huge value add, because some of our colleagues may not be familiar with this software, and they may not be interested in the nitty-gritty technical details. But we can give them access to key metrics by exporting relevant charts and graphs to the applications they’re already familiar with.”

Soon enough, that kind of knowledge-sharing will move to a whole new level, with the installation of dashboards throughout the CanvasPop office displaying key metrics for all employees and visitors to see. “In many ways, application performance is the heartbeat of our business,” says Leckey. “By using these dashboards to display, for instance, the number of database queries we’ve run in the past 24 hours, we can give our colleagues an up-to-the-minute understanding of the health and vitality of the company. Broader awareness means better alignment across the organization. And I credit New Relic with giving us the most accurate, most current data we’ve ever had for moving this business forward.”