xMatters

xMatters is an actionable IT alerting platform that relays data between systems while engaging the right people to resolve incidents faster. xMatters automates and brings structure to communication so you can proactively prevent outages, resolve incidents, and keep the right people informed. xMatters brings your toolchain together to empower IT and Development enabling the connectivity among your solutions as well as the handoffs in your process.

Deploying our service is a grudge match between customer benefits and customer pain. In one corner, rolling out fixes (yay!) and delivering new features (double yay!). In the other corner, training on new features (boo – sounds like work), and change management processes (more work).

We put a great deal of effort into optimizing these processes, so it’s important to share how we think through these optimizations. For now, I’ll narrow the focus to the issue-fix aspect of the deployment process, and in later blogs I’ll cover new feature delivery.

The fine print

Before I start, a quick disclaimer: what I’m about to describe is subject to change. We’ve been a fanatical agile shop for nearly seven years and we believe change is an important principle of the agile approach. So, if you’re reading this in 2017 or later, please understand that we might have tweaked some of these details. OK, now for the good stuff…

Bug zapping

You might not be surprised to hear that bugs are occasionally found in our service (I know, the horror!). Nobody likes bugs, so we need to address them as soon as possible. And because our service uses a number of telecommunication providers to make text messages flong (on my iPhone, anyway), phones ring, and other third-party communication channels do their thing, managing the infrastructure related to these providers requires constant effort. In short, we need to be able to release and deploy fixes FAST.

Keep those cards and letters coming

It also might not surprise you to hear that our customers ask for new features (the audacity!). From July 1st to Sept 30th, we received 133 service enhancement requests – that’s 2+ per business day. The product managers who handle these requests interpret them into new features and then work with our engineers to build them.

So far, so good – and in theory we should be getting these features into your hands ASAP. But these features are product changes: while the customers who asked for a change are willing to take on any additional testing and training work to get their new features, customers who don’t intend to use the new features don’t want that burden imposed on them. That means we need a process that doesn’t force new features on customers too quickly.

Balancing the need for speed

Okay, so we can probably agree we want fixes fast and features not too fast – how can we possibly strike this balance?

First, by using deployment speed: we deploy at least weekly, if not more frequently. Our core sprint work uses a continuous delivery process which we deploy on a weekly schedule so that we can seamlessly deliver non-critical fixes at that cadence. We can also generate and deploy critical fixes in between these weekly deployments as necessary to address issues like zero-day vulnerabilities.

Second, by using feature flags: deployments include the latest feature developments, but some features are not quite ready for your use after just one sprint. And as previously mentioned, your user base might be confused if new buttons and features show up every week. So we use feature flags/toggles on our new features so that we can conditionally turn them on when we’re ready to release them. That allows the new fixes to get out the door without having new features show up. (We’ll talk more about feature flags in another blog, but those of you who are new to xMatters might want to check out our Early Access Program if you want to play with new features faster).

Third, through thoughtful communication: because fixes can cause behavior changes in the system, we have implemented a new communication process to advertise the parts of the service we worked on so customers can keep an eye out for any unexpected behavior. Our support notes are designed to provide this information (here’s a sample from the deployments leading up to our Rogue release). Those notes work in conjunction with the scheduled maintenance posts on our status page to ensure this information is delivered by the latest technologies.

Using these three mechanisms, we can offer the best mix of quick fixes – without forcing training and change management on our customers.
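To make the feature flag idea a little more concrete, here is a minimal sketch in Python. It is not our production code: the `FeatureFlags` class, the `new_dashboard` flag, and the customer names are all hypothetical, and a real service would load its flags from configuration rather than keep them in memory. It only illustrates the principle that a deployment can ship new code while the feature stays hidden until it is switched on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store (illustration only)."""
    # Features released to every customer.
    enabled_globally: Set[str] = field(default_factory=set)
    # Features switched on early for specific customers.
    early_access: Dict[str, Set[str]] = field(default_factory=dict)

    def is_enabled(self, feature: str, customer: str) -> bool:
        if feature in self.enabled_globally:
            return True
        return customer in self.early_access.get(feature, set())


flags = FeatureFlags()
# Assumption: "acme" opted in to early access for a hypothetical feature.
flags.early_access["new_dashboard"] = {"acme"}


def render_buttons(customer: str) -> List[str]:
    """Return the buttons a customer sees; new code ships dark by default."""
    buttons = ["Send alert"]
    if flags.is_enabled("new_dashboard", customer):
        buttons.append("New dashboard")
    return buttons
```

In this sketch a weekly deployment can include the `new_dashboard` code for everyone, yet only the early-access customer sees the new button until the flag is added to `enabled_globally`.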

How many times a day do you open, acknowledge, or close an IT incident? What’s your process? Do you have a process depending on the incident, systems involved, and other factors? New Relic Alerts gives you options for how you interact with notification channels for sending alerts.

Alerts is a new tool to manage your alerting policies and integrate with team communication tools like xMatters, HipChat, Slack, and more—so you can immediately let the right people know when critical issues arise.

We’re pleased to announce that xMatters is an official notification channel in New Relic Alerts.

With the new integration, New Relic provides a notification channel that you can use to manage your xMatters integration and communication plans easily. You can also use custom payload webhooks to control how the xMatters alert is delivered.

xMatters allows you to automate and structure communication as events unfold during a deployment or service outage. New Relic enables you to manage alerting policies and integrations, while still providing industry-leading insights.

How it works

With New Relic Alerts, you can choose the communication tool and easily set up the integration.

xMatters has a similarly easy interface for setting up an integration with New Relic.

Reaping the benefits

New Relic integrates with xMatters to replace manual steps in incident management processes:

When the s#!t hits the fan, you don’t have time to look up who’s on call, draft emails, call collaborators, or send text messages. An instant chat window is definitely the way to go, especially one like HipChat.

HipChat is a true business app. And while it’s tempting to call it a chat application, it’s much more. It’s persistent, searchable, and loaded with extras like group chat, video chat, screen sharing, and airtight security.

So if you’re busy doing other things when a nasty incident ticket starts hogging space on your screen, how quickly and effectively can you get into HipChat? That’s where we come in.

Combined with xMatters, this integration allows individuals to collaborate with the correct on-call resources via HipChat to coordinate and resolve incidents faster. xMatters leverages your group on-call schedules and rotations, escalation rules, and user device preferences to quickly engage the right resources into a targeted HipChat room.

Linking HipChat in your toolchain

Integrating xMatters with your monitoring, ITSM, incident management, and communication tools enables you to share data across your entire incident resolution toolchain. Using the xMatters integration, you can open a HipChat room directly from JIRA Service Desk or another ITSM system without leaving the ITSM environment.

When you send invitations to collaborate from HipChat, they reference key data from monitoring tools or service management systems. All this data enables your resolution teams to quickly get up to speed and act.

Within the targeted HipChat room, members can use slash commands to see who is on call from a specific team, invite additional resources, and make updates to a service management ticket or StatusPage listing. xMatters eliminates the need to switch back and forth between systems, so your team can resolve incidents instead of worrying about record keeping.

Here are a few other things you can do with the xMatters HipChat integration:

Automatically assign a JIRA issue to the responder

Record HipChat activity back into a service management ticket

Use slash commands to add comments to a service management ticket or StatusPage

Dramatic changes are revolutionizing how we build and use technology. Every company is automating, digitizing, and modernizing operations. We need a better, more connected way to work together as teams so we can harness the insights from our systems and drive effective collaboration.

Just a few years ago, we were all looking around and asking each other the same question. Fortunately, some people are figuring it out and providing some guidance on how DevOps processes can improve all areas of business, from development and operations to monitoring and incident management.

Here are a few examples from our customers.

Pacific Life has transitioned its monitoring from a human-operated system to an automated one. Leaders had to overcome employee anxiety over losing jobs and providing direct access to customers.

Where can you go to hear how experts are solving these complex problems?

We're bringing the Agility 2017 Tour to a city near you, where you can hear from experts and pick their brains regarding how they’re improving business and trends you should be preparing for. It's a great opportunity to hear from more than just talking heads.

Every month there’s a blog, new app, new service, new productivity guru, or some other trigger that gets us to think, “Maybe this will help me tame the email beast and make sure that important stuff gets my attention.” I try a lot of these services, whether it’s Priority Inbox, Gmail’s category labels, Gmail inbox, Boxer, Dispatch, Mailbox, Zero, or the myriad other choices. They all have great features but don’t address the underlying problem. It’s still one unregulated communication channel. It’s open to use by individuals, companies, automated systems, distribution lists, and pretty much everything. Its strength in ubiquity is also its Achilles heel.

All that being said, I would estimate that no more than 20% of people in any given company are masters of their email, whether it’s through the use of a particular app, productivity hack, structure methodology for how they work, or a deal with the devil. The other 80% struggle to keep up with the signals in the noise of email.

The impact is most acute when communications are associated with a specific business process or event. If someone on your IT team is late or misses a major incident triage session, that has measurable impact to the business. …

Scenario: Your operations manager has discovered an anomaly in your security system. The business will start to suffer within 15 minutes if it is a major IT incident. What should she do? We have 6 recommendations for managing major incidents.

1. Define a Major Incident

Before your operations manager can determine whether the incident is critical, she has to have a definition for comparison. There is no official definition, so your organization has to have its own. ITIL recommends using three criteria:

• Urgency: Effect on important business deadlines
• Impact: Impact to the business’s finances, reputation and viability
• Severity: Impact to end users, including employees and customers

Share the definition with your operations managers and major incident managers, and put them through training and practices so they’re ready when they’re under pressure.

Bottom line: If you don’t define a major incident, you’re setting up your resolution team for failure.
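One way to turn a definition like this into something your team can apply under pressure is to write it down as a simple rule. The Python sketch below is only an illustration: the 1 to 3 scoring scale, the `MAJOR_THRESHOLD` value, and the "any single criterion at the top of the scale" rule are assumptions made for the example, not part of ITIL or of any xMatters product. Your organization would substitute its own scale and rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentAssessment:
    """Scores for the three criteria, each from 1 (low) to 3 (high)."""
    urgency: int   # effect on important business deadlines
    impact: int    # effect on finances, reputation, and viability
    severity: int  # effect on end users (employees and customers)


# Hypothetical rule: any criterion scored at the top of the scale
# makes the incident a major one.
MAJOR_THRESHOLD = 3


def is_major_incident(assessment: IncidentAssessment) -> bool:
    scores = (assessment.urgency, assessment.impact, assessment.severity)
    for score in scores:
        if not 1 <= score <= 3:
            raise ValueError("each score must be between 1 and 3")
    return max(scores) >= MAJOR_THRESHOLD
```

For example, `is_major_incident(IncidentAssessment(urgency=3, impact=2, severity=1))` is true under this rule, while an incident scored 2 on every criterion is not.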
