At LinkedIn, we manage on-call incidents with Iris and Oncall, two tools that we released as open source about two years ago. Oncall lets our teams manage their on-call shifts in a largely automated fashion, scheduling rotations without human intervention. At the same time, it allows teams to be agile and adaptable when defining...

At a company as large as LinkedIn, service degradation isn’t a question of “if” so much as “when,” and when things do break, we need to escalate as quickly as possible to make sure the problem gets fixed. This usually takes the form of calling up an on-call engineer, but what if this person doesn’t answer the phone? In the past, LinkedIn addressed this question...