On-call schedule - what kinds?

Hi!

I've been asked to assemble a DBA team to support databases at our company. Previously we've only had one full time DBA but in the last two months I have begun transitioning a developer into the role, and have hired a Sr. DBA as well. I'm a fairly seasoned DBA but have never been a part of a DBA team and therefore I have some questions about how to assemble one.

Specifically: what kind of on-call rotations / schedules should I look at? Currently the sole DBA is pretty much 24x7 for everything, and it can be a strain. Soon she will have two other DBAs to support her, what kind of rotations should I consider? There are about twenty databases of varying visibility; should I establish primary and secondary DBAs for each instance, then establish a rotation appropriately? Or is it more effective for each DBA to be responsible for all instances, and simply establish a schedule a la "you are off-call every third week" in a staggered 2-on-1-off fashion?

I'd love to hear what you're using, what works and doesn't work. Any other thoughts on the formation of a DBA team (things to keep in mind) would be appreciated!

I have three DBAs I rotate a month at a time. One month you are "duty" (on-call), the next you are "backup", and the third you are "off".

The "duty" DBA gets a pager and is available 24x7 for the whole month. He/she is responsible for all instances, and for coordinating problem resolution if other resources (Oracle Support, sysadmins, physical plant, etc.) need to get involved. The "duty" DBA doesn't schedule vacation for the month.

The "backup" DBA is the first person the duty DBA calls if there is a problem he/she can't handle. The backup DBA will let the duty DBA know if he/she has plans to go out of town and provide alternate contact information (i.e., a cell phone number). The backup DBA is by no means responsible for resolving problems, but can be available should the need arise.

The "off" DBA leaves the office at 05:00 and can go on vacation. THE OFF DBA WILL NOT BE CALLED UNLESS A DISASTER CRITICAL TO BUSINESS IS UNDERWAY.

This works in my environment because everybody knows about all instances and we have standards in place. One person may know more about database X, but when database Y fills up the log_archive_dest, every DBA knows what to do to get it going again.
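The monthly duty/backup/off rotation described above can be sketched in a few lines of Python. This is only an illustration, assuming exactly three DBAs: the names, the function names, and the `disaster` flag are hypothetical and not from the post.

```python
ROLES = ("duty", "backup", "off")


def roles_for_month(dbas, year, month):
    """Return {dba: role} for a month; each DBA moves duty -> backup -> off."""
    if len(dbas) != len(ROLES):
        raise ValueError("this sketch assumes exactly three DBAs")
    # Count months continuously so the rotation carries over across years.
    index = year * 12 + (month - 1)
    return {dba: ROLES[(index - i) % 3] for i, dba in enumerate(dbas)}


def call_order(roles, disaster=False):
    """Duty DBA first, then backup; the off DBA is added only in a disaster."""
    by_role = {role: dba for dba, role in roles.items()}
    order = [by_role["duty"], by_role["backup"]]
    if disaster:
        order.append(by_role["off"])
    return order


team = ["alice", "bob", "carol"]  # hypothetical names
january = roles_for_month(team, 2024, 1)
print(january)
print(call_order(january))
print(call_order(january, disaster=True))
```

Because the role depends only on the month number, anyone can work out who is on duty for a given month without consulting a separately maintained calendar.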

In my department we have three Oracle DBAs; most of the others are DB2. Anyhow, we have a pager for production that we rotate weekly. We are a tight bunch, and if we get stuck or need to switch it is never a problem.
I think monthly would be too much, but that is my opinion. Our Project Lead has the secondary pager but it never goes off.

You also need to evaluate your apps and how critical they are. Can some wait till morning? Communicate with the application owners as well; there is nothing worse than having them load data without letting your department know ahead of time. Prevention is your friend.

Originally posted by marist89: I have three DBAs I rotate a month at a time. One month you are "duty" (on-call), the next you are "backup", and the third you are "off".

That's very good, but it assumes all DBAs have the same knowledge level.

Usually a DBA should try to help if there is a business-critical problem.

In my old place (oh how I miss it) there was a team of 6 DBAs taking responsibility for all the databases (but the more important ones tended to be looked after by the people with the most experience). On-call was a weekly rotation and escalation was to the lead DBA (after all, he did get paid more).

Here it is just me looking after everything so I have to wear my underpants on the outside of my trousers and be a general DBA superhero whenever the need arises, which thankfully isn't that often.

Thanks!

This is some great advice so far, esp. marist89's idea. Over time that is a possibility for our team.

A parallel question: how do you publicize your on-call schedules? Up until now everyone has walked to our lone DBA's desk or paged her directly, but with a team forming I would like to transition to a more abstract process that accommodates an on-call rotation. Any recommendations?

Re: Thanks!

Originally posted by swiego: A parallel question: how do you publicize your on-call schedules? Up until now everyone has walked to our lone DBA's desk or paged her directly, but with a team forming I would like to transition to a more abstract process that accommodates an on-call rotation. Any recommendations?

This is difficult, especially when you have a one-man shop. You have to be careful not to get your Sr. guy's feathers ruffled, but at the same time get him to realize that you're doing this so he can concentrate on things more crucial to business.

Your users are another story. Users are stupid. Their problem is the most important thing in the company. You'll never get away from having your Sr. guy do something for Hot Mary in accounting. I don't think you should try to stop that. However, set up an email alias "DBA.Requests@yourcompany.com" and let that go to the Sr. guy for a while. After everybody knows about it, switch it to the group leader and let them start doling out tasks.

Rotate the pager that everybody knows the number to. Sure, the Sr. guy thinks it's his personal pager, but the company pays for it. You'll get resistance at first, but after a month of not getting woken up at 02:00, he'll see the light.

Set up an internal number that works on a schedule. Most corporate phone systems can have a number that rotates based on a schedule. If not, just forward the number to the duty pager.
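The schedule-driven number mentioned above comes down to a date lookup: given a day, who should the shared number or alias forward to? Here is a rough Python sketch of that lookup for a weekly rotation. It is not any phone system's API; the names, the `pager_holder` function, and the start date are assumptions made up for illustration.

```python
from datetime import date


def pager_holder(dbas, day, start=date(2024, 1, 1)):
    """Return who holds the shared pager on `day`.

    The pager changes hands every 7 days, counting from `start`
    (a hypothetical rotation start; 2024-01-01 is a Monday).
    """
    if day < start:
        raise ValueError("day is before the rotation start")
    weeks_elapsed = (day - start).days // 7
    return dbas[weeks_elapsed % len(dbas)]


team = ["alice", "bob", "carol"]  # hypothetical names
print(pager_holder(team, date(2024, 1, 10)))
```

Users keep calling one number or mailing one alias; only this lookup changes hands, so nobody outside the team has to know whose week it is.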

Originally posted by julian: That's very good. But assuming all DBAs have the same knowledge level.

Agreed. A couple of points to clarify:
1. I don't put a Jr. DBA on duty unless I think they can handle the responsibility and have the ability to realize when they're in over their head. I had one Jr. DBA who constantly ended up going down the wrong road because he never stopped to ask for help. He never was on duty and doesn't work for me anymore.

2. I typically am the backup for new people on duty. This way if they get into trouble I can guide them on the correct decisions to get them out.

3. I put people on duty to get them up to speed so they have the same basic knowledge level. Requests that require specialized knowledge we schedule during normal working hours. If a global index goes unusable because some developer dropped a partition, I expect all my DBAs to know how to rebuild it. If somebody calls the duty pager and asks for Java to be installed in the database because it's "Urgent", we simply won't do that after hours.

Re: Re: Thanks!

The faceless mail address is a good idea as it lets the team handle work how they see fit; you're also not affected by someone going on holiday. For a Jr. DBA, being on call can be a scary thing at first, but after the first few call-outs you soon get comfortable with the responsibility and, as others have said, it's easier knowing there is some backup available.