Can anyone give me a brief description of how they have implemented an SLA for an email service?

What kind of metrics did you use? Do you have an example, or a link where I can download examples of this sort of SLA?
Thanks a lot
Javier
--
Javier G. Arcal
PhD Engineer, Universidad Politécnica de Madrid
ITIL V3 Accredited Trainer
ITIL Service Manager® Certified by EXIN
ITIL Foundation® Certified by UK ISEB
ITIL Consultant and Trainer
Email: javier.arcal@gmail.com
MSN: javierrmad

If you're talking about a corporate Exchange server, I think availability is the right SLA metric. Email either works or it doesn't, and all the managers in the company are probably very concerned about email service failures.

My customer knows that my email service supports the MAPI, IMAP, POP and OWA protocols.
He knows that VIP users have 1 GB of storage and regular users have 350 MB.
The servers are protected by a spamkiller and virus surveillance.
I believe we're talking about service definition.
Once you have defined your email service with all its features (technical specs, procedures...), you can build your SLA.

For me, availability is the big one.

You can find a lot of indicators to build your SLA,
mainly based on the performance of the service (guaranteed transfer rate, spamkiller effectiveness...).

Question:
1. How to implement an SLA for an email service
2. What kind of metrics to use

Definition:

1. Implement:
To put into practical effect; carry out (e.g. "implement the new procedures").

2. SLA (Service Level Agreement):
A service level agreement (SLA) is a contract between a network service provider and a customer that specifies, usually in measurable terms, what services the network service provider will furnish. The idea of writing a service level agreement is so that services for customers (users in other departments within the enterprise) can be measured, justified, and perhaps compared with those of outsourcing network providers.

Some metrics that SLAs may specify include:
a. The percentage of time the services will be available.
b. The number of users that can be served simultaneously.
c. Specific performance benchmarks against which actual performance will be periodically compared.
d. The usage statistics that will be provided.

3. E-mail Service:
A system, provided by a server, for sending and receiving e-mail messages electronically.

4. Metric:
A standard of measurement. In software development, a metric (noun) is the measurement of a particular characteristic of a program's performance or efficiency.

5. Benchmark:
A set of conditions against which a product or system is measured; a set of performance criteria that a product is expected to meet.

Introduction:
Let's restate the question first: how do we put into practical effect a contract between the IT department and the business that measures the ability of the network to provide a messaging service?

I would begin by identifying some of the components of the “e-mail system”:

So where would I start? Well, I think MRCASAL is right: it is basically availability (does it work or not). That's what we want to measure, but what do we define as the measurement criteria? The e-mail service alone, e-mail throughput, hardware availability... and the list goes on.

1. E-mail Service:
We need to measure the availability of this service; let's do this on a monthly basis:

Logging:
This can be monitored using the Event Log, which will tell you when the services stopped and were restarted, as well as any other errors on the server that caused the service to go down. I would keep a log of this and record any "events" that occurred. Other applications can do this too (Argent Guardian, and I am sure some SMTP applications could as well). Downtime for maintenance should also be recorded, as it is a contributing factor to service availability.

This information would then tell us on which days the service was unavailable.
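Turning the logged stop/restart events into outage periods can be sketched as below. This is a minimal illustration, assuming the events have already been exported as (timestamp, state) pairs sorted by time, where the state strings "stopped" and "started" are hypothetical labels, not the actual Event Log format; the function names are also made up for this example.

```python
from datetime import datetime, timedelta

def downtime_intervals(events):
    """Pair each 'stopped' event with the next 'started' event.

    events: list of (timestamp, state) tuples sorted by time; state is the
    hypothetical string "stopped" or "started".
    Returns a list of (start, end) outage intervals. An outage that is still
    open at the end of the list is not included.
    """
    intervals = []
    down_since = None
    for ts, state in events:
        if state == "stopped" and down_since is None:
            down_since = ts  # outage begins
        elif state == "started" and down_since is not None:
            intervals.append((down_since, ts))  # outage ends
            down_since = None
    return intervals

def total_downtime(intervals):
    """Sum the length of all outage intervals."""
    return sum((end - start for start, end in intervals), timedelta())

events = [
    (datetime(2007, 6, 3, 9, 0), "stopped"),
    (datetime(2007, 6, 3, 11, 30), "started"),
    (datetime(2007, 6, 10, 22, 0), "stopped"),
    (datetime(2007, 6, 10, 22, 45), "started"),
]
print(total_downtime(downtime_intervals(events)))  # 3:15:00
```

Keeping the individual intervals (not just the total) also gives you the per-incident detail you will want for the monthly report.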

Calculation:
For example, in June (30 days), problems caused the e-mail service (POP3, SMTP, etc.) to be unavailable for 5 days.

(# of days service is down / # of days in the month) x 100 = % downtime of the service
= (5 / 30) x 100 = 16.7 % downtime.

(# of days service is running / # of days in the month) x 100 = % uptime of the service
= (25 / 30) x 100 = 83.3 % uptime.
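The same calculation as a small helper, as a sketch only (the function name is illustrative). The units just need to match, so you can pass days as in the June example or minutes for finer granularity; counting whole days is coarse, since a one-hour outage would otherwise register as a full day down.

```python
def availability_pct(time_down, time_in_period):
    """Return (downtime %, uptime %) rounded to one decimal place.

    time_down and time_in_period must use the same unit (days, hours,
    minutes...).
    """
    downtime = time_down / time_in_period * 100
    return round(downtime, 1), round(100 - downtime, 1)

print(availability_pct(5, 30))               # June example: (16.7, 83.3)
print(availability_pct(90, 30 * 24 * 60))    # 90 minutes in a 30-day month: (0.2, 99.8)
```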

What was the agreement?

What was the agreement between the business and the IT department for the required amount of uptime?

For example, the business requires the server to be available for at least 90% of the month, excluding pre-scheduled maintenance: downtime that was planned and agreed in advance does not count against service availability.
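Checking a month against that kind of target could look like the sketch below. It assumes one particular convention, namely that pre-scheduled maintenance is removed from the measured period rather than counted as uptime; the SLA itself should state which convention applies. The 8 hours of planned maintenance in the example is a made-up figure added to the June numbers.

```python
def monthly_availability(period_hours, unplanned_down_hours, scheduled_down_hours):
    """Uptime % with pre-scheduled maintenance excluded from the measured period."""
    agreed_hours = period_hours - scheduled_down_hours
    return (agreed_hours - unplanned_down_hours) / agreed_hours * 100

def meets_target(uptime_pct, target_pct=90.0):
    """True if the month met the agreed availability target."""
    return uptime_pct >= target_pct

# June: 720 hours, 5 days of unplanned outage, 8 hours of planned maintenance.
june = monthly_availability(30 * 24, unplanned_down_hours=5 * 24, scheduled_down_hours=8)
print(round(june, 1), meets_target(june))  # 83.1 False
```

In other words, with 5 days of unplanned outage the June example would breach a 90% target even after planned maintenance is excluded.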

2. The Physical Hardware and Operating System:
Ideally this should be covered by a separate SLA, one that specifies the hardware availability of your servers. Its impact on the e-mail system is great, but it should not be part of the same SLA.

3. Network Availability:
This is another component that should have its own SLA. Its impact on e-mail availability is also great, but it should not affect the SLA for the mail system, as it is covered by the separate SLA for the underlying hardware and network.

Here, too, you could use network management software that analyzes the traffic flowing to and from the mail server and identifies which of it is mail. This gives us not only throughput but also the bandwidth utilization of your network and Internet access.
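A rough sketch of the "identify what is mail" step, assuming your network monitor can export flows as (destination port, byte count) records; that export format is hypothetical. Classifying by the IANA well-known mail ports is a simplification: OWA runs over HTTPS (443) and MAPI over RPC, so those cannot be separated from other traffic by port alone.

```python
# IANA well-known ports for mail protocols.
MAIL_PORTS = {25: "SMTP", 587: "Submission", 110: "POP3", 995: "POP3S",
              143: "IMAP", 993: "IMAPS"}

def mail_share(flows):
    """Split traffic into mail vs. everything else by destination port.

    flows: iterable of (dst_port, byte_count) records (hypothetical format).
    Returns (mail_bytes, total_bytes, bytes_per_mail_protocol).
    """
    per_protocol = {}
    mail_bytes = total_bytes = 0
    for port, nbytes in flows:
        total_bytes += nbytes
        proto = MAIL_PORTS.get(port)
        if proto is not None:
            mail_bytes += nbytes
            per_protocol[proto] = per_protocol.get(proto, 0) + nbytes
    return mail_bytes, total_bytes, per_protocol

mail, total, per_proto = mail_share([(25, 1000), (443, 5000), (993, 500), (25, 200)])
print(f"{mail / total:.1%} of bytes were mail", per_proto)
# 25.4% of bytes were mail {'SMTP': 1200, 'IMAPS': 500}
```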

SPAM:
(measure of throughput and the breakdown thereof)
An excellent utility for this is GFI MailEssentials. Not only does it do an excellent job of blocking spam (you will be surprised at how much of it there is), but it also gives you very accurate reports on utilization and the amount of spam being blocked, as well as identifying the users who receive the most spam and those who send the largest e-mails.
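If you want to reproduce that kind of breakdown yourself, a minimal sketch follows. It assumes each message has been reduced to a (recipient, is_spam) record; that record format is hypothetical and is not GFI's report or log format, so you would have to map whatever your filtering product exports onto it.

```python
from collections import Counter

def spam_breakdown(messages, top=3):
    """Return (spam % of all messages, users receiving the most spam).

    messages: iterable of (recipient, is_spam) records (hypothetical format).
    """
    total = spam = 0
    spam_by_user = Counter()
    for recipient, is_spam in messages:
        total += 1
        if is_spam:
            spam += 1
            spam_by_user[recipient] += 1
    spam_pct = spam * 100 / total if total else 0.0
    return spam_pct, spam_by_user.most_common(top)

log = [("ann", True), ("ann", True), ("bob", False), ("bob", True), ("cat", False)]
print(spam_breakdown(log))  # (60.0, [('ann', 2), ('bob', 1)])
```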

Remember: all of the data collected while monitoring these SLAs can later be used to motivate for new equipment, or for upgrades to existing equipment and communication lines.

Question 2:

To establish the unit of measurement, we first need to establish what we want to measure, and then get a benchmark so that we have something to compare it to. We have already established what we want to measure in the section above; now we need to put a monitoring system in place (manual or automatic).

Conclusion of SLA:

Step 1:
Begin monitoring event logs and events.

Step 2: Start a report detailing the calculations above and display them in a pie chart, then add more information about each incident of downtime: its duration and the fix applied.

Step 3: Record all finalised information and file it for future reference.

Step 4: Structure your SLA around the baselines you have calculated over a period of 3 months. This gives you a fair estimate and prevents you from over-committing and under-delivering.
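Deriving a target from the 3-month baseline could be sketched like this. The rule used here (propose the worst observed month, rounded down to the nearest 0.5 %) is just one conservative rule of thumb for avoiding over-committing; it is an assumption, not a standard, and the monthly figures are illustrative.

```python
import math
from statistics import mean

def baseline(monthly_uptime_pct):
    """Summarise an availability baseline over several months.

    The proposed target is the worst observed month rounded down to the
    nearest 0.5 %, a hypothetical rule that leaves some headroom.
    """
    worst = min(monthly_uptime_pct)
    return {
        "mean": round(mean(monthly_uptime_pct), 2),
        "worst": worst,
        "proposed_target": math.floor(worst * 2) / 2,
    }

print(baseline([99.2, 98.7, 99.5]))
# {'mean': 99.13, 'worst': 98.7, 'proposed_target': 98.5}
```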

Your information is really helpful to me.
I am a consultant working for a huge state company, defining an SLA for its email service. We use Windows Server 2003 and Exchange, we monitor with BMC Patrol, and users access the email service through MAPI (some users) and OMA/OWA (most users).

The business's opinion of the quality of the email service is quite poor, and sometimes this is really not fair! From my point of view, the email service works fine.

I have quite a lot of experience defining SLAs for a Service Desk, but not for an email service. I have thought about the following metrics for defining the SLRs (Service Level Requirements), from which I will later define the SLA:

Business opinion is quite low. My advice to you here is to "Fake It", that's right. Tell your users that you did a major upgrade to the e-mail system and it's running a lot faster now. Try it, it works. That should give you some peace for a few months until the next "Upgrade"...

Your point of view may be closer to the server room than that of the rest of the company; I would consider accessing your e-mail from their desks to get a clearer picture. Occasionally (very rarely) the users do have a valid complaint. If I were you, I would check it out.

Regarding your experience with Service Desk SLAs: I am very interested in the Service Desk SLA you have created. If you don't mind sharing it, could you e-mail me a copy? (nadirkhan@absamail.co.za)

OK, getting back to the grind and looking at your requirements: I personally think you are looking too hard into this. Let me explain what I mean.

Storage (account size, transfer limit, maximum time a deleted message is stored): I would consider this part of a computer system policy, not the SLA. For example, it is company policy to allocate 10 MB of e-mail storage to a data capturer and 100 MB to a manager, much as it is company/system policy to change your password every 30 days, using alphanumeric characters, with a history of 3 passwords and account lockout. Do you see where I am going with this? You have a password policy, not a password SLA. I think the same applies to your storage limits.

Availability (maximum number of failed emails): This would be hard to monitor, as it depends on the type of failure. Firstly, how many e-mails flow in and out of your mail server each month? I know small companies (< 50 users) that handle 3000 valid e-mails daily (excluding the 5000 spam messages blocked by GFI). E-mails can fail for a lot of reasons: timeout failure, incorrect delivery address, unknown user and so on. By defining this in an SLA you take on responsibility for failed messages, whatever the reason. I don't think this is a function of IT. Taking responsibility for providing an e-mail service, yes. Ensuring delivery on a best-effort basis, yes. But I don't think we can guarantee delivery. An NDR (Non-Delivery Report) is the responsibility of the remote site, not yours.

Continuity (major disasters and acceptable times for normal service disruption): OK, the only questions you could ask yourself here are:
1. Do we have a DR (Disaster Recovery) plan for this service?
2. What is the operational turnaround time? (This should also be covered in the company's DR plans.)

Security (encrypted access and transfer of information using HTTPS, encrypted accounts for VIP staff):
Similar reasoning applies: how do you monitor this, and its effectiveness?

Other Services (delay notifications, automatic answer):
These are the finer details of an e-mail policy or e-mail usage policy, not an SLA.

I would look at things that you can already monitor:
- Performance Monitor in Windows.
- Some sort of network monitor.
- A spam/firewall tool such as GFI MailEssentials that can produce reports on mail flow etc. from within the system.

I still think that sometimes we give the business too many options, or rather too much information. There is nothing wrong with that, but it can waste both your time and management's. I mean, if it cannot be used to motivate for an upgrade of a system, then it should not be monitored.

As I type this response I am asking myself a question. If I want a system to be upgraded, someone in management is going to ask why: "Prove to me that this is causing a bottleneck and needs to be upgraded."

So you mentioned storage earlier. OK, say I am running out of disk space. I can run a report on my Exchange server to check mailbox usage every month; however, Exchange already has proactive alerts built in that warn the user of the space shortage.

But if I am truly running out of space, then a performance monitor tracking disk space usage will tell me that over the last six months the HDD usage has tripled, and that we will now have to upgrade the hard drives to accommodate that growth.
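That "tripled in six months" observation can be turned into a forecast for management. The sketch below assumes usage keeps growing at the same compound monthly rate seen between two samples, which is a simplification; the 40 GB, 120 GB and 300 GB figures are made up for illustration.

```python
import math

def months_until_full(used_then, used_now, months_elapsed, capacity):
    """Project how many months until a volume fills up.

    Assumes growth continues at the compound monthly rate observed between
    the two samples (an assumption, not a guarantee).
    """
    if used_now >= capacity:
        return 0.0  # already full
    monthly_factor = (used_now / used_then) ** (1 / months_elapsed)
    if monthly_factor <= 1:
        return math.inf  # usage is flat or shrinking
    return math.log(capacity / used_now) / math.log(monthly_factor)

# Usage tripled from 40 GB to 120 GB in six months on a 300 GB volume.
print(round(months_until_full(40, 120, 6, 300), 1))  # 5.0
```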

I think that beginning to ask these questions inherently answers the question: "What metric do we use to monitor the e-mail service?"

I think part of this solution is on the BMC Patrol website (extracts taken directly from the website for information purposes):

I am a consultant, and at the customer I am working for right now we are using Remedy (ARS) as the Service Desk tool, but I have a lot of experience with Peregrine Service Center 5.1 (I am Peregrine certified) and some knowledge of CA's Unicenter suite.

I have defined SLAs, UCs and OLAs for a telco company; I will email them to you from my corporate email account.

Nice discussion. I am implementing ITIL in an Indian company and thought I might add my views:

I totally agree with Nadir on the SLAs. Response time (provided you have a tool for monitoring it) and availability can be your service level targets. Your SLA should also define the Service Hours and Support Hours. In all likelihood the Service Hours for email will be 24 hours, but for certain support functions you might want to provide a smaller window. It's important to define service hours, as they have a direct impact on availability.
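To see why service hours matter, here is a sketch that counts only the part of an outage falling inside the agreed window. The 08:00-18:00 weekday window is a hypothetical example of a reduced support window; a 24x7 email service would use the full day. The same 3-hour outage counts as 60 minutes or 180 minutes against availability, depending on which window the SLA defines.

```python
from datetime import datetime, timedelta

def service_minutes(start, end, open_h=8, close_h=18, weekdays_only=True):
    """Minutes of the outage [start, end) that fall inside the service hours.

    The 08:00-18:00 weekday window is a hypothetical example; pass
    open_h=0, close_h=24, weekdays_only=False for a 24x7 service.
    """
    total = timedelta()
    day = datetime(start.year, start.month, start.day)  # midnight of first day
    while day < end:
        if not (weekdays_only and day.weekday() >= 5):  # 5, 6 = Sat, Sun
            window_start = max(start, day + timedelta(hours=open_h))
            window_end = min(end, day + timedelta(hours=close_h))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 60

outage = (datetime(2007, 6, 1, 17, 0), datetime(2007, 6, 1, 20, 0))  # a Friday
print(service_minutes(*outage))  # 60.0 (08:00-18:00 weekday window)
print(service_minutes(*outage, open_h=0, close_h=24, weekdays_only=False))  # 180.0
```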

In our organization, we plan to define attributes for each service. These attributes will vary based on the customer segment. A few of the parameters I can think of are: mailbox size, maximum attachment size, VPN access, BlackBerry, etc.
Defining these attributes will later help in developing a costing methodology for IT Financial Management.
Hope this was useful.
Hi javierarcal, I was wondering if you could share the Service Desk SLAs with me too... Here we have identified the call-pickup time (i.e. the number of rings after which the agent answers the call) and the ticket assignment time as the targets.

The SLAs I have defined for the call centre were based on three times:

Call-pickup time (not only the number of rings after which the agent answers the call; keep in mind that an incident can also arrive by email, fax or web)

Ticket assignment time (time between the incident being registered and work beginning on it)

Ticket resolution time (time between the incident ticket being registered and the incident being solved)

Pay attention to what happens when a ticket is "pending vendor" or "pending customer": you should stop the SLA clock during those periods.
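Stopping the SLA clock can be sketched as subtracting the pending periods from the ticket's elapsed time. This is a minimal illustration, assuming the pending periods are available as non-overlapping (start, end) pairs; the function name and the dates are made up. Only the part of each pending span that falls within the ticket's lifetime is subtracted.

```python
from datetime import datetime, timedelta

def sla_elapsed(opened, resolved, pending_periods):
    """Resolution time with the SLA clock stopped while the ticket is pending.

    pending_periods: list of (start, end) spans in a "pending vendor" or
    "pending customer" status, assumed not to overlap each other.
    """
    elapsed = resolved - opened
    for p_start, p_end in pending_periods:
        # Only the part of the pending span inside the ticket's life counts.
        overlap = min(p_end, resolved) - max(p_start, opened)
        if overlap > timedelta():
            elapsed -= overlap
    return elapsed

opened = datetime(2007, 6, 4, 9, 0)
resolved = datetime(2007, 6, 5, 9, 0)
pending = [(datetime(2007, 6, 4, 12, 0), datetime(2007, 6, 4, 18, 0))]  # 6 h pending vendor
print(sla_elapsed(opened, resolved, pending))  # 18:00:00
```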

As you know, under ITIL a ticket can only be closed with the user's acceptance of the resolution, and only by the Service Desk once the ticket is solved.

I have some SLA examples for on-site ("in situ") or remote HW/SW maintenance, etc. I have no problem emailing them to you (give me your email address), but these are only examples. Remember: when defining SLAs, it is very important to agree on all the targets defined and to review them continuously.

Thanks for the info. As for the Service Desk SLAs, I guess resolution times should not be part of its targets, though timely updating of tickets and timely escalation should be. I say resolution should not be a target because the Service Desk has no control over the resolution. But if you are talking about a Call Centre, i.e. a completely outsourced scenario, then it makes sense to have resolution time as a target.

I agree with the material that was written, but would caution on "over defining" and suboptimizing the SLA. The SLA should be written and agreed to by you and the business so that it addresses their concerns for availability, speed, etc.

Anything beyond that is a good number for you as an IT professional to track, but it has no place being communicated outside the IT organization. For example, I don't think the business really needs to know or cares about the bandwidth utilization of your email application; instead, that is translated for them in the SLA into a performance metric of "x seconds" for login (or message retrieval, etc.).
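A login-time metric of that kind could be measured with a sketch like the one below. The `probe` argument is a hypothetical stand-in for whatever scripted login or message retrieval you run from a user's desk; the sample timings are illustrative. Reporting a percentile rather than an average keeps one fast session from hiding several slow ones; the nearest-rank method used here is one common way to compute it.

```python
import math
import time

def measure(probe, runs=5):
    """Time `runs` calls of `probe`, a hypothetical scripted login/retrieval."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        probe()
        samples.append(time.perf_counter() - start)
    return samples

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

login_times = [0.2, 0.5, 0.3, 1.4, 0.4]  # seconds, illustrative values
print(percentile(login_times, 80), percentile(login_times, 95))  # 0.5 1.4
```

An SLA line such as "95% of logins complete within x seconds" can then be checked by comparing `percentile(samples, 95)` against x.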

One final comment: I disagree with the "fake it" notion. Business users today are much more savvy than ever before and will discover your untruth. When they find out you were less than honest, the relationship may be ruined. Be honest, tell them where you are, what you are doing, etc., and it will go a long way toward building a trusting relationship.

I just found this site today. So far, so good. I am also working on our email system's availability and SLA. Would any of you be willing to share your email SLA? My email is davis751@hotmail.com.