What is a service level agreement?


POSTED: Monday, April 06, 2009

As more and more folks turn to vendors to provide software as a service over the web, the question gets asked, “Are we getting a good deal?”

A key to answering this question lies in a contractual document known as a service level agreement (SLA).

An SLA defines the services the vendor will provide and the levels of availability, response, and maintenance that apply to those services. The biggest concern in an SLA is typically availability, or the percentage of time in which a system must be accessible and usable.

General-purpose Web sites such as search engines, email, and news typically must be available on a 24x7 basis.

The holy grail of availability is referred to as the “five nines.”

That is, the system must be operable 99.999 percent of the time. This means that, in a year, the system can be down for only about 5 minutes and 15 seconds, for any reason, planned or unplanned.
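The arithmetic behind that figure is easy to check. The short Python sketch below converts an availability percentage into a yearly downtime allowance. The function names are illustrative only, and the calculation assumes a 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_percent, period_days=365):
    """Return the downtime allowance, in seconds, for an availability target.

    Assumes a 365-day year by default; a real SLA may define the
    measurement period differently (for example, per month).
    """
    period_seconds = period_days * SECONDS_PER_DAY
    return period_seconds * (1 - availability_percent / 100)


def format_duration(seconds):
    """Render a number of seconds as hours, minutes, and whole seconds."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# Compare the allowance for two nines through five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    allowance = format_duration(allowed_downtime_seconds(target))
    print(f"{target}% -> {allowance} per year")
```

For 99.999 percent this works out to about 5 minutes and 15 seconds a year. At 99 percent, the allowance grows to more than three and a half days.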

Typically, even in this day and age, businesses do not require five nines status for their operational software.

Many SLAs carve out specific periods of time for system maintenance, and then make an uptime commitment outside of those periods. This commitment typically ranges from five nines down to two nines (99 percent).
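As a rough illustration of how such a carve-out changes the math, the Python sketch below measures availability only against the time outside the agreed maintenance periods. This is one plausible reading, not a standard formula. The function name and the sample numbers are assumptions, and an actual contract would spell out its own definition.

```python
def measured_availability(period_hours, maintenance_hours, unplanned_downtime_hours):
    """Return availability (percent) outside scheduled maintenance periods.

    Assumption: scheduled maintenance is excluded from the measurement
    period entirely, and only downtime outside those periods counts
    against the vendor.
    """
    in_scope_hours = period_hours - maintenance_hours
    if in_scope_hours <= 0:
        raise ValueError("maintenance periods cover the entire period")
    uptime_hours = in_scope_hours - unplanned_downtime_hours
    return 100 * uptime_hours / in_scope_hours


# Hypothetical 30-day month (720 hours) with 8 hours of scheduled
# maintenance and 2 hours of outages outside those periods.
print(f"{measured_availability(720, 8, 2):.3f}%")
```

In this made-up month, the vendor would meet a two nines commitment but miss a three nines one, even though the 8 hours of scheduled maintenance do not count against it.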

Of course, the higher the uptime commitment, the higher the cost, so organizations need to balance the two.

Furthermore, not all downtime is created equal. We're starting to see SLAs that try to differentiate between the types of system outages, the times they occur, and the number of users affected.

Instead of measuring downtime in fixed percentages, these SLAs try to gauge overall business impact.

For example, it isn't hard to see cases where outages at 3 a.m. on Saturday are far less impactful than outages at 10 a.m. on Wednesday.
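One way to picture an impact-based measure is to weight each outage by when it happened and how many users it touched. The Python sketch below does this with entirely hypothetical weights (full weight on weekdays from 8 a.m. to 6 p.m., one-tenth weight otherwise). The function names and the scoring formula are assumptions for illustration, not terms drawn from any particular SLA.

```python
from datetime import datetime


def time_weight(start):
    """Return a hypothetical weight for the time an outage begins.

    Assumption: weekday business hours (8 a.m. to 6 p.m.) count in
    full, and all other times count one-tenth as much.
    """
    is_weekday = start.weekday() < 5  # Monday is 0, Sunday is 6
    is_business_hours = 8 <= start.hour < 18
    return 1.0 if (is_weekday and is_business_hours) else 0.1


def impact_score(start, duration_minutes, users_affected, total_users):
    """Score an outage by duration, share of users affected, and timing."""
    share_of_users = users_affected / total_users
    return duration_minutes * share_of_users * time_weight(start)


# Two one-hour outages that affect every user.
saturday_3am = impact_score(datetime(2009, 4, 4, 3, 0), 60, 500, 500)
wednesday_10am = impact_score(datetime(2009, 4, 8, 10, 0), 60, 500, 500)
print(saturday_3am, wednesday_10am)
```

Under these assumed weights, the Wednesday morning outage scores ten times higher than the Saturday one, even though both lasted an hour.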

Failing to meet the commitments of the SLA usually results in penalties of some kind.

These may be monetary, especially in commercial environments. In government organizations, SLAs usually don't involve monetary penalties and can be harder to enforce.

Industry estimates show that about 20 percent of unplanned system downtime is caused by technology failure. In addition to normal wear and tear, equipment fails due to environmental issues such as electrical failure or not enough cooling.

However, approximately 80 percent of unplanned system downtime is attributable to “people problems.”

This includes software problems such as bugs and performance issues.

So-called “operator error” also falls into this category. For example, the system operator forgets to run a maintenance job.

People problems are best addressed by documenting processes and ensuring that responsible staff members are adequately trained.

Eliminating human intervention wherever possible is also helpful.

If you are evaluating an SLA, it may be handy to ask your vendor about these issues.


John Agsalud is Pacific region director of professional services for Decision Research Corp.