Service Level Management: The Big Picture
by Rick Sturm
To most people in the industry, the term Service Level Management (SLM)
refers to a very narrow, finite space: Service Level Agreements, metrics, data
capture, reporting, and so on. The products that IT departments and
service providers purchase for SLM are predominantly in the area of
performance management and reporting, plus tools to measure
availability. While those are not the only tools, they are the
dominant ones, and they certainly occupy the thoughts of most
service providers. That reflects their thinking about SLM.
However, SLM is much broader than that. It encompasses everything involved in
delivering a service at acceptable levels. When the subject is
addressed in this way, most of us will be quick to agree. Yet even then,
most of us tend to narrow our thinking, concentrating primarily on
the issue of performance management.
Infrastructure Behind the Scenes
It is true
that performance management is an important part of SLM. It is the
part that ensures that expected levels of service are provided, or even
exceeded. Performance management is the piece of the puzzle that provides
for trouble-shooting performance problems and also for continuous
improvement of service levels.
Similarly,
fault management is a key component. Through fault management,
problems are detected and addressed. In some cases, those problems
may impact availability, while others will degrade performance. Either way,
both affect the level of service being provided. Unfortunately, these
components are also reactive in nature. (Yes, I do realize that some
products monitor performance in real time and make extrapolations, issuing
warnings when there is a chance of violating a service level guarantee.
However, I will argue that even that function has a sense of being
reactive.)
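As a rough illustration of the kind of extrapolation mentioned above, the following sketch fits a straight-line trend to recent response-time measurements and warns when the projected value would cross a service level threshold. It is only a minimal sketch under stated assumptions: the function names, the threshold, and the sample values are hypothetical, and real products may use quite different techniques. Note that it still works from data already measured, which is why even this function has a reactive flavor.

```python
from statistics import mean


def projected_value(samples, steps_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate.

    samples: recent measurements, oldest first (hypothetical units, e.g. ms).
    steps_ahead: how many sampling intervals past the last sample to project.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


def at_risk(samples, threshold, steps_ahead=5):
    """Return True if the projected measurement exceeds the agreed threshold."""
    return projected_value(samples, steps_ahead) > threshold


# Hypothetical response times (ms) trending upward toward a 150 ms guarantee.
rising = [100, 110, 120, 130]
if at_risk(rising, threshold=150):
    print("warning: response time is projected to violate the service level")
```

A straight-line fit is the simplest possible choice here; it was picked for readability, not because it is the best predictor.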
Fault and performance tend to grab the headlines in our minds. They represent the
issues that we are concerned with on a daily basis. "What else is there?"
you may ask. The answer is simply "everything," in terms of
infrastructure.
If you are
going to provide any kind of service, you must first "provision" it. That
is, you create the system of hardware and software that is necessary for
you to be able to provide the service. If the equipment and software
selected are not appropriate for the service, there is no hope of being
able to deliver adequate levels of service. There must be facilities
to house the equipment. The service must be configured to support the
customer or internal user. If the configuration changes are not made, then
the user is not going to be served.
Don't Wait for a Disaster
However, the
list of areas frequently neglected does not stop there. In the wake of the
tragedy of September 11, it is clear that backup and recovery are more
important than ever before. Thorough and tested
disaster recovery plans, including an alternate site available in the
event of a disaster, have long been viewed by executives as expensive
luxuries. They would look at the probability of an actual disaster
occurring and contrast that with the cost. Far too often, they would
decide that the danger represented an acceptable degree of risk.
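The comparison described above can be sketched as a simple expected-loss calculation. This is only an illustration under stated assumptions: every probability and dollar figure below is hypothetical, and the function names are invented for the example. It shows how looking only at the headline disaster can make the risk seem acceptable, while adding a more frequent, smaller disruption changes the answer.

```python
def expected_annual_loss(scenarios):
    """Sum probability-weighted losses.

    scenarios: list of (annual_probability, loss_if_it_happens) pairs.
    """
    return sum(probability * loss for probability, loss in scenarios)


def plan_is_justified(scenarios, annual_plan_cost):
    """Return True if the expected loss exceeds the yearly cost of the plan."""
    return expected_annual_loss(scenarios) > annual_plan_cost


# Hypothetical figures, for illustration only.
major_disaster = (0.01, 5_000_000)   # 1% chance per year, $5M loss
smaller_outage = (0.5, 40_000)       # 50% chance per year, $40K loss
annual_plan_cost = 60_000            # yearly cost of the recovery plan

print(plan_is_justified([major_disaster], annual_plan_cost))
print(plan_is_justified([major_disaster, smaller_outage], annual_plan_cost))
```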
Even without
the complete obliteration of a building, every day, thousands of smaller
"disasters" occur. Some merely require restoring a corrupted database. In
other cases, there may be actual damage to facilities or equipment. The
notorious fiber-seeking backhoe may have just severed the only
connectivity between your facility and your users. What are you going to
do? Someone on your staff with a sick sense of humor may decide to create
an anthrax scare by sending a "contaminated" letter to a co-worker. The
police are called and your building is evacuated for two days of testing.
I once worked in a high rise building in the downtown area of a major
city. That building had to be evacuated when it was discovered that a
small natural gas leak from a pipe under the street had filled a
sub-basement with an explosive level of natural gas. Tornadoes, floods,
earthquakes, power blackouts … the list of possible causes of disruption of
service is seemingly endless. If you aren't well prepared to deal with
the consequences of these disruptions, you are not prepared to deliver the
service.
Security in an Insecure World
Another
neglected area of SLM is security. The most likely single cause of a
disruption or degradation in service is the well-intentioned but inept
employee. You say that you have only the finest, error-free employees?
Fine. What about the employees who are not well intentioned? Those people
who, in the face of layoffs or out of pure malevolence, deliberately set
out to damage your facility or disrupt the service? It happens every day
and usually goes unreported. Then there are the hordes of "crackers" who try
to bring your service to its knees. It won't happen to you? Don't bet on
it! The choice of targets often seems random. Being small in size or
profile does not provide assurance of safety. Denial of Service
attacks happen all too frequently and, like other security issues, usually
go unreported. Then, of course, there are the viruses. These are
becoming increasingly sophisticated and destructive. Not only can viruses
disrupt your service; they can also destroy your firm's relationship with
users by causing sensitive information to be released.
As with backup and recovery, if you aren't deadly serious about security, then you
are not prepared to deliver the service you are guaranteeing.
If you're a
user contemplating a contract with a service provider (even an in-house
service provider, such as IT), in the course of your due diligence, you
must assess how adequate their security measures and disaster
recovery plans are. Of course, if you are like most users, you won't
bother with due diligence and will instead rely on the representations
made by the sales rep for the service provider. (If this is the case, I
want to be your service provider, and while we're at it, I have some
swampland in Florida that I'd like to sell you. I am assured that,
although it is a swamp, it is totally free of snakes, mosquitoes and
alligators.)
"And the Winner is …"
In my last
column, I asked readers to send in their stories of SLM in non-IT
environments, with the promise that one of them would receive an
autographed copy of "Foundations of Service Level Management." The winner
is Anthony Dowdall. He is a Contract Manager for Interleasing in
Birmingham, England. He has implemented an SLM process for the interface
between insurance companies and people making repairs to insured
equipment. He did this by creating a set of manual processes to capture
the data and ultimately to log it into a database.
Rick Sturm is the founder and president of Enterprise
Management Associates, the first technology analyst firm to specialize
exclusively in management software and services. His 25-year career in
Information Technology includes membership on the Internet Engineering
Task Force (IETF) and service as co-chair of the Applications MIB Working Group
that developed the standards for managing application software with SNMP.
Rick has authored numerous articles in leading trade publications and has
also co-authored three books, The Foundations of Application Management,
Foundations of Service Level Management, and Working with Unicenter TNG.
As a recognized and widely quoted authority on
management software and services, Rick has regularly spoken at leading
industry events in the USA and Europe.