Resolving Major ITIL Incidents
Many certified IT professionals have experience handling problems, the day-to-day work of repairing the underlying causes of common incidents. Most also have experience with everyday incidents such as a printer malfunction. But when does an incident become a major incident, and what processes need to be in place to minimize its impact?
An ITIL® major incident is generally defined by its impact, urgency, and priority. Sometimes an incident like a printer failure can have a high impact but is easily addressed at the help desk level without calling in additional help. A major incident is more complicated: it involves a “no workaround” situation with major negative consequences, often disrupting service to several internal or external customers at once. It requires a coordinated effort to diagnose, address, and resolve the issue at hand.
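The distinction above can be sketched in code. The following is a minimal illustration, not an official ITIL definition: the three-level scale, the additive impact-plus-urgency priority mapping, the `Incident` fields, and the `customer_threshold` parameter are all assumptions chosen for the example. Each organization would set its own criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level scale; 1 is the most severe."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Incident:
    summary: str
    impact: Level             # how much of the business is affected
    urgency: Level            # how quickly a resolution is needed
    has_workaround: bool      # can users keep working some other way?
    affected_customers: int   # internal or external customers disrupted


def priority(incident: Incident) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5.

    This additive mapping is an assumed convention for illustration only.
    """
    return int(incident.impact) + int(incident.urgency) - 1


def is_major(incident: Incident, customer_threshold: int = 2) -> bool:
    """Flag an incident as major under the assumed criteria:
    top priority, no workaround, and several customers disrupted."""
    return (
        priority(incident) == 1
        and not incident.has_workaround
        and incident.affected_customers >= customer_threshold
    )


printer = Incident("Printer failure", Level.HIGH, Level.MEDIUM, True, 1)
outage = Incident("Order system down", Level.HIGH, Level.HIGH, False, 40)

print(is_major(printer))  # False: handled at the help desk
print(is_major(outage))   # True: needs the major incident team
```

In this sketch the printer failure keeps its high impact but is not escalated, because a workaround exists and its combined priority is not the highest.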
To best address a major incident, it is imperative to have a skilled team in place beforehand, with members drawn from across all departments. A certified major incident manager will be able to work with ITIL professionals and the other members of the major incident response team to address the situation. The team will review the data from the help desk and work through the incident’s entire life cycle with the goal of restoring IT services as quickly as possible. Although major incidents cannot always be avoided, planning for them by investing in certified IT professionals who know the protocols of incident management is vital for minimizing the impact of an outage.
Although the process of addressing a major incident will vary from situation to situation, certified ITIL incident management professionals will generally address the challenge sequentially, following standard industry-wide protocol. First, the team will look at the primary-level support solutions that were attempted and examine any workarounds that worked only on a limited basis or did not work at all. After the initial review is complete, the team attempts to identify the root cause by systematically reviewing the entire scope of the incident. At the same time, the response team works with other IT colleagues to maintain communication with affected constituents and to restore services to the extent possible while the incident is occurring.
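As a rough sketch of the sequence described above, the example below tracks the ordered review steps separately from the activities that run alongside them. The step names, the `MajorIncidentLog` class, and the incident ID format are hypothetical, introduced only for illustration; they are not taken from any ITIL publication or tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed step names: these must be completed in order.
SEQUENTIAL_STEPS = (
    "review_first_line_support",
    "review_workarounds",
    "identify_root_cause",
)

# Assumed activity names: these may be recorded at any time.
PARALLEL_ACTIVITIES = (
    "communicate_with_affected_users",
    "restore_partial_service",
)


@dataclass
class MajorIncidentLog:
    incident_id: str
    completed: list = field(default_factory=list)
    timeline: list = field(default_factory=list)

    def next_step(self):
        """Return the first sequential step not yet completed, or None."""
        for step in SEQUENTIAL_STEPS:
            if step not in self.completed:
                return step
        return None

    def complete(self, step, note):
        """Mark a sequential step as done; reject out-of-order steps."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)
        self._record(step, note)

    def record_activity(self, activity, note):
        """Log a parallel activity such as a status update to users."""
        if activity not in PARALLEL_ACTIVITIES:
            raise ValueError(f"unknown activity {activity!r}")
        self._record(activity, note)

    def _record(self, name, note):
        self.timeline.append((datetime.now(timezone.utc), name, note))


log = MajorIncidentLog("INC-0001")
log.record_activity("communicate_with_affected_users", "Initial notice sent")
log.complete("review_first_line_support", "Help desk restart attempts failed")
log.complete("review_workarounds", "Manual order entry works for one site only")
print(log.next_step())  # identify_root_cause
```

Keeping the timeline in one place also gives the team the details it needs for the post-incident review.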
While the major incident is ongoing, the process and the incoming data are coordinated and monitored as the underlying situation is addressed. Other IT team members are brought on board to address specific hardware and software failures. The aim is to use all available resources to resolve the incident as quickly as possible and to provide sufficient detail to other IT professionals on the team who may be able to prevent a similar outage in the future. IT service managers and other ITIL professionals who work on the major incident management team are versatile, creative problem-solvers with comprehensive skills and a wide knowledge base. As such, investing in the assembly of a team ahead of any major incident pays off in the quality of the solution and the time needed to reach a resolution.
Major ITIL incidents are not completely avoidable, but being caught off guard without sufficient knowledge on your team can have devastating consequences. Assembling a strong major incident response team ahead of time will minimize downtime, protect client confidence, and conserve resources that can be better spent on expanding and marketing products and services.