How to operate Incident Management
1. Detecting an incident
2. How to report an incident
3. Initial incident investigation
4. Request technical support
5. Incident diagnosis
6. Incident resolution
7. Incident closure
1. Detecting an incident
Incidents are usually detected because an error occurs or the system does not respond as expected. Some of the simplest resolutions find that user expectation is to blame. For example, the system is expected to run software that has not been installed, the user has not followed instructions, or the instructions on how to operate the system or software are not clear enough to be understood.
It is important to discover whether or not an error really exists. This is where good detection can prevent an unnecessary callout to a technician.
How to detect whether an error has occurred
The basic steps to detecting if an error has occurred can be performed by the User.
1. Repeat the process that produced the error and see if this occurs again. For example if you were trying to print, click on the print document icon. This time, take note of what did or did not happen.
2. Try another way to achieve the desired result - most software will have functions operated by clicking on an icon, selecting from a menu bar at the top of the screen or using a combination of keystrokes.
3. Look at the screen to see if any error messages are displayed. Windows can often hide the error windows if you have several applications open, so check the bar at the bottom of the screen. Are any of the symbols flashing? If so, click on the symbol to see if an error window has been opened.
4. Write down any error messages on the incident sheet. Also write down which applications were open at the same time. Most importantly write down when - to your knowledge - this function last worked on this system.
5. If there are no error messages but you did resolve the incident, it may be worth letting your service desk know if the instructions for using the system or software are difficult to understand, don't exist, or need amendment. You could be saving someone else's time with these observations. Use the incident sheet to let the Service Desk know.
The next step from detection is the initial incident investigation, usually performed by the Service Desk on receipt of the completed incident sheet.

2. How to report an incident
If an incident is detected it must be reported. Do not worry if someone else has reported the incident; the service desk will be able to identify duplicate reports.
The impact of not reporting an incident can be great and lead to a lot of disruption if not resolved. All staff have a responsibility to ensure that others have a good working system and errors cannot be left unresolved. You wouldn't walk past a fire and hope that someone else raises the alarm; the same approach should follow for computer system incidents and problems.
The incident reporting process should be simple enough to follow and should not impact on the time or other resources of staff that first detect the error.

Service Desk Forms
See Incident Management downloads for the following forms:
  • Incident/request sheet
  • User guide to completing the incident/request sheet
  • Service Desk guide to completing the incident/request sheet
  • Call log
  • Service Desk guide to completing the call log

Benefits and disadvantages of using forms to report incidents

 
No technician in school
Benefits
  • When a fault occurs, there is not a technician to tell. Therefore putting the details of the fault into writing removes the ambiguity or brevity that can occur during a verbal explanation. For example, telling a technician 'it doesn't work' is not a helpful explanation.
  • Clear details of a fault written down can provide time for the technician to prioritise workload and understand the work involved in solving the fault. This can reduce time and improve efficiency. For example, the resolution may involve a quick phone call to the school instead of a scheduled visit.
Disadvantages
  • All faults would need detailed recording to ensure the best possible chance of getting a solution, especially when you do not know who will be dealing with the call.
  • Staff would not be able to get a quick solution, as a technician is not on site. Quick solutions may involve the technician knowing the solution or finding out the solution from a knowledge base - either available in the school or through other sources such as the Internet.

Technician in school
Benefits
  • When a fault occurs, it is easy for staff to just tell the technician the fault details. The staff feel that they have passed on the responsibility for the fault and can expect the technician to deal with the incident.
  • Technicians are immediately available to resolve high priority problems and reduce the impact of an incident.
Disadvantages
  • Ambiguity or brevity can occur during a verbal explanation. For example, telling a technician 'it doesn't work' is not a helpful explanation.
  • Verbal contact with a technician in a corridor will not ensure that the work is recorded or prioritised.
  • The technician can feel 'put upon' and this may discourage their efforts to try to prioritise their workload. This will reduce the benefits of their service.

3. Initial incident investigation
The incident log will be checked using key words from the incident sheet. For example, you can search for an error message code in the call log using the find button in the spreadsheet.
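
As a rough illustration of this keyword check, the sketch below searches a call log that has been exported from the spreadsheet as CSV. The column names, the sample calls and the error code 'E52' are assumptions made up for the example, not the layout of the actual call log.

```python
import csv
import io

# Hypothetical call log exported from the spreadsheet as CSV.
# Column names and contents are assumptions for illustration only.
CALL_LOG_CSV = """call_id,date_logged,summary,resolution
1,2004-03-01,Printer error E52 in library,Cleared paper jam and reset printer
2,2004-03-04,Cannot log on to network,Password reset
3,2004-03-09,Printer error E52 in room 12,Replaced toner cartridge
"""


def search_call_log(rows, keyword):
    """Return every call whose summary or resolution contains the keyword."""
    needle = keyword.lower()
    return [
        row for row in rows
        if needle in row["summary"].lower() or needle in row["resolution"].lower()
    ]


calls = list(csv.DictReader(io.StringIO(CALL_LOG_CSV)))
matches = search_call_log(calls, "E52")
for call in matches:
    print(call["call_id"], call["date_logged"], call["resolution"])
```

The search ignores upper and lower case, so a code copied from the incident sheet as 'e52' would find the same calls.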

With experience, the service desk will know if the resolution to the problem can be found in the school's knowledge base or if a technician is to be contacted. The school's knowledge base will be checked for a resolution.

If the knowledge base provides a solution, then this should be tried before contacting a technician. This is where the system agreed by the school will be followed.
The options are these:
1. Someone in the school tries the resolution and a technician is not called.
2. The technician is contacted and given the resolution found in the knowledge base.
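
The decision described above could be sketched as a small function. This is only an illustration: the function name, its parameters and the wording of the returned actions are assumptions, and each school would follow its own agreed system.

```python
def next_step(knowledge_base_resolution, school_tries_first):
    """Decide what the Service Desk does after checking the knowledge base.

    knowledge_base_resolution: text of the resolution found, or None if
        nothing was found.
    school_tries_first: True if the school's agreed system is option 1
        (someone in the school tries the resolution).
    """
    if knowledge_base_resolution is None:
        # Nothing found: the technician is contacted with the incident details.
        return "contact technician with incident sheet details"
    if school_tries_first:
        return "school staff try: " + knowledge_base_resolution
    return "contact technician and pass on: " + knowledge_base_resolution


print(next_step("Reset the printer", school_tries_first=True))
print(next_step(None, school_tries_first=True))
```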

4. Request technical support
If a resolution has not been found, the technician will be contacted and provided with details from the incident sheet. Again the system agreed by the school will be followed.
The options are as follows:

1. Email, post or fax the sheet to the technician
2. Telephone the technician and discuss the incident and action taken so far.
3. Leave the incident sheet for collection by the technician at their next scheduled visit.

What to do if upgrades are required
This is where Incident Management links into Release Management and Configuration Management.

Upgrades must be planned - even if they apply to only one system. If the upgrade is not tested for compatibility with the other software on the system, further errors could occur.

5. Incident diagnosis
The user could perform the first stage of the investigation, if they have the diagnostics tools available and the confidence to proceed. You can implement user self-service tools to assist with this. Otherwise the first stage is performed by the school's single point of contact (SPOC) at the school's Service Desk.

In the early days of a Service Desk, there will be little previous information to check, so all calls will be passed to the person providing technical support.

After approximately three weeks of using a Service Desk, there should be enough information to enable the SPOC to check previous calls and see if there is any similarity.

These checks should be done in the following way:

1. Check the summary of initial action taken and see if a similar incident occurred previously. If a previous incident exists, find the incident sheet which should be filed in date order, then check the details of the incident. If the summary of the incident is the same, where possible try to see if the resolution can be implemented by a member of staff, before a technician is called.

2. Check any diagnostics sheets or diagnostic information supplied to the school to see if there is any help available.

3. Report the incident to the technician.

In all cases, the action taken should be recorded on the incident sheet by ticking or circling the appropriate box.


User self-service tools
What are user self-service tools?
User self-service is a strategy that enables users to obtain support services without direct intervention from a technician.
The most important thing to identify is who will use the tool and what the tool is to be used for.

The tools can be
  • written lists of things to check
  • flow diagrams with easy to follow instructions
  • on a CD or on the school network, created by the school technical staff or provider
  • online through the Internet, or downloaded to the school intranet (if it has one)
  • diagnostics supplied by the hardware manufacturer or software manufacturer
  • telephone support.

How can user self-service tools be used?
How user self-service is implemented can vary significantly, depending upon what the school wants to achieve and the range of services being offered:
  • Users register their own requests and check on their progress.
  • Users then have direct access to support information and knowledge.
  • Users are able to manage support transactions themselves.
  • Users can search knowledge bases for solutions.
  • Users can download program updates or bug fixes.
  • Users can order goods or services.
  • Ease of access and speed of resolution is increased.
  • Demand on support resources is reduced.

A strategy for deploying user self-service tools
A successful user self-service strategy depends on several important factors:

School Leadership Commitment
  • Any initiative that entails change within a school requires leadership support and commitment to execute the initiative.
  • See the Change Management process on how to introduce any changes within ICT to your school.
  • It is essential to put the right processes and tools in place to ensure that while the user is in control, they are following a path that is carefully designed by the school or provider.
  • Users need to know what user self-service channels are in place, along with the value and responsibilities of using them.
  • If the decision has been taken to supplement technical support with a self-help tool, users must understand that if the system is unavailable, they should wait and try again and not pick up the phone.
  • Email contact should be used, together with online communities to share the information obtained, where possible.

Support processes are maintained
  • It is important that none of the existing Change Management and Release Management processes are bypassed or invalidated.
  • Maintain the process of completing the incident form, even if the self-service tools enable the incident to be resolved, as time and effort were still spent on the incident.
  • The effectiveness of the service is monitored by measuring what self-help services are being requested, how often and what for.
  • Feedback will be required on how effective the ideas were in resolving the incident, how well they were presented and whether the incident recurred.
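
One simple way to monitor the service is to count how often each self-help tool is used and how often the incident recurs afterwards. The sketch below assumes a hypothetical list of usage records; the tool names and the record layout are invented for the example.

```python
from collections import Counter

# Hypothetical usage records:
# (self-help tool used, what it was used for, did the incident recur?)
usage = [
    ("printer checklist", "printing", False),
    ("printer checklist", "printing", True),
    ("password flow diagram", "log on", False),
    ("printer checklist", "printing", False),
]

# How often each tool was requested.
requests_per_tool = Counter(tool for tool, _, _ in usage)
# How often the incident came back after the tool was used.
recurrences_per_tool = Counter(tool for tool, _, recurred in usage if recurred)

for tool, count in requests_per_tool.most_common():
    print(f"{tool}: used {count} times, {recurrences_per_tool[tool]} recurrence(s)")
```

A tool that is used often but is followed by many recurrences may need its content reviewed.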

Content of the self-service system
  • Any system that is not easy to use or that does not contain high-quality content will fail.
  • If the users are unable to get the information they need when they need it, they will immediately pick up the telephone next time they encounter a problem.
  • In a worst-case scenario, the support team will find itself supporting yet another application - the self-service system itself.

To buy or create a user self-service system
Buy into a provided user self-service system
  • Does the system provide benefit to your school?
  • Is it more cost effective to use a user self-service system by reducing staff costs?
  • Can you be sure the advice is current and accurate?
  • Can you carry out the advice given by the user self-service method?

Creating your own user self-service system
  • Do you have the resource, both now and in the future, to plan, implement, upgrade and maintain your own user self-service system?
  • Who will support your own user self-service system?
  • How long will it take to develop?
  • Who is going to pay for it?
  • When will it be ready?
  • What if your 'experts' leave?

Known errors
A known error is a problem that has previously been successfully diagnosed and for which a workaround has been identified.  For example, where the cause of the incident is an existing problem with the version of the software, a workaround is a software 'patch' that can be installed. The problem will only be fixed with the next release of the software by the manufacturer.

A known error can also be referred to as the root cause of a problem (or incident).

Information about known errors can be supplied by manufacturers of hardware and software.

What to do when a manufacturer notifies of an error condition

Manufacturer notification of error conditions.
  • These will be cascaded through manufacturer websites, suppliers, computer magazines, blanket emails and word of mouth.
  • It may also be the outcome of a reported incident - to find out it is a known error. Usually known errors of this type have a fix or workaround that can be applied. However, it is frustrating to discover, after reinstalling software, that the error would occur anyway and has been acknowledged by the manufacturer.
  • It is useful to have the ability to use the internet and know how search engines work. Putting the details of any error message or details of the error into a search field may produce several thousand results. Filtering the results will help in finding the more useful advice on what to do next.
  • With experience and confidence the Service Desk may be able to use this approach to find known errors before a technician is contacted.

Workaround
  • A workaround is a method of avoiding an incident or problem, either from a temporary fix or from a technique that means the user is not reliant on a particular aspect of a service that is known to have a problem.
  • Workarounds are an acceptable way to resolve an incident because they achieve the aim of incident management - to get the user working again. The Service Desk or technician must then acknowledge that the underlying problem still needs fixing, but the time to fix is not impacting on the user.
  • This leads from a reactive situation into a proactive situation.

Workaround examples
1. One of the safest workarounds is to use a different computer or printer where possible until the incident or underlying problem is resolved.
2. If lots of windows and applications are open on the computer, close them down and use the software producing an error on its own to see if the incident recurs. It may be that too many windows were open and the computer doesn't have enough memory to run lots of applications at once. The workaround is to use one or two applications at a time until more memory can be added to the computer.
3. If the error affects printing, try copying the file to be printed onto a floppy disk or saving it to the file server and printing from another printer. Try to use a printer that has more memory, as printing problems often occur when a complicated file is sent to a printer without much memory. The errors often exhibited do not suggest a memory problem, but by successfully printing to a larger-memory printer, you can prove that this was the cause of the incident.
4. Have a list of memory-hungry applications available to the users, to help them decide which applications to shut down first if the computer appears slow or unresponsive.
5. Make sure everyone knows the rules for password resets or email resets. One school has decided to reset forgotten passwords to 'diarrhoea', to encourage users to remember their original passwords more often!

Spare equipment
Think about the following to see if this could be an effective workaround in your school.

1. Ensure that there are four spare computers. Really spare!
2. Configure two of the computers with the school's computer image (standard build, see Release Management for further information) ready to swap out when required.
3. When an incident occurs that cannot be resolved in 15 minutes, replace the computer with one of the spares.
4. Bring the faulty computer back to the technician's area.

Do not try to fix the problem.

5. Re-image the faulty computer with the school's computer image.
6. Run a set of pre-approved tests on the re-imaged computer.
7. Make the re-imaged computer one of your two spares.

With the other two computers, configure one with any new software and create the new image. Then test the new image on the second computer.

These four computers are not equipment for use in the school as ordinary equipment. If you need to allocate one as an additional resource, ensure that a replacement is ordered and put back into the group of four.
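
The swap-and-re-image routine for the two ready spares can be sketched as follows. This is a minimal illustration only: the class, the function and the computer names are hypothetical, and the other two computers (used to build and test new images) are left out.

```python
SWAP_THRESHOLD_MINUTES = 15  # the 15-minute rule from step 3


def should_swap(minutes_spent):
    """True once 15 minutes have passed without a resolution."""
    return minutes_spent >= SWAP_THRESHOLD_MINUTES


class SparePool:
    """Tracks the two ready-imaged spares (hypothetical helper)."""

    def __init__(self, ready):
        self.ready = list(ready)        # imaged and tested, ready to swap in
        self.awaiting_reimage = []      # faulty computers brought back

    def swap_out(self, faulty):
        """Replace a faulty computer with a ready spare and return the spare."""
        if not self.ready:
            raise RuntimeError("no ready spare available")
        spare = self.ready.pop(0)
        self.awaiting_reimage.append(faulty)
        return spare

    def reimage_and_test(self, computer, tests_passed):
        """Re-image (never repair) the computer; it becomes a spare if the tests pass."""
        if not tests_passed:
            return False                # stays in the awaiting list
        self.awaiting_reimage.remove(computer)
        self.ready.append(computer)
        return True


pool = SparePool(["SPARE-1", "SPARE-2"])
if should_swap(minutes_spent=20):
    replacement = pool.swap_out("ROOM12-PC4")
pool.reimage_and_test("ROOM12-PC4", tests_passed=True)
print(replacement, pool.ready)
```

Note that the re-imaged computer rejoins the spares, so the pool returns to two ready computers after each swap.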

Compared with the cost of lost teaching time, hours of technician time each week and unresolved incidents and problems, these four computers should pay for themselves over and over again. But you must stick to the rules and re-image. Do not try to resolve the incident or problem. This may be boring - but it can be very effective.

How to diagnose incidents
It is possible to own a library of books about IT and ICT and never come across the concept of incidents or errors, except in software programming! It is not an easy subject to describe or on which to produce a guide to the best approach.

Once an incident is reported and passed to a technician to resolve, you then move into the real art of technical support. Sometimes this can appear to the user or customer as a 'black art', an area where only the person with the technical knowledge has a real grasp of the subject and no one else can expect to achieve the same result. However, this is not true. Users can do most of the diagnostics themselves and provide a pointer towards where the root cause of the incident really lies. This can speed up resolution and increase the productivity of the technician.

Here is one approach to incident diagnosis that you may like to try.

There are several steps but one of the most important steps to take is the pause. The pause is the step where the decision is made about which action to take first. If action is taken before the diagnostics, it often becomes more difficult to resolve the incident.


Steps in incident diagnosis
Use the incident diagnostics sheet - see toolkit

1. Establish current status by deciding which area is the likely cause.
    • hardware
    • software
    • network
    • user guide
    • other.

2. Pause and decide which action or actions to take.
It is important not to act rashly as this could create further incidents!

3. Take action and record the results. This could be an iterative process, but it is vitally important to record what was done.
There are many examples of what should have been a five minute fix taking several hours because the technician failed to record the actions taken.
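
To show how little is needed to record the actions taken, here is a minimal sketch of a diagnosis record. The class and field names are assumptions and do not reproduce the layout of the incident diagnostics sheet; only the list of likely areas comes from step 1.

```python
from dataclasses import dataclass, field

# The likely areas listed in step 1.
LIKELY_AREAS = ("hardware", "software", "network", "user guide", "other")


@dataclass
class DiagnosisRecord:
    likely_area: str                              # step 1: current status
    actions: list = field(default_factory=list)   # step 3: what was done

    def __post_init__(self):
        if self.likely_area not in LIKELY_AREAS:
            raise ValueError(f"unknown area: {self.likely_area}")

    def record(self, action, result):
        """Note each action and its result, in the order they were taken."""
        self.actions.append((action, result))


record = DiagnosisRecord("hardware")
record.record("Checked printer cable", "cable loose")
record.record("Reseated cable and printed test page", "test page printed")
for action, result in record.actions:
    print(f"{action} -> {result}")
```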

Checking the knowledge base
In the early days of incident management, it is likely that the first checks will be in the call log using a keyword search.
When the call log grows larger or the decision is made to use databases, it is important to ensure that a search facility is available. The school should decide which words or type of words should be recorded in the incident resolution. This will enable searches to be made using these words to aid in checking the knowledge base.

Mature systems may implement the use of categories to help with quick searches of the knowledge base. However, the incorrect use of categories can reduce the effectiveness of searching and the overall usefulness of the knowledge base.
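
As one possible illustration of searching on agreed words, the sketch below builds a simple index from each agreed keyword to the calls whose resolution mentions it. The keyword list and the sample resolutions are invented for the example; a school would agree its own vocabulary.

```python
from collections import defaultdict

# Hypothetical vocabulary agreed by the school.
AGREED_KEYWORDS = {"printer", "password", "network", "toner"}


def build_index(resolutions):
    """Map each agreed keyword to the set of call ids whose resolution uses it."""
    index = defaultdict(set)
    for call_id, text in resolutions.items():
        for word in text.lower().split():
            word = word.strip(".,;:'\"")   # drop surrounding punctuation
            if word in AGREED_KEYWORDS:
                index[word].add(call_id)
    return index


resolutions = {
    1: "Cleared paper jam and reset printer.",
    2: "Password reset",
    3: "Replaced toner cartridge in printer",
}
index = build_index(resolutions)
print(sorted(index["printer"]))
```

Words outside the agreed list are ignored, which is why consistent wording in the incident resolution matters.
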
Technician diagnostics
The technician does not require an arsenal of diagnostics tools for incidents. More in-depth analysis is performed in Problem Management.

Go to the toolkit to download the incident diagnostics sheet

6. Incident resolution
The aim of incident resolution is to establish a resolution or work-around as quickly as possible, in order to restore the service to users with minimum disruption to their work.

Incident Management at this stage can often be at odds with Problem Management:
  • Incident Management aims to get the system back up and running, and a quick fix will do.
  • Problem Management seeks to identify the cause of the incident to prevent it being repeated, and a quick fix can get in the way of the diagnosis needed to identify that cause.

After resolution of the cause of the incident and restoration of the agreed service, the incident is closed.


Incident handling
The process for handling incidents from detection through to closure is shown in the diagram below.



Logging incidents and making requests
There are various ways to log incidents and make requests. The success of the method used depends on the size of the school and the flexibility of those providing technical support.
Corridor approach
This way of logging calls is similar to the 'visit office' approach, except that the user stops the technician in passing. In this instance the technician or the person providing technical support may not even have the opportunity to write down the details of the incident. The user is confident they have 'logged' the incident or request and then feels let down when the call is not actioned appropriately.
Visit office
The user visits the technical support office to report an incident or make a request. This approach inspires confidence in the user - they have discussed the problem with technical support and know that action will ensue.
Often this approach does not benefit anyone if those providing technical support are besieged with visitors and do not have time to prioritise their workload or start working on the incidents. The staff providing technical support feel they are very busy, but not prioritising their work reduces their effectiveness. This reactive situation does not embrace best practice.
Paper record of call
The user completes a paper form with details of the incident and posts it in an in-tray used by support staff. The tray is often placed in staff rooms or near reception. Multipart copies are useful in giving users a copy of the details they have logged.
Using this system relies on technical staff collecting the forms and allocating priorities sufficiently quickly to encourage staff to continue using the system. It will fail if users find their form still in the in-tray later in the day.
Registering details by phone or email
External service desks may use phone or email to speed up the process of logging calls. The user must be armed with information about the system they are calling about, which may include an allocated asset tag number and machine type.
The speed of response is not determined by the speed with which the call can be logged. Users may become frustrated if they are required to supply lots of information to the support team, only to find that the response is not what they anticipated. It is important to make all users aware of the agreed response times with this service.
Computer interactive
The user uses a simple online form to log the incident or request. The form is easy to follow and is automatically sent to the technical support team. Having completed the form, the user should be confident that the call will be actioned and will wait for a response from the support team.
Because there is no interaction with a person, the system must be proven to work, or users will quickly avoid this method and use the 'corridor' approach instead.

Details to be recorded for service desk calls
  • information about the incident or request
  • user name and contact information
  • Service Desk details (to be completed by the single point of contact)
  • resolution.
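
Whichever logging method is used, the same details need to be captured. A minimal sketch of such a record is shown below; the class and field names are assumptions based on the list above, not the layout of the actual incident/request sheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceDeskCall:
    incident_details: str                 # information about the incident or request
    user_name: str
    user_contact: str
    service_desk_details: str = ""        # completed by the single point of contact
    resolution: Optional[str] = None      # filled in when the incident is resolved

    def missing_fields(self):
        """Names of details that should be present before the call is passed on."""
        required = {
            "incident_details": self.incident_details,
            "user_name": self.user_name,
            "user_contact": self.user_contact,
            "service_desk_details": self.service_desk_details,
        }
        return [name for name, value in required.items() if not value]


call = ServiceDeskCall(
    incident_details="Printer in library prints blank pages",
    user_name="J. Smith",
    user_contact="staff room",
)
print(call.missing_fields())
```

The resolution is left out of the check because it cannot be known until the incident has been resolved.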

7. Incident closure
Incident closure is an important aspect of incident management and should not be overlooked.

Once the incident is resolved, closure aims to ensure that the lessons learnt are recorded for future use. This is where recording the details of the incident and resolution contribute towards reducing the impact of future incidents.

Category - It is more appropriate at the closure stage to assign a category than when the incident is first reported. Once the incident has been resolved, the knowledge is available about which component or part of the system caused the symptoms of the error.

Known errors - Once an incident has been resolved, the solution becomes a resolution or workaround and can be passed to problem management to be logged as a known error.

Update the call log and incident sheet
  • Enter the closure details in the call log including the category of incident (if used by the school).
  • Enter the closure details on the incident sheet.
  • File the incident sheet in chronological order, using the date when the call was first placed.
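
The closure steps above can be sketched as follows, assuming each incident sheet is held as a simple record. The field names, dates and sample calls are hypothetical; the one rule taken from the text is that sheets are filed by the date the call was first placed, not the date it was closed.

```python
from datetime import date


def close_incident(sheet, closed_on, resolution, category=None):
    """Return a copy of the sheet with the closure details added."""
    closed = dict(sheet, closed_on=closed_on, resolution=resolution)
    if category is not None:      # only if the school uses categories
        closed["category"] = category
    return closed


def file_sheet(filed_sheets, sheet):
    """Return the sheets in order of the date each call was first placed."""
    return sorted(filed_sheets + [sheet], key=lambda s: s["first_placed"])


filed = [
    {"call_id": 1, "first_placed": date(2004, 3, 1)},
    {"call_id": 3, "first_placed": date(2004, 3, 9)},
]
open_sheet = {"call_id": 2, "first_placed": date(2004, 3, 4)}

closed_sheet = close_incident(
    open_sheet, date(2004, 3, 12), "Password reset", category="software"
)
filed = file_sheet(filed, closed_sheet)
print([s["call_id"] for s in filed])
```

Call 2 is filed between calls 1 and 3 even though it was closed last, because filing follows the date the call was first placed.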