Incident Report Number: 2015-012

TAO Assessment Servers

Ticket Number: INC0026211
Problem Number: PRB0010106
Major Incident Number: MIR0001059

What happened?

The TAO Assessment Service for the Faculty of Education experienced a service degradation.

Who was affected?

Students using the service and the staff running it were affected by this degradation.

What was the impact?

Affected users experienced degraded performance while writing online exams.

What was the timeline of the incident?

Start: 2015/04/09 13:15 - Users reported a performance issue with an online exam in the TAO environment.
2015/04/09 13:30 - IT support analysts began investigating performance issues.
2015/04/09 15:00 - Further investigation determined the storage load balancing was uneven.
2015/04/09 16:15 - IT support analysts began to rebalance the load to improve performance.
2015/04/09 17:00 - Rebalance was completed.
2015/04/09 18:00 - IT support analysts began to redistribute storage of some servers to maximize performance.
2015/04/10 05:30 - Redistribution was completed.
End: 2015/04/10 06:50 - Service was restored; user verification is pending.

What was the root cause of the incident?

Uneven load balancing across the storage degraded the performance of the TAO environment.

What were the workaround and resolution for the incident?

Workaround

The affected exams were moved to the eClass environment.

Resolution

The storage load was manually rebalanced.

What are the recommendations to prevent this incident from recurring?

  • Upgrade storage solution (in progress - infrastructure refresh project).
  • Increase storage capacity to provide more disk space, and add processing power to reduce the impact of any recurrence.
  • Enhance monitoring to detect performance issues.

Updates

Not Applicable