Incident Report Number: 2015-011

Mail Forwarding Service

Ticket Number: INC0025533
Problem Number: PRB0010101
Major Incident Number: MIR0001057

What happened?

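The email forwarding service was disrupted when a network interface on the main load balancer became unresponsive, which delayed delivery of incoming email.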

Who was affected?

Anyone using the University of Alberta email service was affected.

What was the impact?

Email messages sent to University of Alberta accounts were delayed.

What was the timeline of the incident?

Start: 2015/03/28 16:15 – Monitoring systems discovered a problem with multiple systems.
2015/03/28 16:20 – IT support analysts began working on the issue.
2015/03/28 17:50 – The problem was identified as an unresponsive network interface on the main load balancer.
2015/03/28 17:50 – The network interface was restarted.
End: 2015/03/28 18:15 – All services were confirmed restored.
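
The durations implied by the timestamps above can be worked out with a short script. This is only a sketch: the event labels (detected, response_started, and so on) are illustrative names chosen here, not terms from the incident record.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"

# Timestamps copied from the timeline; the keys are hypothetical labels.
events = {
    "detected": "2015/03/28 16:15",
    "response_started": "2015/03/28 16:20",
    "cause_identified": "2015/03/28 17:50",
    "restored": "2015/03/28 18:15",
}


def elapsed_minutes(start_key: str, end_key: str) -> int:
    """Return whole minutes between two labelled timeline events."""
    start = datetime.strptime(events[start_key], FMT)
    end = datetime.strptime(events[end_key], FMT)
    return int((end - start).total_seconds() // 60)


time_to_respond = elapsed_minutes("detected", "response_started")    # 5
time_to_identify = elapsed_minutes("detected", "cause_identified")   # 95
time_to_restore = elapsed_minutes("cause_identified", "restored")    # 25
total_duration = elapsed_minutes("detected", "restored")             # 120
```

Most of the two-hour incident was spent locating the fault; once the interface was identified, service was restored within 25 minutes.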

What was the root cause of the incident?

A network interface on the main load balancer became unresponsive. The load balancer has three network interfaces, and only one of them went down. Because the other two interfaces were still functional, the failure did not trigger the expected failover to the second load balancer.
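
The behaviour described above can be illustrated with a minimal sketch. It assumes, based only on this description, that failover was effectively conditioned on all interfaces being down. The function names and the interface names (if0, if1, if2) are hypothetical and do not reflect the load balancer's actual configuration.

```python
from typing import Dict


def should_failover_all_down(interfaces: Dict[str, bool]) -> bool:
    """Fail over only when every interface is unhealthy (assumed behaviour)."""
    return not any(interfaces.values())


def should_failover_any_down(interfaces: Dict[str, bool]) -> bool:
    """Fail over as soon as any single interface is unhealthy (alternative)."""
    return not all(interfaces.values())


# True means the interface is healthy. One of three interfaces has failed,
# matching the situation in this incident. Interface names are illustrative.
status = {"if0": False, "if1": True, "if2": True}

all_down_decision = should_failover_all_down(status)  # False: no failover
any_down_decision = should_failover_any_down(status)  # True: failover
```

Under the assumed "all interfaces down" condition, a single failed interface leaves the primary load balancer in service even though traffic through that interface is no longer handled.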

What was the workaround and resolution for the incident?

Workaround

Not Applicable

Resolution

The unresponsive network interface was taken down and brought back online.

What are any recommendations to prevent this incident from occurring again?

Investigate whether each network interface on the load balancer can be monitored individually, so that the failure of a single interface is detected.
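
As a rough illustration of what per-interface monitoring could look like, the sketch below checks each interface separately and reports the ones that fail. It is not tied to any particular monitoring product: the probe function, the interface names, and the way a health check is actually performed are all assumptions made for the example.

```python
from typing import Callable, Iterable, List


def find_down_interfaces(
    names: Iterable[str], probe: Callable[[str], bool]
) -> List[str]:
    """Probe each interface on its own and return the names that are down.

    `probe` is a hypothetical callable that returns True when the named
    interface responds. A probe that raises is treated as a failed check.
    """
    down = []
    for name in names:
        try:
            healthy = probe(name)
        except Exception:
            healthy = False
        if not healthy:
            down.append(name)
    return down


# Stand-in probe for demonstration only; a real check might test
# connectivity through each interface.
def fake_probe(name: str) -> bool:
    return name != "if0"


down_interfaces = find_down_interfaces(["if0", "if1", "if2"], fake_probe)
needs_attention = len(down_interfaces) > 0
```

With a check of this kind, a single unresponsive interface is reported even while the other interfaces remain healthy.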

Updates

Not Applicable