Incident Report Number: 2017-006

outside.srv.ualberta.ca

Ticket Number: INC0094785

What happened?

The server, outside.srv.ualberta.ca, became unavailable due to a hardware issue.

Who was affected?

Groups that rely on data files from PeopleSoft, as well as users waiting for a CCID to be created or changed, were affected.

What was the impact?

PeopleSoft was unable to send updated data files to the services that use them. As a result, services that depend on the updated data, such as CCID management and admission application processing, were degraded.

What was the timeline of the incident?

Start: 2017/11/16 22:40 – Monitoring systems reported that outside.srv.ualberta.ca was not responding to tests.
2017/11/16 23:35 – After an initial investigation, IT support analysts restarted the server in an attempt to restore the affected services.
2017/11/17 08:30 – A thorough investigation discovered a hardware failure had occurred. Work began to rebuild the server.
End: 2017/11/22 11:15 – Service was confirmed to be restored.

What was the root cause of the incident?

A hardware failure on the server’s motherboard caused outside.srv.ualberta.ca to become unavailable.

What was the work around and resolution for the incident?

Work Around

Not applicable.



Resolution

Due to the impact of the failure and the urgency of restoring services, it was determined that rebuilding the server would be the most effective resolution.

What are any recommendations to prevent this incident from occurring again?

Update IST’s disaster recovery plan to include the measures taken to resolve this issue.

Updates

Not applicable.