Incident Report Number: 2016-017

Single Sign On Authentication Service

Ticket Number: INC0057743

Major Incident Number: MIR0001101

What happened?

The Single Sign On Authentication Service (SSO), which users rely on to log in to some university services, experienced an outage.

Who was affected?

All Campus Computing ID (CCID) account holders trying to log into some university services (Google, PeopleSoft and ServiceNow) were affected by this outage. Users already logged in to the affected university services were not affected.

What was the impact?

Affected users were unable to log in to the affected university services (Google, PeopleSoft and ServiceNow) for the duration of the outage. Users already logged in to these services were not affected.

What was the timeline of the incident?

Start: 2016/07/15 10:55 – Users began reporting to the Service Desk they were unable to log into university services (Google, PeopleSoft and ServiceNow). IT support analysts began working on the issue.
2016/07/15 11:05 – IT support analysts identified the cause: the partition hosting the SSO database had filled up, and the database was no longer responding to login requests. Work began on creating a new, larger partition.
End: 2016/07/15 12:10 – The new partition was completed and the SSO database was moved to it. SSO services were started and service was confirmed restored.

What was the root cause of the incident?

The partition hosting the SSO database filled up very quickly for reasons that are not yet known. The root cause is still under investigation.

What was the work around and resolution for the incident?

Work Around

A new, larger partition was created and the SSO database was moved to it.

Resolution

A permanent resolution is still being investigated.

What are any recommendations to prevent this incident from occurring again?

Enhance monitoring of the database server, in particular disk usage on the partition hosting the SSO database, so that a filling partition is detected before it impacts service.
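
As an illustration of the kind of check this recommendation implies, the following is a minimal sketch of a disk usage check in Python. It is not the monitoring actually deployed for the SSO service. The function names, the 80% and 90% thresholds, and the path "/" are assumptions chosen for the example; the real partition path and thresholds would need to be set by the team operating the database server.

```python
import shutil

# Hypothetical thresholds, chosen for illustration only.
WARN_PERCENT = 80.0
CRIT_PERCENT = 90.0


def classify_usage(percent, warn=WARN_PERCENT, crit=CRIT_PERCENT):
    """Map a usage percentage to a status string."""
    if percent >= crit:
        return "CRITICAL"
    if percent >= warn:
        return "WARNING"
    return "OK"


def check_partition(path):
    """Return (status, percent used) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = 100.0 * usage.used / usage.total
    return classify_usage(percent), percent


if __name__ == "__main__":
    # "/" is a placeholder; substitute the partition that hosts the database.
    status, percent = check_partition("/")
    print(f"{status}: {percent:.1f}% used")
```

Run on a schedule, a check like this could raise a warning while there is still free space. Because the partition in this incident filled up very quickly, a fixed threshold alone may not give enough notice, and comparing successive readings to track the rate of growth may also be worth considering.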

Updates

None.