Incident Report Number: 2015-021

LDAP, UWS, PeopleSoft, Database Access, Across Campus

Ticket Number: INC0032219
Problem Number: PRB0010119
Major Incident Number: MIR0001075
What happened?

A device in the GSB data centre was accidentally connected to a network switch with a second network cable, which flooded the switch with network traffic. A number of IT services that depend on the affected switch experienced an outage or degradation.

Who was affected?

Users of multiple IT services were affected by this outage or degradation, including but not limited to: Lightweight Directory Access Protocol (LDAP), University Wireless Service (UWS), PeopleSoft, some Cisco Voice over Internet Protocol (VoIP) phones, and Database Access.

What was the impact?

Affected users were either unable to connect to the impacted IT services or experienced slow performance when using them.

What was the timeline of the incident?

Start: 2015/07/30 09:40 - Monitoring systems alerted that multiple devices had become unavailable.
2015/07/30 09:40 - IT support analysts began working on the issue.
2015/07/30 09:57 - The event was determined to be a network traffic issue in the GSB data centre.
2015/07/30 10:05 - The root cause was determined to be a cable that had been reconnected in the mistaken belief that it had been disconnected accidentally. The cable was disconnected to restore IT services.
2015/07/30 10:10 - Monitoring systems showed devices coming back online. Callbacks were placed to users who had reported issues to confirm service restoration.
End: 2015/07/30 11:00 - Services were confirmed to be restored.

What was the root cause of the incident?

A server had multiple physical connections to one of the GSB data centre switches, which resulted in a flood of network traffic. A cable that appeared to have fallen out of a network port was plugged back in on the assumption that it had been disconnected accidentally. However, that network cable should not have been connected.

What was the workaround and resolution for the incident?
Workaround

Not Applicable

Resolution

The erroneous cable was unplugged from the GSB data centre switch.

What recommendations are there to prevent this incident from recurring?

Add a survey of all racks to the monthly checks/inventory update process, so that unplugged cables in IST Data Centres are visually identified and removed.

Updates

Not Applicable