2/12/2014 API Impact

We experienced a brief outage yesterday, and we are sorry for the inconvenience. We know how frustrating it can be when a chef-client cannot connect to the server.

At 00:12 UTC the firewall configuration was modified to send syslog over TCP instead of UDP, to keep syslog messages from being dropped or lost. Almost immediately the firewall began dropping all connections, and its log filled up with “%ASA-3-201008: Disallowing new connections” messages. The TCP syslog connections were overrunning the Splunk server, and once the queue was full the firewall began dropping all connections.

Even though the Splunk server was still taking in logs, it could not keep up, and because Splunk could not provide the ACK, all available connections in the firewall hung. The new firewall logging configuration was cleared within 3 minutes of being implemented; however, the blocking continued. Rebooting the Splunk instance and clearing the TCP connections on the firewall brought the system back online after 8 minutes of API downtime.
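
To illustrate the failure mode, here is a toy model of a bounded log queue, written in Python. It is only a sketch under stated assumptions, not how the firewall or Splunk is actually implemented: the function name, the rates, and the queue limit are all hypothetical. It assumes each new connection produces one log message that must be queued until the log server acknowledges it, and that a new connection is refused whenever the queue is full.

```python
def simulate(ticks, new_connections_per_tick, acked_per_tick, queue_limit):
    """Toy model of a firewall that must log every new connection over TCP.

    All parameters are hypothetical. Returns a tuple of
    (tick of the first refused connection or None, total refused connections).
    """
    queued = 0                 # log messages waiting for an ACK
    first_refusal_tick = None
    refused = 0

    for tick in range(ticks):
        for _ in range(new_connections_per_tick):
            if queued >= queue_limit:
                # Queue is full: the connection cannot be logged, so refuse it.
                refused += 1
                if first_refusal_tick is None:
                    first_refusal_tick = tick
            else:
                queued += 1
        # The log server acknowledges what it can in this tick.
        queued -= min(acked_per_tick, queued)

    return first_refusal_tick, refused


if __name__ == "__main__":
    # Log server slower than the connection rate: refusals begin once the queue fills.
    print(simulate(10, 5, 3, 10))   # (3, 13)
    # Log server keeps up: nothing is ever refused.
    print(simulate(10, 3, 5, 10))   # (None, 0)
```

In this model, refusals continue for as long as messages arrive faster than they are acknowledged, even though the log server is still accepting messages the whole time.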

Pauly Comtois