BLINK Outage in Singapore Region
Status: Resolved
Sep 05 at 01:34am CDT
BLINK experienced an outage from approximately 9:17 PM Eastern Time (ET) on September 4th, 2024, to 1:33 AM ET on September 5th, 2024, roughly 4 hours and 16 minutes. In Singapore Time (SGT), this was 9:17 AM to 1:33 PM on September 5th, 2024.
Impact Assessment:
The services impacted during this period included primary redirect services, multiple API endpoints, and web console functions. The outage affected both self-service and enterprise customers, in the Singapore region only.
Timeline:
(All times listed in Eastern Time and Singapore Time)
9:17 PM ET (September 4th) / 9:17 AM SGT (September 5th): Initial alerts reported service unavailability in the Singapore region.
9:18 PM ET (September 4th) / 9:18 AM SGT (September 5th): Customers in the Singapore region were contacted directly and given status updates.
9:19 PM ET (September 4th) / 9:19 AM SGT (September 5th): The BLINK Incident Response Plan was initiated, with an expected 15-minute recovery window.
9:30 PM ET (September 4th) / 9:30 AM SGT (September 5th): Engineers identified an issue with the Extra Packages for Enterprise Linux (EPEL) repository that was delaying automated recovery.
9:45 PM ET (September 4th) / 9:45 AM SGT (September 5th): The extended engineering team was assembled and began multiple deployments and manual recovery steps. Recovery and restoration work continued until the correct procedure was documented and repeatable. We stayed in direct contact with customers by email and text message throughout the incident.
1:33 AM ET (September 5th) / 1:33 PM SGT (September 5th): All services in the Singapore region were fully restored and operational.
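The ET/SGT pairs in the timeline rest on a fixed 12-hour offset. The following minimal Python sketch checks that conversion and the total outage duration, assuming US Eastern Daylight Time (UTC-4, in effect in September 2024) and Singapore Time (UTC+8, no daylight saving). Fixed offsets are used instead of a time zone database so the example runs without extra data.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets assumed for early September 2024:
# US Eastern Daylight Time is UTC-4; Singapore Time is UTC+8 (no DST).
ET = timezone(timedelta(hours=-4), "EDT")
SGT = timezone(timedelta(hours=8), "SGT")

# Outage start and full restoration, as reported in Eastern Time.
start_et = datetime(2024, 9, 4, 21, 17, tzinfo=ET)
end_et = datetime(2024, 9, 5, 1, 33, tzinfo=ET)

# The same instants expressed in Singapore Time.
start_sgt = start_et.astimezone(SGT)
end_sgt = end_et.astimezone(SGT)

# Subtracting aware datetimes gives the elapsed time regardless of offset.
duration = end_et - start_et

print(start_sgt.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-09-05 09:17 SGT
print(end_sgt.strftime("%Y-%m-%d %H:%M %Z"))    # 2024-09-05 13:33 SGT
print(duration)                                  # 4:16:00
```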
Root Cause Identification:
The root cause of the outage was the unavailability of the EPEL repository, which BLINK relies on for automated recovery during AWS server deployment. Because automated recovery could not complete without this dependency, failover had to be performed manually by engineers, which extended the recovery time.
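One common way to keep a single package repository from blocking automated recovery is to try an ordered list of repository mirrors and use the first one that responds. The following Python sketch illustrates that idea only; it is not BLINK's deployment script, and the mirror URLs, `CANDIDATE_REPOS`, `http_probe`, and `choose_repo` are hypothetical names chosen for illustration.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical repository base URLs, for illustration only: a self-hosted
# mirror listed first, then an upstream mirror as a fallback.
CANDIDATE_REPOS = [
    "https://mirror.internal.example/epel/",
    "https://epel-upstream.example/epel/",
]


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers an HTTP HEAD request successfully."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # HTTPException covers malformed responses from the server.
        return False


def choose_repo(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first candidate the probe reports as reachable, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Passing the probe in as a parameter lets the selection logic be exercised without network access. In a real deployment, the selected URL would still have to be written into the package manager's repository configuration, which depends on the tooling in use and is not shown here.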
Action Plan:
BLINK is committed to the reliability of its services. In response to this incident, we have taken the following actions:
We have adjusted our deployment scripts to prevent future delays caused by EPEL repository unavailability.
We are enhancing our disaster recovery plan to include a secondary server for failover in the Singapore region.
We will immediately conduct testing and rehearsals to ensure that recovery times stay within a 15-minute window.
BLINK has been actively improving its infrastructure toward a fully distributed architecture with multiple regions and automatic failover.
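The secondary-server failover and the 15-minute recovery target in the action plan can be illustrated with a small Python sketch. It is not BLINK's implementation: the `Server` record, the server names, and the boolean `healthy` flag (standing in for a real health check) are hypothetical. The target check is applied to this incident using the reported Eastern Time start and end.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Recovery target named in the action plan.
RECOVERY_TARGET = timedelta(minutes=15)


@dataclass
class Server:
    # Hypothetical record; `healthy` stands in for the result of a real health check.
    name: str
    healthy: bool


def select_active(primary: Server, secondary: Server) -> Optional[Server]:
    """Route to the primary when healthy, otherwise fail over to the secondary."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    return None  # both down: no automatic failover target is available


def meets_recovery_target(outage_start: datetime, restored: datetime) -> bool:
    """True if service was restored within the 15-minute recovery window."""
    return restored - outage_start <= RECOVERY_TARGET


# Usage: a primary outage with a healthy (hypothetical) secondary.
primary = Server("sgp-primary", healthy=False)
secondary = Server("sgp-secondary", healthy=True)
print(select_active(primary, secondary).name)  # sgp-secondary

# This incident, with both timestamps in ET (no DST change between them).
print(meets_recovery_target(datetime(2024, 9, 4, 21, 17),
                            datetime(2024, 9, 5, 1, 33)))  # False
```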
We will also publish more frequent updates during any future incidents. Subscriber updates should also have been posted while we were communicating with customers directly.
We are happy to discuss further details with customers directly. Please email us at help@bl.ink, and our Chief Operations Officer will schedule a time with you.
Affected Services:
BL.INK Enterprise: SGP