Incident: On 2024-04-08 at 19:40 UTC, an operator ran a routine database maintenance script on a recently upgraded database containing data required for user sessions. At 19:50 UTC, the script triggered unexpected behavior in MySQL that prevented the database from creating new connections. As a result, starting at 19:53 UTC, the database was unable to respond to requests and Asana became unavailable to users. At 20:11 UTC, an engineer rebooted the database. The incident was resolved when the reboot completed at 20:13 UTC.
Impact: From 19:53 UTC to 20:13 UTC, users of Asana in all regions experienced application crashes, making Asana fully inaccessible. No customer data was lost.
Moving forward: We are rewriting the database maintenance script to avoid the code path that triggered the issue, and we have warned operators not to run the script until it has been modified to address this behavior. Based on the nature of the issue and the load patterns of our databases, we are confident the issue will not be triggered in our other databases or via other code paths.
Our metric considers a weighted average of uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.
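As an illustration only, the weighted average described above could be computed along the following lines. This is a minimal sketch, not the actual metric implementation: the data center names, the weights, and the choice to weight by share of users are all hypothetical assumptions, since the report does not state how the weights are derived.

```python
from dataclasses import dataclass


@dataclass
class DataCenterOutage:
    name: str                # hypothetical data center label
    weight: float            # hypothetical weight, e.g. share of users served
    downtime_minutes: float  # minutes of downtime users at this data center saw


def weighted_downtime(outages):
    """Return the weight-averaged downtime in minutes across data centers."""
    total_weight = sum(o.weight for o in outages)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(o.weight * o.downtime_minutes for o in outages) / total_weight


# Hypothetical example: two data centers that both saw the
# 19:53 to 20:13 UTC window, i.e. 20 minutes of downtime each.
outages = [
    DataCenterOutage("dc-a", 0.6, 20.0),
    DataCenterOutage("dc-b", 0.4, 20.0),
]
print(weighted_downtime(outages))
```

When every data center experiences the same window of downtime, the weighted average equals that window regardless of the weights. The weights only change the result when data centers are affected for different lengths of time.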