Incident: From 2023-03-23 13:30 UTC until 2023-03-23 15:10 UTC, all databases responsible for storing data related to Asana users were periodically failing over to their standby instances. We run these databases using AWS RDS Multi-AZ, and AWS triggers these failovers based on health signals it monitors. We are still working with AWS to investigate and determine the full root cause, but our current best theory is that elevated DNS lookup latency resulted in database connection saturation: the increase in DNS lookup latency caused connections to the database to pile up, eventually overloading the databases and triggering the automated RDS failover system. We recovered after deploying a configuration change to disable DNS hostname lookups.
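To illustrate the pile-up mechanism with a back-of-the-envelope sketch (the numbers below are hypothetical, not measurements from this incident): when a database resolves each connecting client's hostname, a DNS lookup sits on the connection-establishment path, and by Little's law the average number of in-flight connection attempts is roughly the connection arrival rate times the per-connection setup time. A spike in lookup latency therefore multiplies concurrent connections even when request traffic is unchanged:

```python
def inflight_connections(arrival_rate_per_s: float, setup_time_s: float) -> float:
    """Little's law: average concurrent connection attempts = arrival rate x setup time."""
    return arrival_rate_per_s * setup_time_s

# Hypothetical numbers, for illustration only.
RATE = 200.0           # new connections per second
FAST_DNS = 0.002       # 2 ms lookup: connection setup stays cheap
SLOW_DNS = 2.0         # 2 s lookup during elevated DNS latency
MAX_CONNECTIONS = 500  # hypothetical database connection limit

normal = inflight_connections(RATE, FAST_DNS)    # well under 1 concurrent attempt
degraded = inflight_connections(RATE, SLOW_DNS)  # hundreds of concurrent attempts

# With slow lookups, pending connections alone approach the database's limit.
print(normal, degraded, degraded > 0.5 * MAX_CONNECTIONS)
```

This is also why disabling hostname lookups was an effective mitigation: it removes the slow step from connection setup entirely rather than speeding it up.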
Impact: During the incident, users of the web app were unable to load or use Asana for 1 to 5 minutes while a database was unavailable during failover. Users may have experienced these periodic outages multiple times during the 100-minute event. API users experienced similar downtime.
Moving forward: We’re working with the AWS RDS team to fully understand the root cause of the issue. We are also making configuration changes, such as disabling DNS hostname lookups on our databases, so that we will no longer be as susceptible to increased DNS latency; we believe our database use case does not need to resolve client hostnames. In the long term, we’ll revisit our database connection strategy to make more use of connection pooling, which will reduce our dependence on establishing many short-lived database connections.
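As a sketch of the pooling direction (a minimal illustration, not our actual implementation; the `Connection` stand-in and pool size here are hypothetical): a pool pays connection-setup cost, including any DNS lookup, once at startup, then hands the same connections out for reuse, so steady-state request traffic no longer establishes connections on the hot path:

```python
import queue

class Connection:
    """Stand-in for a real database connection; counts how many times we 'connect'."""
    opened = 0

    def __init__(self):
        Connection.opened += 1  # a real connection would do DNS + TCP + auth here

class ConnectionPool:
    def __init__(self, size: int):
        # Establish all connections up front, once.
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(Connection())

    def acquire(self, timeout: float = 5.0) -> Connection:
        # Under load, callers briefly wait for a free connection
        # instead of opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn: Connection) -> None:
        self._pool.put(conn)

pool = ConnectionPool(size=5)
for _ in range(100):      # 100 requests...
    conn = pool.acquire()
    # ... run a query on conn ...
    pool.release(conn)

print(Connection.opened)  # ...but only 5 connections ever opened
```

The design point is that a slow dependency in connection setup (like DNS) then degrades only pool warm-up, not every request.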