Major Production Outage
Incident Report for Asana
Postmortem

Incident: On August 18, 2022, between 20:00 UTC and 23:30 UTC, Asana experienced a major production outage that resulted in limited web application functionality and elevated error rates for API traffic. Two principal application services that serve Asana's web traffic were impacted: LunaDb, which performs data loading and handles communication with web clients, and Worldstore, which functions as a database caching layer and allows users to see the changes they’ve made. These services typically deploy independently to reduce the load on either system. During this incident, however, updates to both services overlapped. The overlap placed stress on a shared service, which failed, and that failure then cascaded to other services. Engineers responded to automated alerts within minutes of the start of the incident, but stabilizing the Worldstore cluster took several hours and several attempts.

Impact: For the duration of the incident, web app users saw a loss of reactivity: their own changes appeared not to be saved, or they did not receive collaborative edits made by other users. Users of Asana’s API and mobile apps may have been unable to make changes to Asana at all. At around 23:30 UTC, full application functionality across the web app and API was restored.

Moving forward: We are changing the configurations of our LunaDb and Worldstore services to prevent overload under similar circumstances, and adjusting deployment times of these services to avoid updating both simultaneously.
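As a rough illustration of the deployment-timing change described above, the sketch below rejects a deploy whose window would overlap a scheduled deploy of the other service. It is not Asana's actual tooling: the rule set `MUTUALLY_EXCLUSIVE`, the function names, and the example windows are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule: these two services must never deploy at the same time.
MUTUALLY_EXCLUSIVE = {frozenset({"LunaDb", "Worldstore"})}


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True if the half-open windows [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def can_start_deploy(service, start, duration, scheduled):
    """Return False if deploying `service` would overlap an exclusive peer.

    `scheduled` maps a service name to its (start, end) deploy window.
    """
    end = start + duration
    for other, (other_start, other_end) in scheduled.items():
        exclusive = frozenset({service, other}) in MUTUALLY_EXCLUSIVE
        if exclusive and windows_overlap(start, end, other_start, other_end):
            return False
    return True


# Example with made-up times: Worldstore deploys from 20:00 to 20:30 UTC.
t0 = datetime(2022, 8, 18, 20, 0, tzinfo=timezone.utc)
scheduled = {"Worldstore": (t0, t0 + timedelta(minutes=30))}

overlapping = can_start_deploy("LunaDb", t0 + timedelta(minutes=10), timedelta(minutes=20), scheduled)
after_peer = can_start_deploy("LunaDb", t0 + timedelta(minutes=30), timedelta(minutes=20), scheduled)
```

Here `overlapping` is False because the LunaDb window would intersect the Worldstore window, while `after_peer` is True because it starts exactly when the Worldstore window ends.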

Our downtime metric is a weighted average of the uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.
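A minimal sketch of how such a weighted figure could be computed follows. The report does not state how the weights are chosen, so this assumes each data center is weighted by its share of users. The data center names and all numbers are made up for illustration.

```python
def weighted_downtime_minutes(downtime_by_dc, weight_by_dc):
    """Return the weighted average of per-data-center downtime, in minutes.

    Both arguments are dicts keyed by data center name. Weights need not
    sum to 1; they are normalized by their total.
    """
    total_weight = sum(weight_by_dc.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(downtime_by_dc[dc] * weight_by_dc[dc] for dc in weight_by_dc)
    return weighted_sum / total_weight


# Hypothetical example: one data center fully affected for 210 minutes,
# another unaffected, with a 3:1 split of users between them.
downtime = {"dc-a": 210.0, "dc-b": 0.0}
weights = {"dc-a": 3.0, "dc-b": 1.0}
reported_minutes = weighted_downtime_minutes(downtime, weights)  # 157.5
```

Under these assumed inputs the reported figure is lower than the 210 minutes experienced at the affected data center, which is why a weighted metric can differ from the wall-clock duration of an incident.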

Posted Aug 22, 2022 - 21:32 UTC

Resolved
This incident has been resolved.
Posted Aug 19, 2022 - 00:24 UTC
Update
We are continuing to monitor for any further issues.
Posted Aug 19, 2022 - 00:23 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 19, 2022 - 00:23 UTC
Update
We are continuing to work on a fix for this issue.
Posted Aug 18, 2022 - 23:47 UTC
Update
We are continuing to work on a fix for this issue.
Posted Aug 18, 2022 - 23:28 UTC
Update
Changes made in the past few hours on Asana might not be visible, but any new changes made should be visible. We are working on redeploying our servers, which should make all previous writes to Asana visible.
Posted Aug 18, 2022 - 23:19 UTC
Update
We are continuing to work on a fix for this issue. Changes made in the app will be persisted but may not be reflected immediately.
Posted Aug 18, 2022 - 22:50 UTC
Identified
We've identified some potential issues and are restarting affected servers.
Posted Aug 18, 2022 - 22:07 UTC
Investigating
Our team has identified the problem and is currently working on a fix. We appreciate your patience!
Posted Aug 18, 2022 - 21:00 UTC
This incident affected: App, API, and Mobile.