Incident: Around 2023-02-28 20:31 UTC, the servers responsible for reactivity began to fail due to unexpected input. After we reverted the change that triggered the failures, most servers recovered between 21:30 and 22:00 UTC. In some cases stale data was displayed until caches were cleared, which finished at 23:05 UTC. Approximately 1% of users continued to see reactivity failures and stale data until around 2023-03-01 00:04 UTC.
Impact: While the reactivity servers were down, API writes failed and changes were not propagated to other tabs. After the reactivity servers recovered, stale data was in some cases displayed within our applications until the caches were fully cleared. No customer data was lost.
Moving forward: We are hardening the application servers that crashed so they are more resilient to unexpected input, and making tooling changes to reduce time to resolution for this class of incident. Architectural changes already in progress will provide smaller failure domains, which will reduce impact and enable faster resolution for this class of failure. We use the 5 Whys approach to identify technical, operational, and organizational changes that reduce the likelihood and severity of incidents.
Our uptime metric is a weighted average of the uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.
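As a rough illustration of how such a metric can be computed, the sketch below weights each data center's downtime by its share of users. The data center names, user counts, and downtime minutes are hypothetical illustration values, not figures from this incident, and the weighting scheme is an assumption about how "weighted average" is defined here.

```python
# Hypothetical sketch: downtime minutes weighted by users per data center.
# All values below are illustrative, not actual incident data.

def weighted_downtime(downtime_minutes, user_counts):
    """Average downtime in minutes, weighted by the number of users
    served by each data center."""
    total_users = sum(user_counts.values())
    return sum(
        downtime_minutes[dc] * user_counts[dc] for dc in downtime_minutes
    ) / total_users

downtime = {"dc-east": 90, "dc-west": 30, "dc-eu": 0}   # minutes down
users = {"dc-east": 5000, "dc-west": 3000, "dc-eu": 2000}

print(weighted_downtime(downtime, users))
```

A data center serving more users contributes proportionally more to the reported downtime figure, so an outage confined to a lightly used region moves the metric less than the same outage in a heavily used one.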