Incident: Starting around 07:00 UTC, we observed API traffic failures that grew progressively worse. When we took action to resolve the issue during peak traffic, key components of the system became overloaded.
Impact: Asana experienced approximately 50% unavailability from 14:10-14:36 UTC and approximately 30% unavailability from 16:21-16:30 UTC. An operator error during the final part of the response caused full unavailability from 18:58-19:40 UTC. The API remained unavailable until 19:54 UTC. Background systems resumed normal operation by 20:15 UTC.
Moving Forward: We have since updated our tools to make this kind of operator error unlikely to recur, and we are still investigating the cause of the initial problems.
Our metric considers a weighted average of uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.
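As a rough illustration of how a weighted average of this kind turns partial unavailability into downtime minutes, here is a minimal Python sketch. It is not our production metric code: the data center names, user shares, and per-site unavailability figures are hypothetical, and weighting by each site's share of users is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class DataCenterOutage:
    """Hypothetical per-data-center view of one outage window."""
    name: str
    user_share: float            # assumed weight: this site's share of users
    unavailable_fraction: float  # 0.0 = fully up, 1.0 = fully down


def weighted_downtime_minutes(window_minutes, outages):
    """Return downtime minutes for a window, weighted across data centers."""
    total_share = sum(o.user_share for o in outages)
    if total_share <= 0:
        raise ValueError("user shares must sum to a positive number")
    # Weighted average of unavailability, normalized by the total weight.
    weighted_unavailability = (
        sum(o.user_share * o.unavailable_fraction for o in outages) / total_share
    )
    return window_minutes * weighted_unavailability


# Illustrative numbers only: a 26-minute window in which one of two
# equally weighted (hypothetical) data centers is fully unavailable.
example = [
    DataCenterOutage("dc-a", user_share=0.5, unavailable_fraction=1.0),
    DataCenterOutage("dc-b", user_share=0.5, unavailable_fraction=0.0),
]
print(weighted_downtime_minutes(26, example))  # 13.0
```

Under these assumptions, a 26-minute window with roughly 50% weighted unavailability counts as about 13 minutes of downtime rather than the full 26.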