Incident: On 2024-08-01 at 14:09 UTC, an increased connection rate on a central database saturated the database’s ability to accept new connections. As a result, Asana became unavailable to users in all regions.
In response, Asana engineers throttled traffic for automations, webhooks, and the API, and restored primary functionality of the Asana application at 15:01 UTC. The application continued to run with sporadic unavailability until 18:22 UTC, when the underlying cause of the increased connections was identified and disabled. At this point the application was fully operational, but asynchronous workloads such as automations and notifications were delayed because of the work enqueued while traffic was throttled. The incident was resolved when asynchronous work queues fully caught up at 19:39 UTC.
Impact: From 14:09 UTC to 15:01 UTC, users were unable to access Asana through the web, mobile, or API. From 15:01 UTC to 18:22 UTC, users could access Asana through the web and mobile, but experienced sporadic unavailability. Access to the API was restored gradually between 15:15 UTC and 15:58 UTC. From 15:01 UTC to 19:39 UTC, users experienced delayed automations, background actions, webhooks, and event streams. No customer data was lost.
Moving forward: We have reverted the change that caused the increased connections. We are adding monitoring to detect changes in database connection rates, so we can identify potential problems earlier. Beyond this, we will be replicating data from the central database to other databases to reduce the impact of any single database being overloaded.
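As an illustration of the connection-rate monitoring mentioned above, the following is a minimal sketch of one generic approach: comparing each new sample against a rolling baseline. It is not Asana's implementation. The class name, window size, threshold, and spread floor are all hypothetical values chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class ConnectionRateMonitor:
    """Flags samples far above a rolling baseline of recent connection rates.

    Hypothetical sketch: the name and all default parameters are assumptions.
    """

    def __init__(self, window=30, threshold_sigmas=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # most recent non-anomalous samples
        self.threshold_sigmas = threshold_sigmas
        self.min_samples = min_samples

    def observe(self, connections_per_second):
        """Record one sample and return True if it looks anomalous."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            # Floor the spread so a very flat baseline does not alert on noise.
            spread = max(stdev(self.samples), 0.05 * baseline, 1.0)
            limit = baseline + self.threshold_sigmas * spread
            anomalous = connections_per_second > limit
        if not anomalous:
            # Only normal samples feed the baseline, so a sustained spike
            # keeps alerting instead of becoming the new normal.
            self.samples.append(connections_per_second)
        return anomalous
```

In this sketch, a caller would feed `observe` one sample per polling interval and raise an alert whenever it returns `True`. Keeping anomalous samples out of the baseline is a deliberate trade-off: a legitimate, lasting increase in traffic would need the baseline to be reset manually.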