Incident: Asana uses several load balancers to route incoming requests to the correct backend service. As our traffic volume grew over time, these load balancers consumed more resources to handle the additional requests. High traffic pushed compute utilization past a critical threshold, causing requests to queue, slow down, and eventually fail. The incident was compounded by error-reporting mechanisms that sent additional requests through the same load balancing infrastructure.
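To make the failure mode concrete, here is a minimal sketch using a textbook M/M/1 queueing model with an assumed 5 ms mean service time. It is illustrative only, not a model of Asana's actual load balancer, but it shows why latency grows non-linearly as utilization approaches saturation and why crossing a threshold turns queueing into failures.

```python
# Illustrative only: a textbook M/M/1 queueing model, not Asana's actual
# load balancer. The 5 ms mean service time is an assumption.

SERVICE_TIME_MS = 5.0  # assumed mean time to process one request

def mean_latency_ms(utilization: float) -> float:
    """Mean time in system for an M/M/1 queue at the given utilization (0-1)."""
    if utilization >= 1.0:
        return float("inf")  # past saturation the queue grows without bound
    return SERVICE_TIME_MS / (1.0 - utilization)

for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {u:.0%}: mean latency {mean_latency_ms(u):.1f} ms")
```

In this model, latency is 10 ms at 50% utilization, 50 ms at 90%, and 500 ms at 99%; past 100% the queue grows without bound.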
Impact: Between 15:08 and 16:50 UTC on June 17, 2024, requests to load the Asana application, requests from the Asana mobile applications, and calls to our API were delayed or failed at an elevated rate. Users in our European, Australian, and Japanese data centers may have experienced additional crashes while using Asana.
Moving forward: In the short term, we have increased the capacity of our load balancing infrastructure to handle current traffic as well as anticipated growth, and we have improved our monitoring to alert us before failures occur. In the longer term, we have efforts underway to replace this component entirely with routing and load balancing infrastructure that scales automatically in response to increases in traffic.
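As a sketch of the target-tracking approach such autoscaling commonly uses (this is not Asana's implementation; TARGET_UTILIZATION and desired_capacity are assumed, illustrative names and values):

```python
import math

# Illustrative target-tracking autoscaling sketch: pick enough capacity so
# that the observed load is spread thinly enough to return to a target
# utilization, keeping headroom below the critical threshold.

TARGET_UTILIZATION = 0.60  # assumed target utilization

def desired_capacity(current_capacity: int, observed_utilization: float) -> int:
    """Scale capacity proportionally so utilization comes back to the target."""
    return max(1, math.ceil(current_capacity * observed_utilization / TARGET_UTILIZATION))

# Example: 10 nodes at 90% utilization scale out to ceil(10 * 0.90 / 0.60) = 15 nodes.
print(desired_capacity(10, 0.90))  # -> 15
```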
Our uptime metric is a weighted average of the uptime experienced by users in each data center. The number of minutes of downtime shown reflects this weighted average.
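For illustration, here is how such a weighted average turns per-data-center downtime into a single figure. The weights (share of users) and downtime minutes below are hypothetical, not Asana's actual data.

```python
# Hypothetical example of a weighted-average downtime calculation.
downtime_by_data_center = {
    # data center: (share of users, minutes of downtime in that data center)
    "dc_a": (0.60, 20),
    "dc_b": (0.25, 102),
    "dc_c": (0.15, 102),
}

weighted_downtime = sum(share * minutes
                        for share, minutes in downtime_by_data_center.values())
print(f"weighted downtime: {weighted_downtime:.1f} minutes")  # -> 52.8 minutes
```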