Elevated error rates

Incident Report for Asana

Postmortem

Incident: Asana uses several load balancers to route incoming requests to the correct backend service. As our traffic volume grew over time, these load balancers consumed more resources to handle the additional requests. High traffic eventually pushed compute resource utilization past a critical threshold, which caused slow requests to queue and eventually fail. The incident was made worse by certain error-reporting mechanisms that sent additional requests through the same load balancing infrastructure.
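The failure mode described above can be illustrated with a toy model. The sketch below is not Asana's infrastructure or code: the function name, the parameters, and every number are hypothetical and chosen only to show the shape of the problem. It models a single load balancer that serves a fixed number of requests per tick, queues the rest up to a limit, and drops anything beyond that limit. Each dropped request optionally generates an error report that re-enters the same load balancer on the next tick.

```python
def simulate_load_balancer(arrivals_per_tick, capacity_per_tick, ticks,
                           queue_limit, reports_per_failure):
    """Toy discrete-time model of one overloaded load balancer.

    All parameters are hypothetical illustration values, not measurements.
    Returns (final_queue_length, total_failed_requests).
    """
    queue = 0
    failed_total = 0
    extra_reports = 0  # error reports generated by the previous tick's failures

    for _ in range(ticks):
        # New user traffic plus error reports routed through the same path.
        queue += arrivals_per_tick + extra_reports

        # Serve as many queued requests as capacity allows.
        served = min(queue, capacity_per_tick)
        queue -= served

        # Anything beyond the queue limit fails (for example, by timing out).
        dropped = max(0, queue - queue_limit)
        queue -= dropped
        failed_total += dropped

        # Each failure produces error reports that arrive on the next tick.
        extra_reports = dropped * reports_per_failure

    return queue, failed_total


if __name__ == "__main__":
    # Below capacity: the queue stays empty and nothing fails.
    print(simulate_load_balancer(90, 100, 50, 200, 0))
    # Above capacity, no error reports: the queue fills, then failures begin.
    print(simulate_load_balancer(110, 100, 50, 200, 0))
    # Above capacity, with error reports feeding back: failures compound.
    print(simulate_load_balancer(110, 100, 50, 200, 1))
```

In this model, traffic slightly above capacity produces no failures at first because the queue absorbs the excess. Failures start only once the queue is full, and they grow much faster when error reports share the overloaded path.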

Impact: Between 15:08 and 16:50 UTC on June 17, 2024, requests to load the Asana application, requests from the Asana mobile applications, and calls to our API were delayed or failed at an elevated rate. Users served by our European, Australian, and Japanese data centers may have experienced additional crashes while using Asana.

Moving forward: In the short term, we have increased the capacity of our load balancing infrastructure to handle our current traffic as well as anticipated growth, and we have improved our monitoring so that it alerts us before failure. In the longer term, we have efforts underway to replace this component entirely with routing and load balancing infrastructure that scales automatically in response to increases in traffic.

Our metric considers a weighted average of uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.
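As a rough illustration of how such a weighted average could be computed, here is a minimal sketch. The report does not state the weights or the per-data-center downtime, so the function name, the weights (assumed here to be relative user counts), and the downtime figures are all hypothetical and do not describe this incident.

```python
def weighted_downtime_minutes(downtime_by_dc, weight_by_dc):
    """Weighted average of per-data-center downtime, in minutes.

    downtime_by_dc: minutes of downtime observed at each data center.
    weight_by_dc:   relative weight of each data center (assumed here to be
                    its share of users; the real weighting is not published).
    """
    total_weight = sum(weight_by_dc.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(downtime_by_dc[dc] * weight_by_dc[dc]
                       for dc in weight_by_dc)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not data from this incident.
    downtime = {"US": 100, "EU": 60, "Japan": 40, "Australia": 20}
    weights = {"US": 50, "EU": 30, "Japan": 10, "Australia": 10}
    print(weighted_downtime_minutes(downtime, weights))
```

Under this kind of weighting, the reported number can be lower than the downtime at the worst-affected data center and higher than the downtime at the least-affected one.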

Posted Jun 19, 2024 - 15:46 UTC

Resolved

This incident has been resolved.
Posted Jun 17, 2024 - 19:49 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 17, 2024 - 16:52 UTC

Investigating

We're currently experiencing some difficulties; as a result, a majority of our users are unable to access Asana. Our Development Team is working to resolve this issue as soon as possible. We sincerely apologize for the inconvenience. Please keep an eye on this page for the latest updates.
Posted Jun 17, 2024 - 16:44 UTC
This incident affected: US (App, API), EU (App, API), Japan (App, API), and Australia (App, API).