Incident: A scheduled manual deployment caused abnormally high load on our reactive data loading layer, overloading it. As a result, application-level caches were slow to update, and some of the data returned to customers was stale.
Impact: Although application writes continued to succeed, a subset of customers experienced delays in seeing their changes in the Asana webapp from 00:27 to 01:52 UTC on 2022-12-21. By 02:20 UTC, newly loaded or refreshed tabs no longer showed stale data. By 03:10 UTC, no users were seeing stale data.
Until 01:52 UTC, about 2% of customer API requests also timed out. Affected API customers would have seen their API read requests fail persistently.
Moving forward: We have identified operational changes that will reduce the likelihood of similar incidents and shorten the time to resolution.
Our uptime metric is a weighted average of the uptime experienced by users at each data center. The number of minutes of downtime shown reflects this weighted average.