Starting at 4:28pm UTC, we noticed a degradation in performance or downtime for some of our customers, depending on the datacenter.
At 4:35pm UTC, we decided to revert our last deployment, causing downtimes to end.
At 4:45pm UTC, the deployment revert was finished and performance was back to normal.
After investigation, we found out that the root cause of the issue was a change introduced in the latest deployment, causing our load balancers to stop routing traffic to the helpdesk. The issue was not caught during pre-deployment because of configuration differences between the deployment health checks and production checks.
For up to 20 minutes, some of our customers experienced degraded Helpdesk performance. For other customers, the Helpdesk was completely unavailable for up to 10 minutes.