Increase in latency and error rate

Incident Report for Gorgias

Postmortem

In our efforts to continuously improve our product, we deployed a change that had an unforeseen negative impact on performance. Unfortunately, this change caused downtime in one of our European clusters.

The outage lasted as long as it did because the newly introduced feature had been deployed several hours earlier, which made it harder to pinpoint as the root cause of an issue affecting only one cluster.

After we found the root cause, we reverted the change. The main database then required some time to recover.

We sincerely apologize for the inconvenience. To avoid this kind of situation in the future, we will rely increasingly on progressive rollouts and performance testing when releasing new features.

Posted Sep 02, 2022 - 19:54 UTC

Resolved

This incident has been resolved.
Posted Aug 30, 2022 - 23:53 UTC

Monitoring

We have resolved the incident, and all delayed messages have been caught up. We are continuing to monitor the situation.
Posted Aug 30, 2022 - 19:34 UTC

Update

We are continuing to investigate this issue.
Posted Aug 30, 2022 - 15:31 UTC

Update

We are still investigating this issue. The main database is much slower than usual, which is slowing down the whole platform.
Posted Aug 30, 2022 - 13:52 UTC

Investigating

We are currently investigating this issue.
Posted Aug 30, 2022 - 11:45 UTC
This incident affected: Helpdesk (REST API, Web App, Mobile Apps), Helpdesk Integrations (Email, Gmail, Shopify integration), and Helpdesk Clusters (us-east1-2607, europe-west3-86c1, us-central1-d8ff).