Increase in latency and error rate

Incident Report for Gorgias

Postmortem

As part of our efforts to continuously improve our product, we deployed a change that had an unforeseen negative impact on performance. Unfortunately, this change caused downtime in one of our European clusters.

The outage lasted as long as it did because the new feature had been deployed several hours before the issue appeared. Because the issue affected only one cluster, this delay made the feature harder to identify as the root cause.

Once we found the root cause, we reverted the change. The main database then needed some time to recover.

We sincerely apologize for the inconvenience. To avoid this kind of situation in the future, we will increasingly rely on progressive rollouts and performance testing when releasing new features.

Posted Sep 02, 2022 - 19:54 UTC

Resolved

This incident has been resolved.

Posted Aug 30, 2022 - 23:53 UTC

Monitoring

We have resolved the incident, and all delayed messages have been processed. We are continuing to monitor the situation.

Posted Aug 30, 2022 - 19:34 UTC

Update

We are continuing to investigate this issue.

Posted Aug 30, 2022 - 15:31 UTC

Update

We are still investigating this issue. The main database is responding much more slowly than usual, which is slowing down the whole platform.

Posted Aug 30, 2022 - 13:52 UTC

Investigating

We are currently investigating this issue.

Posted Aug 30, 2022 - 11:45 UTC

This incident affected: Helpdesk (REST API, Web App, Mobile Apps), Helpdesk Integrations (Email, Gmail, Shopify integration), and Helpdesk Clusters (us-east1-2607, europe-west3-86c1, us-central1-d8ff).