Delay in receiving webhooks, chat messages
Incident Report for Gorgias
Postmortem

On November 26th, the service in charge of receiving webhooks from our 3rd party integrations experienced an elevated error rate from 15:24 to 15:38 UTC and from 17:34 to 17:58 UTC. The elevated error rate was caused by a much higher volume of traffic than anticipated, related to the Black Friday event, which overloaded this service.

Once we resolved the elevated error rate, our 3rd party integrations resent the messages that could not be delivered, which put pressure on the services processing these messages. This pressure was most noticeable as delayed evaluation of Rules, which in turn delayed the display and sending of messages, including Chat messages.

This delay affected one of our datacenters (Gorgias operates several datacenters), roughly a quarter of all our customers. The delay in processing messages averaged 2min20sec from 16:00 to 17:38 UTC and 1min15sec from 18:08 to 18:34 UTC, with the delay brought back to 0 between these two intervals.

Remediation

  • We are currently working on increasing the capacity of the system that receives webhooks, and more specifically its ability to withstand sudden bursts of webhooks.

We are very sorry for the inconvenience, and we apologize to the customers impacted by this incident.

Posted Dec 08, 2021 - 13:50 PST

Resolved
This incident has been resolved.
Posted Nov 26, 2021 - 10:37 PST
Update
We are continuing to investigate this issue.
Posted Nov 26, 2021 - 10:02 PST
Investigating
We are currently investigating this issue.
Posted Nov 26, 2021 - 09:53 PST
This incident affected: Helpdesk Clusters (us-east1-635c, us-east4-65cd, us-east1-2607, aus-southeast1-fcb9, europe-west3-86c1, us-central1-d8ff) and Helpdesk Integrations (Live Chat).