On November 26th, the service that receives webhooks from our 3rd party integrations experienced an elevated error rate from 15:24 to 15:38 UTC and from 17:34 to 17:58 UTC. This elevated error rate was caused by much higher than anticipated traffic from the Black Friday event, which overloaded the service.
Once we resolved the elevated error rate, our 3rd party integrations re-sent the messages that could not be delivered earlier, which put pressure on the services that process these messages. This pressure was most noticeable as delayed evaluation of Rules, which in turn delayed the display and sending of messages, including Chat messages.
This delay affected one of our datacenters (Gorgias operates several), which serves roughly a quarter of all our customers. The delay in processing messages averaged 2 min 20 s from 16:00 to 17:38 UTC and 1 min 15 s from 18:08 to 18:34 UTC, with no delay between these two intervals.
Remediation
We are very sorry for the inconvenience, and we apologize to the customers impacted by this incident.