Starting at 1:18pm EST, some wifi Notecards could not connect to Notehub, causing a delay in event delivery. All cellular Notecards continued to operate normally. The issue was partially resolved starting at 2:05pm and was fully resolved at 2:23pm.
An unexpected networking issue arose during a one-time maintenance procedure. We had tested this procedure in multiple pre-production environments and, the day before, had successfully performed it on several production servers. During the rollout to the remaining production servers, an issue with our hosting provider caused a small number of servers to stop responding to events from wifi Notecards. Our monitoring detected the issue as soon as it occurred, and we were able to prevent any further impact while we addressed the root cause.
Because of each Notecard’s resilient event caching architecture, no events were lost, just delayed. We were able to safely complete the maintenance procedure later in the day without impact.