On May 13th at 01:24 UTC, during verification of a Teleport Cloud platform release, regional proxy services for a subset of tenants lost connectivity with the Teleport Auth service. Connectivity was restored by restarting Auth services for those tenants.
On May 15th at 15:00 UTC, a review of the incident found additional tenants experiencing similar symptoms. The issue was traced to an internal cloud component responsible for caching Teleport Auth service IP addresses to facilitate multi-region connectivity. Restarting Auth services for the impacted tenants refreshed the IP cache allowing regional proxy services to connect.
Further diagnosis produced a set of performance improvements for the IP address cache component. These action items are in progress and scheduled for release in the coming weeks.
Due to the cloud platform running Teleport in high availability mode, connectivity for the majority of tenants and users remained stable because at least one Proxy service was healthy in an impacted region.