AWS has identified the root cause of the Endpoint unavailability:
Between 2:26 PM and 3:04 PM PDT (9:26 PM to 10:04 PM UTC) we experienced increased packet loss for traffic destined to public endpoints in the US-EAST-1 Region, which affected Internet and public Direct Connect connectivity for endpoints in the US-EAST-1 Region.
This is, unfortunately, essentially the same impact we've seen in two previous incidents, although AWS's description of the cause is slightly different:
October 15th, 2022: https://status.aptible.com/incidents/grf6gdrrszf9
Between 12:20 AM and 11:28 AM PDT, we experienced intermittent failures in Route53 Health Checks impacting Target Health evaluation in US-EAST-1. The issue has been resolved and the service is operating normally.
September 27th, 2021: (Only a couple of Endpoints were impacted, so no incident was created)
On September 27, 2021, between 8:45 AM and 2:09 PM PDT, Route53 experienced increased change propagation times for Health Check edits where unexpected failover to their secondary application load balancer (ALB) occurred despite their primary ALB targets being healthy. The issue has been resolved and the service is operating normally.
AWS describes these incidents as "increased change propagation times", "intermittent failures", and "increased packet loss", and apparently does not consider them significant enough to post to https://status.aws.amazon.com. The observed impact to our customers, however, is very clear: the impacted Endpoints are totally unreachable for a period.
As such, we will make permanent the "temporary" change we made on October 15th: we will disable the Route53 health checks (and the associated custom error page) for all Endpoints, as the health checks have been the root cause of these availability incidents.
As we indicated to customers during the October 15th and November 3rd incidents, you may restart any App to immediately disable its Route53 health check. Any App that has been deployed, restarted, or scaled since October 15th already has it disabled. We will make another announcement when we intend to disable it globally on all Apps for which it remains enabled.
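
To illustrate why disabling the health check removes this failure mode, here is a minimal sketch of a DNS failover decision. It is a simplified model only, not Route53's API or our actual configuration: the `FailoverRecord` class, the `resolve` function, and the hostnames are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    """Hypothetical model of a DNS record with a failover policy."""
    primary: str                       # assumed: the Endpoint's load balancer
    secondary: str                     # assumed: the failover target
    health_check_enabled: bool = True


def resolve(record: FailoverRecord, check_reports_healthy: bool) -> str:
    """Return the target this simplified failover policy would answer with.

    The decision depends on what the health check *reports*, not on
    whether the primary is actually serving traffic.
    """
    if record.health_check_enabled and not check_reports_healthy:
        return record.secondary
    return record.primary


# Hypothetical hostnames, for illustration only.
with_check = FailoverRecord("primary.example.com", "fallback.example.com")
without_check = FailoverRecord(
    "primary.example.com", "fallback.example.com", health_check_enabled=False
)

# The health check misreports a healthy primary as unhealthy:
print(resolve(with_check, check_reports_healthy=False))     # answers with the secondary
print(resolve(without_check, check_reports_healthy=False))  # answers with the primary
```

In this model, a false report from the health check diverts traffic away from a healthy primary, while a record with the check disabled always answers with the primary. The trade-off is that failover no longer happens when the primary is genuinely down.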