Page Table Isolation Security Patching
Scheduled Maintenance Report for Aptible
Completed
This maintenance is complete. Apps and databases have been restarted across Enclave and are now running on instances protected against Meltdown.
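For readers who want to run a similar check on a Linux host they control, the sketch below reads the kernel's own report of its Meltdown status. It assumes a kernel recent enough to expose the `/sys/devices/system/cpu/vulnerabilities/meltdown` file, and that the file is readable from where the script runs. Whether it is readable from inside an Enclave container is not something this report states. The function names are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Assumption: the running kernel exposes vulnerability status through sysfs.
# Older kernels do not have this file at all.
MELTDOWN_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def meltdown_status(path: Path = MELTDOWN_SYSFS) -> Optional[str]:
    """Return the kernel's reported Meltdown status, or None if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        # File missing or unreadable: the kernel does not report a status here.
        return None


def is_mitigated(status: Optional[str]) -> bool:
    """Treat 'Not affected' and any 'Mitigation: ...' value as protected."""
    if status is None:
        return False
    return status == "Not affected" or status.startswith("Mitigation")


if __name__ == "__main__":
    status = meltdown_status()
    print(status if status is not None else "status not reported by this kernel")
```

A missing file is treated as "unknown" rather than "safe", so `is_mitigated(None)` returns `False`.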
Posted Jan 09, 2018 - 04:42 EST
Update
Our patching for Meltdown continues as planned. A majority of the databases in dedicated-tenancy environments have now been updated (all other resources have been updated already), and all customers whose databases have not been patched yet have been notified about upcoming restarts.

Once this patching window completes, we will assess the need to deploy additional mitigations to protect against the "Spectre" vulnerabilities that were announced concurrently with Meltdown. The Spectre vulnerabilities are more difficult to exploit than Meltdown (and less directly applicable to Enclave), but at least two patch sets are being developed to harden Linux against them. Once these patches are incorporated into Linux, we will in all likelihood plan further maintenance windows to apply them. Additional app and database restarts may be necessary during those maintenance windows.
Posted Jan 08, 2018 - 12:29 EST
Update
The maintenance actions discussed in the previous status update are proceeding as scheduled, and we'd like to provide an update on the status of each item mentioned in that update:

> - We will start by updating Kernels across shared-tenancy (i.e. non-production) stacks.
This action item is complete. All Kernels on shared-tenancy stacks were updated as of 5:00pm EST (22:00 UTC). If you have a shared-tenancy Environment on Aptible Enclave, you'll see these restart/recovery operations in the Activity tab for each of your apps and databases (as well as in your Environment's Activity Reports).

> - We will then update Kernels across the most vulnerable instances on dedicated-tenancy (production) stacks.
We have begun the process of restarting app containers and SFTP containers on dedicated-tenancy stacks and will continue to do so through tomorrow. As discussed, we will also be restarting utility instances in each dedicated stack between 11:00pm and 11:59pm EST tonight, January 4 (04:00-04:59 UTC on January 5). During this time, there will be a short window of downtime for outbound network connectivity, and `aptible ssh` sessions will be temporarily unavailable.
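The local-to-UTC conversion for this window can be double-checked with a few lines of Python. This is a minimal sketch that assumes EST is a fixed UTC-5 offset (which holds in January); the helper and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is a fixed UTC-5 offset (no daylight saving time in January).
EST = timezone(timedelta(hours=-5), name="EST")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


# The utility-instance restart window quoted above, in local (EST) time.
window_start = datetime(2018, 1, 4, 23, 0, tzinfo=EST)
window_end = datetime(2018, 1, 4, 23, 59, tzinfo=EST)

print(to_utc(window_start).isoformat())  # 2018-01-05T04:00:00+00:00
print(to_utc(window_end).isoformat())    # 2018-01-05T04:59:00+00:00
```

Note that the window crosses midnight in UTC, so it falls on January 5 for anyone scheduling against UTC.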

> - Finally, we will schedule restarts across database instances on dedicated-tenancy (production) stacks.
We have finalized a schedule for dedicated-tenancy database restarts and have assigned a 4-hour time window to each dedicated stack, during which all databases on that stack will be restarted. If you have any dedicated databases, an Aptible team member will soon reach out to a member of your team (either your ops alert contact, if configured, or your billing contact) to let you know your scheduled maintenance window. If you have questions about these database restarts, just reply to us and we'll be happy to discuss.
Posted Jan 04, 2018 - 22:02 EST
In progress
We are starting to roll out Kernel updates in order to mitigate the "Meltdown" vulnerability (https://meltdownattack.com) across the instances hosting Enclave.

In order to optimize for both the security and availability of your resources deployed on Enclave, we will use the following approach:

- We will start by updating Kernels across shared-tenancy (i.e. non-production) stacks. By nature, these stacks are the most vulnerable, so we will proceed with these updates starting now. This will require a short downtime window (on the order of 30 to 45 seconds) for Databases deployed in shared-tenancy stacks, and will happen today. For apps deployed on shared-tenancy stacks, we will perform zero-downtime restarts. There will also be a brief interruption of service for `aptible ssh` sessions.
- We will then update Kernels across the most vulnerable instances on dedicated-tenancy (production) stacks. We'll be restarting apps (here again, with zero downtime) and SFTP containers (here again, with short downtime) across dedicated-tenancy stacks. We'll also need to restart various utility instances in these dedicated stacks (e.g. instances that perform Docker builds). These instances usually serve multiple purposes, including NAT, so restarting them will cause a short window of downtime for outbound network connectivity (however, inbound connections through Endpoints will continue to work).
- Finally, we will schedule restarts across database instances on dedicated-tenancy (production) stacks. These instances are less at risk than app instances, and replacing them will cause downtime. To make it easier for our customers to plan around downtime, we will be reaching out this week via email to inform you of scheduled downtime windows for your production databases. We will do our best to accommodate requests to move downtime windows, when feasible.

Please feel free to reach out to Aptible Support if you have any questions.
Posted Jan 04, 2018 - 11:13 EST
Scheduled
A rumored major vulnerability affecting Intel CPUs is expected to be announced between 12pm UTC on January 4th (according to the Xen Project) and 10pm UTC on January 5th (according to the Verge). This vulnerability is to be mitigated in the Linux Kernel through the "page-table isolation" (PTI) patch set.

Since the vulnerability remains embargoed, we cannot say for certain what remediation actions we will have to take when it is finally announced. However, if the vulnerability is as bad as it is rumored to be, we may have to perform emergency patching across the fleet of instances hosting Enclave to safeguard the integrity of your resources and data.

As a customer, here is what you can expect the worst case to be. Keep in mind that nothing is certain at this point. We'll update our status page with exact action items as we learn more about this vulnerability:

- Applying Kernel updates will require rebooting the underlying instances hosting apps and databases on Enclave. For apps, we can perform zero-downtime deploys to replace the underlying instances transparently. However, for databases, if we need to update the Kernel, there *will* be a short period of downtime (of about 30 seconds per database).

- Depending on the scope and ease of exploitation of this vulnerability, we may also opt to suspend interactive access to containers hosted on Enclave while we apply patches. This might mean suspending all operations across the platform, as well as access to SFTP containers (which provide interactive shells).

As mentioned earlier, not much is known for certain about this vulnerability at this time. Naturally, we'll keep our status page updated, and you should feel free to reach out to Aptible Support with any questions.

---

Here are a few relevant links to learn more about this vulnerability:

- https://www.theverge.com/2018/1/3/16844630/intel-processor-security-flaw-bug-kernel-windows-linux
- http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-case-of-the-linux-page-table
Posted Jan 03, 2018 - 17:05 EST