“Right Sizing means upgrading to latest-gen”
Corey argues that upgrading an m3.2xlarge to an m5.2xlarge for a savings of 28% is the correct course of action. We have a user with more than 30 m3.2xlarge instances whose CPU utilization is typically in the low single digits but periodically spikes to 60% or more. Those spikes matter less than they sound: workloads rarely crash because of insufficient CPU, but they frequently crash because of insufficient memory. In this case, their memory utilization has never exceeded 50%.
Our optimizations, which account for this and other utilization requirements, indicate that the “best fit” for their workload is in fact an r5.large, which saves them ~75%. In this case, for their region, the calculation is:
- m3.2xlarge: $0.532/hour * 730 hours/month * 30 instances = $11,650.80/month
- r5.large: $0.126/hour * 730 hours/month * 30 instances = $2,759.40/month
The difference is $8,891.40/month.
Now, these figures assume on-demand instances, and reserved instances can save you a substantial amount (29% in this case, at $0.380 per instance-hour), but you’re locked in for at least a year and you’re still paying about three times the r5.large on-demand rate of $0.126.
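If you want to check the arithmetic yourself, here is a minimal sketch that reproduces the figures above. The prices, the 730-hour month, and the 30-instance count come straight from this post; the function and variable names are just illustrative.

```python
HOURS_PER_MONTH = 730
INSTANCE_COUNT = 30

# Hourly prices quoted in this post (user's region).
M3_2XLARGE_ON_DEMAND = 0.532
M3_2XLARGE_RESERVED = 0.380
R5_LARGE_ON_DEMAND = 0.126


def monthly_cost(hourly_rate, count=INSTANCE_COUNT, hours=HOURS_PER_MONTH):
    """Monthly cost in dollars for `count` instances billed at `hourly_rate`."""
    return round(hourly_rate * hours * count, 2)


current = monthly_cost(M3_2XLARGE_ON_DEMAND)
best_fit = monthly_cost(R5_LARGE_ON_DEMAND)
savings = round(current - best_fit, 2)
savings_pct = round(100 * savings / current, 1)

# Discount from reserving the m3.2xlarge instead of paying on-demand.
reserved_discount_pct = round(
    100 * (1 - M3_2XLARGE_RESERVED / M3_2XLARGE_ON_DEMAND), 1
)

print(f"m3.2xlarge on-demand: ${current:,.2f}/month")
print(f"r5.large on-demand:   ${best_fit:,.2f}/month")
print(f"difference:           ${savings:,.2f}/month ({savings_pct}%)")
print(f"reserved m3.2xlarge discount: {reserved_discount_pct}%")
```

Swap in your own region's prices and instance count to run the same comparison for your fleet.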
“An ‘awful lot of workloads are legacy’ -> Legacy workloads can’t be migrated”
So, this one’s a little harder to tackle just because “an awful lot” doesn’t correspond to a proportion, but let’s assume it means “100%” just to show how wrong this is according to the points he adduces:
“Older versions of operating systems don’t support the newer hypervisor.”
This one is super baffling. The hypervisor is a layer beneath the operating system, which means that, in a perfect world, an application running on a virtualized server should have no idea what hypervisor technology is actually being used. It’s not like a certain version of Red Hat will only work on Xen and the moment you move to Nitro it jeffries up your operating system. Indeed, you can verify this by launching any version of any distro of Linux onto either Xen or Nitro.
Will applications need to be modified?
Most of the time, no. Some applications have relied on undocumented behavior to detect they are running within EC2 and they may require adjustment.
There may be some incompatibility in the network drivers, but it’s relatively easy to work around: we rarely suggest moving from ENA-capable to ENA-disabled instance classes, and we can also install the drivers for you (you can disable this form of modification).
Which brings me to his second point on this matter, viz.
“Workloads are ‘certified’ by either external vendors or internal divisions to run on certain versions of various bundled libraries.”
Given that we’ve just established that, in most cases, upgrading the hypervisor does not require an OS change, this point is moot.
“Upgrading is hard”
Unless you’re on a reserved instance, you can change instance type easily as follows:
- Log into console.aws.amazon.com
- Select the correct region
- Identify the correct instance type (cf. workload certification section)
- Set instance state to “stop” (CAUTION: SEE NOTE BELOW)
- Select “change instance type”
- Select the new instance type
- Set instance state to “start”
- Log into https://sun.sunshower.io
- Discover your system
- Click “Optimize” on the given instance type
Yes, whatever is running on that instance must be halted, and you may have to restart processes (use USERDATA!), but here’s the thing:
You’re going to have to move, anyway. Whether it’s a hardware failure (uncommon, but not as uncommon as you might believe), or a nuts security vulnerability in your OS or hardware or whatever, no VM can run forever. If you stop it even once, you might as well start it up with the correct instance type.
NOTE: Data on instance-local storage (e.g. if the SKU comes with something like 2x80GB NVMe SSDs) does not persist through a stop/start cycle. Be careful and only use this hardware for ephemeral workloads that require the performance/data locality.
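The manual stop / change type / start steps above can also be scripted. Here is a rough sketch, assuming an EBS-backed instance and a client object that behaves like a boto3 EC2 client; `change_instance_type` is a name made up for this post, not part of any SDK. The same caution about instance-local storage applies, and the new instance type still has to be compatible with your AMI's drivers.

```python
def change_instance_type(ec2, instance_id, new_type):
    """Stop an instance, change its type, and start it again.

    `ec2` is assumed to behave like a boto3 EC2 client, e.g.
    boto3.client("ec2", region_name=...).
    """
    # Stop the instance and block until it has fully stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Start it back up on the new type and wait until it is running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

With boto3 installed and credentials configured, a call might look like `change_instance_type(boto3.client("ec2", region_name="us-east-1"), "i-0123456789abcdef0", "r5.large")`, where the region and instance ID are placeholders.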
“You’ll never find the proper instance type for your workload”
Corey’s last point is his strongest. Between all SKUs across all regions, including reserved instances and RDS instances, there are over 250,000 SKUs altogether. There are also dozens of metrics that you’ll need to consider when comparing these workloads against a given SKU. At Sunshower.io, we acknowledge this and have removed this particular barrier. You can, in seconds, discover the optimal instance/workload alignment according to your application, over any time period.
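To make "comparing a workload against a SKU" concrete, here is a deliberately tiny sketch that only looks at one metric, peak memory, across the three instance types mentioned in this post. It is not how Sunshower.io's optimizer works; a real comparison weighs dozens of metrics. The m5.2xlarge price is an assumption derived from the 28% savings figure quoted earlier, and `Sku` and `cheapest_fit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    memory_gib: float
    hourly_usd: float


CANDIDATES = [
    Sku("m3.2xlarge", 30.0, 0.532),
    # Assumed price: 28% below the m3.2xlarge rate, per the claim in this post.
    Sku("m5.2xlarge", 32.0, round(0.532 * (1 - 0.28), 3)),
    Sku("r5.large", 16.0, 0.126),
]


def cheapest_fit(skus, peak_memory_gib):
    """Return the cheapest SKU with at least `peak_memory_gib` of memory, or None."""
    fits = [sku for sku in skus if sku.memory_gib >= peak_memory_gib]
    if not fits:
        return None
    return min(fits, key=lambda sku: sku.hourly_usd)


# The user's memory utilization never exceeded 50% of an m3.2xlarge (30 GiB).
peak_memory_gib = 0.50 * 30.0
best = cheapest_fit(CANDIDATES, peak_memory_gib)
print(best.name if best else "no candidate fits")
```

Even this one-metric version lands on the r5.large for the workload described above; the hard part is doing the same thing across every metric and every SKU.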
So here you go, Corey — some nice, cool lemonade to go with that hottest of takes.