Happy Fourth of July!
In case you were wondering, this isn’t just another Independence Day blog post talking about the Sunshower platform and how it will bring you freedom, blah blah blah. Rather, this is a blog post emphasizing that the ideals that led to the American Revolution, both within Great Britain and the colonies itself, are alive and well in the American startup culture.
What’s a Cloud Management Platform? (Part 2: Cloud Optimization Edition)
Two weeks ago, we talked about some of the ways that a Cloud Management Platforms (CMP) helps users relieve the headaches associated with DIY cloud resource management. This week, we’ll look at a few more compelling reasons to use a Cloud Management Platform like Sunshower.io for your cloud optimization and cloud resource management.
What’s a Cloud Management Platform? (And Why Do You Need One?) Part 1 of 2
Our official tagline at Sunshower.io is “beautifully simple cloud management and optimization.” But why do you need a Cloud Management Platform like Sunshower.io? When you work with a Cloud Service Provider (CSP) like AWS or Azure, doesn’t the CSP do the cloud optimization for you? Isn’t it the CSP’s job to make sure what you’re running in the cloud is rightsized, your applications are easy to view and manage, and that you’re getting the best possible value for your money? That’s what you’re paying them for, right?
In a word? Nope. That’s all on you, bub.
I like Corey Quinn — his newsletter and blog make some good points, but his recent post, Right Sizing Your Instances Is Nonsense, is a little off base. I encourage you to read it in its entirety.
“Right Sizing means upgrading to latest-gen”
Corey makes the argument that upgrading an m3.2xlarge to a m5.2xlarge for a savings of 28% is the correct course of action. We have a user with > 30 m3.2xlarge instances whose CPU utilization is typically in the low digits, but which spikes to 60+% periodically. Whatever, workloads rarely crash because of insufficient CPU — they do, however, frequently crash because of insufficient memory. In this case, their memory utilization has never exceeded 50%.
Our optimizations, which account for this and other utilization requirements, indicate that the “best fit” for their workload is in fact an r5.large, which saves them ~75%. In this case, for their region, the calculation is:
- m3.2xlarge * 0.532000/hour * 730 hours/month * 30 = $11,650.80/month
- r5.large * 0.126000/hour * 730 hours/month * 30 = $2759.40
The approximate monthly difference is $8891.40/month
Now, these assume on-demand instances, and reserved instances can save you a substantial amount (29% in this case at $0.380 per instance/hour), but you’re locked in for at least a year and you’re still overpaying by 320%.
“An ‘awful lot of workloads are legacy’ -> Legacy workloads can’t be migrated”
So, this one’s a little harder to tackle just because “an awful lot” doesn’t correspond to a proportion, but let’s assume it means “100%” just to show how wrong this is according to the points he adduces:
If you’ve heard of cloud computing at all, you’ve heard of Amazon Web Services (AWS), Microsoft Azure and Google Cloud. Between the three of them, they’ll be raking in over $50 billion in 2019. If you’re on the cloud, chances are good you’re using at least one of them.
The latest RightScale State of the Cloud Report pegs AWS adoption at 61%, Azure at 52% and Google Cloud at 19% (see the purple above). What’s more, almost all respondents (as denoted in blue) were experimenting with or planned to use one of the top three clouds. Which, if you math that up, means that 84% of respondents are going to be using AWS at some point, 77% will be using Azure and 55% will be using Google Cloud.
Multi-cloud strategies are definitively A Thing, contrary to some folks’ opinions and the overwhelming one-cloud-to-rule-them-all desire of AWS. So it’s worth comparing them. On a broad level, AWS rocks and rolls with capabilities set to lock you into their cloud, while Azure’s great for enterprises and Google Cloud’s your go-to if you want to do AI. But, as with all things, there’s more to it than that, and it’s not just where you can get the best cloud credit deals.
You wouldn’t think that the primary issue with optimizing cloud computing workloads would be getting good data. Figuring out math problems (hello, integer-constrained programming) worthy of a dissertation, sure. Writing a distributed virtual machine, maybe. Getting good data about a workload to run against good data about what the viable machines to put it on are? Not so much.
Well, you would be wrong. While the majority of the IP is in said math problems, the majority of the WORK is in the data — getting it and cleaning it up. And the data problem alone is enough to make you realize why everyone just picks an instance size and rolls with it until it doesn’t work anymore.
Last week we started the work to expand our platform from AWS-only to Azure. One of the first steps to that is what we call a “catalog”: a listing of all the possible virtual machine sizes across all possible regions with all of their pricing information (because, of course, pricing and availability vary). You would hope that this sort of catalog would be readily accessible from a cloud service provider (CSP). At the moment, the state-of-the-art is the work of many open-source contributors working together to scrape different CSP sets of documentation.
For AWS, we love ec2instances.info for this information, though we still had to get all of the region information in less savory ways. Different folks have attempted to do similar things for Azure, but Azure doesn’t make it easy. Pricing is different across Linux and Windows, because of course it is, but the information they give you when trying to look at pricing is missing some bits:
When two random things fit together perfectly, it creates a special kind of magic — Like stumbling across a way to bring order to the chaos of everyday life. Maybe that’s what makes these 22 photos so satisfying?
1) Cat crammed into a box
2) A pill and a ruler
Reserved Instances are an enormous investment.
At first glance, that statement might seem counter-intuitive. Reserved Instances (RIs) are widely advertised as the best way to save big on your Amazon Web Service (AWS) cloud compute bill. And in many cases, they are. With Reserved Instances, companies commit to long-term usage by agreeing to rent virtual machines for a set amount of time (typically 1 to 3 years) in exchange for a significantly lower rate than on-demand pricing. When viewed through this lens, they appear to be a vital part of an AWS cost management strategy.
Take Amazon EC2 as an example. When compared to on-demand pricing, Amazon EC2 RIs offer customers potentially deep discounts — sometimes as much as 75%, per their marketing. While reserving cloud capacity in advance seems like the smart thing to do because it has the potential to deliver a significant amount of savings, the savings promised by RIs often have a dangerous downside — and any missteps can have substantial costs for your company.
The calculations involved in deciding which RI to purchase can be frustratingly complicated. One year or 3 year contract? What about tenancy? Instance size? Region and zone? New or from the marketplace? And don’t forget about the nuance of offering class — do you want your RI standard, convertible or scheduled?
These calculations are difficult, but absolutely vital when committing to a RI. Rather than signing a contract for exactly what you have now (in terms of size, region, and tenancy) and guessing at the length of time that will fit your instance, it’s essential to understand the exact shape of your usage needs. Without that kind of granular insight into your workload, it’s impossible to choose a RI that will be the right fit six months from now, let alone three years in the future.
In the end, many companies buy RI capacity that ends up exceeding their actual needs, because they’re already using capacity that exceeds their needs. Unfortunately, committing to more capacity than you actually need can be very costly over the length of a RI contract. When that happens, the long-term return on investment (ROI) ultimately evaporates.
We frequently get asked what makes our AWS cost optimization so good. AWS cost management feels like it should be easy, and we talk to a lot of folks who think they’ve done a good job of it. The fact is, we’ve yet to see anyone who’s not wasting at least 40% of their EC2 bill. Let’s walk through it on our platform, and it’ll make sense why.
Fitting an Instance
It all starts with knowing what you’re actually using, resource-wise. Figuring this out as a human is surprisingly hard. For Sunshower, we look at the past month of a virtual machine’s life (if we have it — that’s our default) and sample every minute (by default, but it’s adjustable). After smoothing the data, that’s how we discover, in this case, 1 CPU (of the 8 they’re paying for) and 10G RAM (of the 30 they’re paying for) are actually being used.
In the screenshots below, you can see the resulting “shape” of the workload on the virtual machine. First, on the left: current vs utilized. The grey is what they’re currently paying for, and the purple is what they’re utilizing. Frankly, it LOOKS like a pretty good fit.
To compare, let’s look at the screenshot on the right: optimized vs utilized. There’s our purple triangle of utilization again. This time, you’ll see the optimized fit we found in blue. Even though the blue section looks a lot bigger, it actually reflects a substantial cost savings over the original, grey fit on the left.
How is that possible? The thing you’re really paying for, in most machines, is CPU and Memory. So, the closer a fit you can get on those, the better. In the image on the left, you can see that the majority of the overprovisioning is taking place in the most expensive areas of cloud spend: CPU and memory. Tightening that fit up in CPU and memory, like you see represented in blue in the image on the right, might look like an incremental change from the image on the left, but in reality it adds up.