Opinions around cloud-agnosticism vary, but it is generally taken as a truism that infrastructure-as-code (IaC) is the best, and perhaps the only, way to achieve cloud-agnostic deployments. Cloud-agnostic deployments are not a goal for every organization, but we’ve found that as organizations become more sophisticated, they frequently become a need, most often driven by customers’ on-premises requirements or the organization’s own hybrid-cloud strategy. Yet despite their sophistication, these organizations almost universally struggle to implement cloud-agnostic deployments, even with infrastructure-as-code tools.
Tools like Terraform and Kubernetes are the most commonly used for cloud-agnostic infrastructure-as-code, and while we love and use them as much as the next DevOps nerd, we’ve found that this approach can run into a variety of problems. Broadly, in our experience, these cloud deployment and management problems fall into the following categories:
Visibility (what infrastructure am I renting, and where?)
Infrastructure design and architecture
Cost analysis and optimization, especially as it pertains to infrastructure utilization
Parameterization (secrets, credentials, etc.)
Very few cloud management tools capture all of these categories, and those that do struggle with “common denominator” problems: concerns about which functionality remains consistent across clouds. We find that folks overestimate both how widespread common-denominator problems are and how problematic they actually are. While the names and APIs for services vary wildly across cloud platforms, once you get past the syntax, they’re functionally the same, and abstracting out sensible defaults in our abstraction layer solves most of those problems. The problems that do arise usually come from mapping individual capability concepts (e.g., porting function-as-a-service code from AWS Lambda to another cloud) or, even more often, from providing a common view of infrastructure elements across clouds, tenants, regions, and so on.
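To make the “sensible defaults in an abstraction layer” idea concrete, here is a minimal sketch (not Sunshower’s actual implementation) of mapping one generic compute request onto provider-specific instance types. The instance-type names are real provider SKUs for a 2 vCPU / 8 GiB shape, but the mapping table and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ComputeRequest:
    """A cloud-agnostic request for compute capacity."""
    vcpus: int = 2          # sensible default: small general-purpose box
    memory_gib: int = 8
    provider: str = "aws"

# Per-provider defaults for a given (vcpus, memory_gib) shape.
# Each value is that provider's 2 vCPU / 8 GiB general-purpose SKU.
DEFAULTS = {
    ("aws",   2, 8): "m5.large",
    ("azure", 2, 8): "Standard_D2s_v3",
    ("gcp",   2, 8): "n2-standard-2",
}

def resolve(req: ComputeRequest) -> str:
    """Resolve a generic request to a provider-specific instance type."""
    key = (req.provider, req.vcpus, req.memory_gib)
    try:
        return DEFAULTS[key]
    except KeyError:
        raise ValueError(f"no mapping for {key}")

print(resolve(ComputeRequest()))                  # m5.large
print(resolve(ComputeRequest(provider="azure")))  # Standard_D2s_v3
```

The point of the sketch is that once requests are expressed in provider-neutral terms (vCPUs, memory), the per-cloud differences collapse into a lookup table; the genuinely hard part, as noted above, is the capability concepts that don’t fit a shared shape.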
We’ve sought to alleviate these problems with Troposphere, our drag-and-drop orchestration engine that can take whatever your current infrastructure-as-code solution is and make it cloud-agnostic. Over the coming posts, we’ll walk through the issues we’ve seen in implementing cloud-agnostic infrastructure-as-code solutions, and how to address them.
Like most folks coming into this new year and new decade, we’ve been reflecting on the past. Sunshower.io is far less than a decade old (we turn two in March!), but even a year is a long time in the life of a small business. Despite our small size, it was a banner year.
First Six Months
The first six months of the year were largely occupied by releasing the Sunshower platform to the public. Our web application plays host to Stratosphere, allowing you to visualize your AWS EC2 infrastructure around the globe, and Anvil, allowing you to save 66% on your AWS EC2 bill with right-sizing.
We also spent a lot of the first half of the year doing the “startup circuit” — taking second in a startup pitch competition, doing lots of networking events with investors, and interviewing at Y Combinator.
Last Six Months
From there, we abruptly got pulled into the land of government contracting. A large defense contractor contacted us about doing a white paper together for the Air Force, which we submitted in July. We spent August waiting to hear whether we would receive an RFP, and September writing the proposal. Our contacts at the Air Force think our offering is incredibly valuable, and last we heard, the proposal was in the technology evaluation phase. Interested in what we pitched them? Download our marketing white paper.
This fall, we’ve renewed our commitment to open-source software. Most notably, we refactored our plugin framework into Zephyr, so other organizations can reap the benefit of a non-OSGi system for lifecycle and dependency management. We also renewed our work on Aire, a UI framework built on Aurelia and UIKit.
Last but not least, we finished out the year with a bit of a rebrand. We have a new logo and doubled down on our bright colors. We also changed our tagline to reflect our movement away from optimization and towards application and infrastructure management and deployment.
In case you were wondering, this isn’t just another Independence Day blog post talking about the Sunshower platform and how it will bring you freedom, blah blah blah. Rather, this is a blog post emphasizing that the ideals that led to the American Revolution, both within Great Britain and the colonies themselves, are alive and well in American startup culture.
What’s a Cloud Management Platform? (Part 2: Cloud Optimization Edition)
Two weeks ago, we talked about some of the ways that a Cloud Management Platform (CMP) helps users relieve the headaches associated with DIY cloud resource management. This week, we’ll look at a few more compelling reasons to use a Cloud Management Platform like Sunshower.io for your cloud optimization and cloud resource management.
What’s a Cloud Management Platform? (And Why Do You Need One?) Part 1 of 2
Our official tagline at Sunshower.io is “beautifully simple cloud management and optimization.” But why do you need a Cloud Management Platform like Sunshower.io? When you work with a Cloud Service Provider (CSP) like AWS or Azure, doesn’t the CSP do the cloud optimization for you? Isn’t it the CSP’s job to make sure what you’re running in the cloud is rightsized, your applications are easy to view and manage, and that you’re getting the best possible value for your money? That’s what you’re paying them for, right?
Corey makes the argument that upgrading an m3.2xlarge to an m5.2xlarge for a savings of 28% is the correct course of action. We have a user with more than 30 m3.2xlarge instances whose CPU utilization is typically in the low single digits, but which spikes to 60+% periodically. In practice, workloads rarely crash because of insufficient CPU; they do, however, frequently crash because of insufficient memory. In this case, their memory utilization has never exceeded 50%.
Our optimizations, which account for this and other utilization requirements, indicate that the “best fit” for their workload is in fact an r5.large, which saves them ~75%. In this case, for their region, the calculation is:
The approximate difference is $8,891.40/month.
Now, these figures assume on-demand instances. Reserved instances can save you a substantial amount (29% in this case, at $0.380 per instance-hour), but you’re locked in for at least a year and you’re still overpaying by 320%.
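As a sketch of where the monthly figure comes from: assuming on-demand Linux rates of roughly $0.532/hour for m3.2xlarge and $0.126/hour for r5.large (approximate us-east-1 prices at the time; treat the exact rates as assumptions), the arithmetic for 30 instances works out as:

```python
# Reconstructing the savings figure from assumed on-demand hourly rates.
M3_2XLARGE = 0.532      # $/hour, assumed us-east-1 Linux on-demand rate
R5_LARGE   = 0.126      # $/hour, assumed us-east-1 Linux on-demand rate
INSTANCES  = 30
HOURS_PER_MONTH = 730   # AWS's standard 730-hour billing month

monthly_diff = (M3_2XLARGE - R5_LARGE) * HOURS_PER_MONTH * INSTANCES
savings_pct  = (M3_2XLARGE - R5_LARGE) / M3_2XLARGE

print(f"${monthly_diff:,.2f}/month")  # $8,891.40/month
print(f"{savings_pct:.0%} savings")   # 76% savings
```

Under these assumed rates, the per-hour difference of $0.406 across 30 instances yields the ~$8,891/month and ~75% savings quoted above.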
“An ‘awful lot of workloads are legacy’ -> Legacy workloads can’t be migrated”
So, this one’s a little harder to tackle because “an awful lot” doesn’t correspond to a proportion, but let’s assume it means “100%” to show, using the very points he adduces, how wrong this is:
If you’ve heard of cloud computing at all, you’ve heard of Amazon Web Services (AWS), Microsoft Azure and Google Cloud. Between the three of them, they’ll be raking in over $50 billion in 2019. If you’re on the cloud, chances are good you’re using at least one of them.
The latest RightScale State of the Cloud Report pegs AWS adoption at 61%, Azure at 52% and Google Cloud at 19% (see the purple above). What’s more, almost all respondents (as denoted in blue) were experimenting with or planned to use one of the top three clouds. Which, if you math that up, means that 84% of respondents are going to be using AWS at some point, 77% will be using Azure and 55% will be using Google Cloud.
Multi-cloud strategies are definitively A Thing, contrary to some folks’ opinions and the overwhelming one-cloud-to-rule-them-all desire of AWS. So it’s worth comparing them. On a broad level, AWS rocks and rolls with capabilities set to lock you into their cloud, while Azure’s great for enterprises and Google Cloud’s your go-to if you want to do AI. But, as with all things, there’s more to it than that, and it’s not just where you can get the best cloud credit deals.
You wouldn’t think that the primary issue with optimizing cloud computing workloads would be getting good data. Figuring out math problems (hello, integer-constrained programming) worthy of a dissertation, sure. Writing a distributed virtual machine, maybe. Getting good data about a workload to run against good data about what the viable machines to put it on are? Not so much.
Well, you would be wrong. While the majority of the IP is in said math problems, the majority of the WORK is in the data — getting it and cleaning it up. And the data problem alone is enough to make you realize why everyone just picks an instance size and rolls with it until it doesn’t work anymore.
Last week we started the work to expand our platform from AWS-only to Azure. One of the first steps to that is what we call a “catalog”: a listing of all the possible virtual machine sizes across all possible regions with all of their pricing information (because, of course, pricing and availability vary). You would hope that this sort of catalog would be readily accessible from a cloud service provider (CSP). At the moment, the state-of-the-art is the work of many open-source contributors working together to scrape different CSP sets of documentation.
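To illustrate the shape of the problem, here is a hypothetical sketch of what one row of such a catalog looks like (field names and prices are illustrative, not Sunshower’s actual schema). The key point is that price and availability are a function of instance type, region, *and* OS, so every lookup has to carry all three:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    """One (instance type, region, OS) row of a pricing catalog."""
    provider: str        # "aws", "azure", ...
    instance_type: str   # e.g. "m5.large"
    region: str          # e.g. "us-east-1"
    vcpus: int
    memory_gib: float
    os: str              # pricing differs by OS (Linux vs. Windows)
    price_per_hour: float

# Illustrative entries; treat the exact prices as assumptions.
catalog = [
    CatalogEntry("aws", "m5.large", "us-east-1", 2, 8.0, "linux", 0.096),
    CatalogEntry("aws", "m5.large", "eu-west-1", 2, 8.0, "linux", 0.107),
]

def price(entries, instance_type, region, os="linux"):
    """Return the hourly price, or None if not offered in that region."""
    for e in entries:
        if (e.instance_type, e.region, e.os) == (instance_type, region, os):
            return e.price_per_hour
    return None

print(price(catalog, "m5.large", "us-east-1"))  # 0.096
```

Multiply a few hundred instance types by dozens of regions by multiple OSes and pricing models, and the size of the data-collection problem becomes clear.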
For AWS, we love ec2instances.info for this information, though we still had to get all of the region information in less savory ways. Different folks have attempted to do similar things for Azure, but Azure doesn’t make it easy. Pricing is different across Linux and Windows, because of course it is, but the information they give you when trying to look at pricing is missing some bits: