Everything is a Data Problem

You wouldn’t think that the primary issue with optimizing cloud computing workloads would be getting good data. Figuring out math problems (hello, integer-constrained programming) worthy of a dissertation, sure. Writing a distributed virtual machine, maybe. Getting good data about a workload, to run against good data about the machines it could viably be placed on? Not so much.

Well, you would be wrong. While the majority of the IP is in said math problems, the majority of the WORK is in the data — getting it and cleaning it up. And the data problem alone is enough to make you realize why everyone just picks an instance size and rolls with it until it doesn’t work anymore.

Last week we started the work to expand our platform from AWS-only to Azure. One of the first steps toward that is building what we call a “catalog”: a listing of all the possible virtual machine sizes across all possible regions with all of their pricing information (because, of course, pricing and availability vary). You would hope that this sort of catalog would be readily available from a cloud service provider (CSP). Instead, at the moment, the state of the art is the work of many open-source contributors scraping different sets of CSP documentation.
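To make "catalog" a bit more concrete, here is a minimal sketch of what one entry might look like. This is not our actual schema: every field name and the `build_catalog` helper are hypothetical, and the numbers are placeholders rather than real prices or specs. The point is that each row is keyed by provider, region, size and operating system, and that some attributes have to be optional because they are not always published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One VM size in one region for one OS (hypothetical field names)."""
    provider: str                        # e.g. "aws" or "azure"
    region: str
    instance_type: str
    os: str                              # price can differ by operating system
    vcpus: int
    memory_gib: float
    price_per_hour_usd: float
    max_iops: Optional[int] = None       # None when the docs don't say
    network_mbps: Optional[int] = None   # None when the docs don't say


def build_catalog(entries):
    """Index entries by (provider, region, instance_type, os)."""
    catalog = {}
    for e in entries:
        catalog[(e.provider, e.region, e.instance_type, e.os)] = e
    return catalog


# Placeholder values for illustration only; not real prices or specs.
entries = [
    CatalogEntry("azure", "region-a", "size-small", "linux", 1, 1.0, 0.01),
    CatalogEntry("azure", "region-a", "size-small", "windows", 1, 1.0, 0.02),
    CatalogEntry("azure", "region-b", "size-small", "linux", 1, 1.0, 0.015),
]
catalog = build_catalog(entries)
```

Keying on the full tuple means the same size can carry a different price per region and per OS without any special casing.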

For AWS, we love ec2instances.info for this information, though we still had to get all of the region information in less savory ways. Different folks have attempted to do similar things for Azure, but Azure doesn’t make it easy. Pricing is different across Linux and Windows, because of course it is, but the information they give you when trying to look at pricing is missing some bits:

Screenshot comparing B-Series instances on Azure

That’s right, you get vCPU, RAM and storage, with no notion of IOPS or networking. That might be enough for some folks, but we think you deserve better. But hey, maybe we’ll add the B1S to our estimate and see what that looks like?

Azure B1-Series estimator screenshot

I mean, I guess that’s better? Per some definition of those words? The hidden $50 for storage transaction units makes me want to die a little, though. Pretty impressive how less than two cents an hour can balloon so fast, isn’t it? Notably we’re still not getting information beyond vCPU, RAM and disk space.

So, how do we get that? We go spelunking through the Azure docs. Again, they get split into Linux vs. Windows, though as far as I’ve been able to tell thus far, the two are wholly the same. Digging into them for the B-series, we finally start to get something meaty!

B-series screenshot from Azure docs

Behold! IOPS. No information on networking, though, beyond the number of NICs (network interface controllers). Well, that’s a bummer. Is that the case for all of the machines? Ha, no — just scroll down to the D-Series.

D-series screenshot from Microsoft Azure docs

This is where we run into trouble for our page-scripting heroes. Microsoft Azure sort-of-kind-of provides the same information about each of its instance families, but not universally, leaving you to extrapolate expected network bandwidth and so much more.
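One way to cope with that inconsistency is to record what is missing instead of guessing at it. The sketch below assumes scraped rows arrive as plain dictionaries; the field names, the `missing_fields_by_family` helper and the sample rows are all hypothetical and are not taken from the Azure docs. It reports, per instance family, which expected fields were never found, so a human can go fill them in.

```python
from collections import defaultdict

# Hypothetical field names; which ones a real catalog tracks is an assumption.
EXPECTED_FIELDS = ("vcpus", "memory_gib", "max_iops", "network_mbps")


def missing_fields_by_family(records):
    """Map each instance family to the fields absent (or None) in its records."""
    gaps = defaultdict(set)
    for rec in records:
        for field in EXPECTED_FIELDS:
            if rec.get(field) is None:
                gaps[rec["family"]].add(field)
    return {family: sorted(fields) for family, fields in gaps.items()}


# Placeholder rows with made-up values: one family's page lists IOPS but no
# bandwidth, the other family's page lists both.
scraped = [
    {"family": "family-x", "size": "x-small",
     "vcpus": 1, "memory_gib": 1.0, "max_iops": 100},
    {"family": "family-y", "size": "y-small",
     "vcpus": 2, "memory_gib": 8.0, "max_iops": 200, "network_mbps": 1000},
]

report = missing_fields_by_family(scraped)
print(report)
```

Leaving a gap as an explicit `None` keeps an extrapolated number from quietly passing for a documented one.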

All that being what it is, we’d like to introduce you to our Azure catalog and the tools to generate it, and to encourage you to fill in any information you can fill in. And soon we’ll be introducing you to our Azure offering itself … after we fix the problem of getting data from Azure Monitor, of course. 😂

