Building a Home Cloud with Proxmox: DNS + Terraform


In our last post, we configured a Ceph storage cluster, which will back the virtual machines that host Kubernetes.

Before we get to that, however, we need to configure our DNS environment. Recall that, in part 2, we configured a Windows server to act as our DNS and DHCP server. You don’t need a Windows Server installation for this; any RFC 2136-compliant DNS server, such as PowerDNS, will do. Keep in mind that RFC 2136 dynamic updates, when accepted without authentication as we configure them here, let any device on the network change DNS records, so don’t use this setup in a high-security environment or one where untrusted devices may be on the network.

To provision our DNS names, we’ll be interacting with our DNS server via the Terraform DNS provider. Terraform’s easy to set up, so we won’t be covering that.

Note that this solution requires a fair amount of work in Bash. Other solutions like Chef/Puppet/Ansible/Salt may strictly be better-suited for this sort of work, but since our environment is quite homogeneous (99% Debian), we’ll just script this using basic tools.

A note on Terraform Typing

Terraform’s variable/parameter management system, specifically its inability to share variables between modules, can result in some duplication between modules. This isn’t the cleanest, but I don’t feel like it’s a big enough problem to warrant bringing in other tools like Terragrunt. Terraform’s type system supports a flavor of structural typing: you can declare one variable with a subset of another variable’s fields, and then supply a value of the larger type wherever the smaller one is expected. For instance:

variable "a" {
  type = object({
    name = string
    ip   = string
  })
}

variable "b" {
  type = object({
    name = string
  })
}

// module B declares b, and can use the value of a as the value for b
b = var.a

We’ll be using this feature to share variable values across modules.


Since we’re reserving the upper range of our /24 subnet for MetalLB dynamically-provisioned IP addresses, we’ll configure the worker nodes with static IP addresses assigned from the lower range of our subnet. Since we have a small number of physical machines, we’ll opt for fewer, larger worker-nodes instead of more, smaller worker-nodes. The subnet-ranges we’ll select for these roles are as follows:

  1. 2-9: infrastructure (domain-controller, physical node IPs, etcd VMs, Kubernetes leader nodes)
  2. 10-29: static addresses (worker nodes)
  3. 30-200: DHCP leases for dynamic devices on the network + MetalLB IPs
  4. 201-254: available
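
If you want to sanity-check which role an address falls into, a small shell helper can encode the ranges above. This is only a sketch: the function name role_for_octet is ours for illustration, and it assumes the shared boundary values are resolved as 2-9, 10-29, 30-200, and 201-254, with anything else (such as 0, 1, or 255) treated as reserved.

```shell
# Classify the last octet of an address in the /24 by the role ranges above.
# Assumption: boundaries are resolved as 2-9, 10-29, 30-200, 201-254.
role_for_octet() {
    octet=$1
    if [ "$octet" -ge 2 ] && [ "$octet" -le 9 ]; then
        echo "infrastructure"
    elif [ "$octet" -ge 10 ] && [ "$octet" -le 29 ]; then
        echo "static"
    elif [ "$octet" -ge 30 ] && [ "$octet" -le 200 ]; then
        echo "dhcp-metallb"
    elif [ "$octet" -ge 201 ] && [ "$octet" -le 254 ]; then
        echo "available"
    else
        # 0, 1, 255 and anything out of range
        echo "reserved"
    fi
}
```

For example, role_for_octet 12 prints static, so .12 would be a reasonable address for a worker node.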

To set that up in Windows Server:

  1. Navigate to the Server Manager
  2. From the top-right Tools menu, select DHCP
  3. Right-click on the IPv4 node under your domain-controller host-name and select New Scope
  4. Provide a start address of <subnet prefix>.30 and an end address of <subnet prefix>.200
  5. Continue through the wizard, providing your gateway addresses and your exclusions. For the exclusions, select <subnet prefix>.1 through <subnet prefix>.30.
  6. Finish the wizard and click Create to create your scope.
  7. Right-click on the created scope node in the tree and select Properties. Select the DNS tab, then enable DNS dynamic updates. Select the "Dynamically update DNS records only if requested by the DHCP clients" option.
  8. Click OK to save your changes.

Configure your DD-WRT Router

Navigate back to your gateway router. Most capable routers will have an option to either act as a DHCP server (the default), or to act as a DHCP forwarder. For DD-WRT, select the DHCP Forwarder option and provide the IP address of your DHCP server as the target.

Configure your DNS Server

Go back to your domain controller.

  1. From the Server Manager, select DNS
  2. Locate the Forward Lookup Zone node that you created in the DNS Manager tree. Right-click on it and click Properties
  3. In the General tab, select the Nonsecure and Secure option from the Dynamic updates configuration. Note: This is not suitable for an open network or one admitting untrusted/unknown devices
  4. Click OK to apply.

Congrats! We’re now ready to provision our DNS in Terraform.

Terraform Configuration

Note: We’ve provided our Terraform configurations here, so feel free to point them at your infrastructure if you’ve been following along instead of creating your own.

In the directory of your Terraform project ($TFDIR), create a directory called dns.
In your dns directory, create 3 files.

Your file should contain:

terraform {
  required_providers {
    dns = {
      source = "hashicorp/dns"
    }
  }
}

# provision DNS entries for each host in the `hosts` list
# assumes each host object has `name` and `ip` fields, and that
# dns_server has a `zone` field (see the variable values below)
resource "dns_a_record_set" "virtual_machine_dns" {
  for_each  = { for vm in var.hosts : vm.name => vm }
  zone      = var.dns_server.zone
  name      = each.value.name
  addresses = [each.value.ip]
}

# provision the DNS A-record for the API server
resource "dns_a_record_set" "api_server_dns" {
  addresses = [var.api_ip]
  zone      = var.dns_server.zone
  name      = var.api_dns
}

Now, in your Terraform project $TFDIR (the parent of the dns directory you’re currently in), create your root configuration files. Symlink the dns/ variables file to this directory via ln -s $(pwd)/ $(pwd)/dns/ to make your DNS variables available to your root configuration.

Parent module

In your $TFDIR/, add the following block:

terraform {
  required_version = ">=0.14"
  required_providers {
    dns = {
      source  = "hashicorp/dns"
      version = "3.0.1"
    }
  }
}

Configure the DNS provider to point to your RFC-2136-compliant DNS server:

provider "dns" {
  update {
    server = var.dns_server.server
  }
}

Call the dns submodule:

# Create DNS entries for virtual machines
module "dns_configuration" {
  for_each   = var.cluster_nodes
  source     = "./dns"
  dns_server = var.dns_server
  hosts      = each.value
  api_dns    = var.api_server
  api_domain = var.domain
  api_ip     = var.load_balancer
}

At this point, we should be ready to actually provide our Terraform variables.

  1. Create a variables file in your $TFDIR directory. Terraform automatically loads any file in the root directory whose name matches *.auto.tfvars. Never check these into source control. I add the .gitignore rule **/*.tfvars

In your $TFDIR/ file, add the following configurations (substitute with your values):

dns_server = {
  // replace with your zone-name (configured above)
  zone = "" // note the trailing period here--it's mandatory
  // replace with your DNS server's IP
  server = ""
}
// this generates the DNS configuration for ``
api_dns    = "kubernetes"
api_domain = ""
api_ip     = "" // provide the API you

Finally, we’ll add our Kubernetes cluster DNS configurations:

cluster_nodes = {
  etcd_nodes = [
    { name = "etcd-1", ip = "" },
    { name = "etcd-2", ip = "" },
    { name = "etcd-3", ip = "" },
  ]
  k8s_leaders = [
    { name = "k8s-leader-1", ip = "" },
    { name = "k8s-leader-2", ip = "" },
  ]
  k8s_workers = [
    { name = "k8s-worker-1", ip = "" },
    { name = "k8s-worker-2", ip = "" },
    { name = "k8s-worker-3", ip = "" },
  ]
}

At this point, running terraform apply --auto-approve should quickly generate the DNS entries. You should navigate to your DNS server and refresh the forward lookup zone to confirm that your entries exist.
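
If you’d rather check from a shell than click through the DNS Manager, something like the following sketch may help. It assumes dig is installed (on Debian it typically comes from the dnsutils package). The function names fqdn and check_records, as well as the example server and zone values in the comment, are hypothetical and only for illustration.

```shell
# Build a fully-qualified name from a host name and a zone.
# The zone is expected to carry its trailing period, as in the tfvars file.
fqdn() {
    printf '%s.%s\n' "$1" "$2"
}

# Query the DNS server directly for each host's A record.
# Usage: check_records <dns-server-ip> <zone> <host>...
check_records() {
    server=$1
    zone=$2
    shift 2
    for host in "$@"; do
        name=$(fqdn "$host" "$zone")
        # dig +short prints only the answer section (the IP), or nothing
        answer=$(dig +short "@$server" "$name" A)
        printf '%s -> %s\n' "$name" "${answer:-MISSING}"
    done
}

# Example (hypothetical server and zone values):
# check_records 192.0.2.10 home.example. etcd-1 k8s-leader-1 k8s-worker-1
```

Any host that prints MISSING has no A record on that server yet, which usually means the apply didn’t reach it or the zone name is off.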


In this part, we configured DHCP and DNS, and provided a Terraform module that provisions DNS entries for each of the VMs that we’ll provision in subsequent posts. Next time, we’ll create a Terraform module that will provision virtual machines of the correct roles for our Kubernetes cluster. Our final post of the series will show how to actually deploy the Kubernetes cluster. Stay tuned!

Building a Home Cloud with Proxmox Part 3: Configuring Ceph and Cloud-Init


Last time, we configured a Windows Server domain controller to handle DNS, DHCP, and ActiveDirectory (our LDAP implementation). In this post, we’ll be configuring the Ceph distributed filesystem which is supported out-of-the-box by Proxmox.

Ceph Storage Types

Ceph supports several storage types, but the types we’re interested in for our purposes are

  1. Filesystem
  2. Block

A Ceph filesystem behaves like a standard filesystem–it can contain files and directories. The advantage to a Ceph filesystem for our purposes is that we can access any files stored within it from any location in the cluster, whereas with local-storage (e.g. local-lvm or whatever), we can only access them on the node they reside upon.

Similarly, a Ceph block device is suitable for container and virtual machine hard-drives, and these drives can be accessed from anywhere in the cluster. This node-agnosticism is important for several reasons:

  1. We’re able to create base images for all our virtual machines and clone them to any node, from any node
  2. We’re able to use Ceph as a persistent volume provider in Kubernetes

In this scheme, your Ceph IO performance is likely to be bottlenecked by your inter-node network performance. Our 10Gbps network connections between nodes seem to perform reasonably well, but are noticeably slower than local storage (we may characterize this at some point, but not in this series).

Drive Configuration

In our setup, we have four 4 TB drives on each node. Ceph seems to prefer identical hardware configurations on each node it’s deployed on, so if your nodes differ here, you may want to restrict your Ceph deployment to identical nodes.

Create your Ceph partitions on each node

Careful: Ensure that the drives you’re installing Ceph onto aren’t being used by anything. This will erase everything on them. You can colocate Ceph partitions with other partitions on a drive, but that’s not the configuration we’re using here (and how well it performs depends on quite a few factors).

We’re going to use /dev/sdb for our Ceph filesystem on each node. To prepare each drive, open a console into its host and run the following:

  1. fdisk /dev/sdb
  2. d to delete all the partitions. If you have more than one, repeat this process until there are no partitions left.
  3. n to create a new partition
    1. g to create a new GPT partition table
  4. w to write the changes

Your disks are now ready for Ceph!

Install Ceph

On each node, navigate to the left-hand configuration panel, then click on the Ceph node. Initially, you’ll see a message indicating that Ceph is not installed. Select the advanced option and click Install to continue through the wizard.

When you’re presented with the options, ensure that the osd_pool_default_size and osd_pool_default_min_size configurations are set to the number of nodes that you intend to install Ceph on. While the min_size can be less than the pool_size, I have not had good luck with this configuration in my tests.

Once you’ve installed Ceph on each node, navigate to the Monitor node under the Ceph configuration node and create at least 1 monitor and at least 1 manager, depending on your resiliency requirements.

Create an object-storage daemon (OSD)

Our next step is to allocate Ceph the drives we previously provisioned (/dev/sdb). Navigate to the OSD node under the Ceph node, and click create OSD. In our configuration, we use the same disk for both the storage area and the Ceph database (DB) disk. Repeat this process for each node you want to participate in this storage cluster.

Create the Ceph pool

We’re almost there! Navigate to the Pools node under the Ceph node and click create. You’ll be presented with some options, but the options important to us are:

  1. size: set to the number of OSDs you created earlier
  2. min-size: set to the number of OSDs you created earlier

Ceph recommends about 100 placement groups per OSD, but the number must be a power of 2, and the minimum is 8. More placement groups means better reliability in the presence of failures, but it also means replicating more data which may slow things down. Since we’re not managing dozens or hundreds of drives, I opted for fewer placement groups (16).
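
For comparison with the value we picked, here is a sketch of the rule of thumb described above. It assumes the commonly cited Ceph guidance of roughly 100 placement groups per OSD divided by the pool’s replica count, rounded up to a power of 2, with the minimum of 8 mentioned above. The function name pg_count is hypothetical, and the result is a starting point rather than a recommendation for your hardware.

```shell
# pg_count <number-of-osds> <replica-count>
# Prints a power-of-2 placement group count for a pool.
pg_count() {
    osds=$1
    replicas=$2
    # rule-of-thumb target: ~100 PGs per OSD, shared across replicas
    target=$(( osds * 100 / replicas ))
    pgs=8   # start at the minimum
    while [ "$pgs" -lt "$target" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
}
```

With 3 OSDs and a pool size of 3, pg_count 3 3 prints 128, which is well above the 16 we opted for. On a small home cluster, going lower is a trade-off we were comfortable with.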

Click Create–you should now have a Ceph storage pool!

Create your Ceph Block Storage (RBD)

You should now be able to navigate up to the cluster level and click on the storage configuration node.

  1. Click Add and select RBD.
  2. Give it a memorable ID that’s also volume-friendly (lower case, no spaces, only alphanumeric + dashes). We chose ceph-block-storage
  3. Select the cephfs_data pool (or whatever you called it in the previous step)
  4. Select the monitor node you want to use
  5. Ensure that the Nodes value is All (no restrictions) unless you want to configure that.
  6. Click Add!

Congrats! You should now have a Ceph block storage pool!

Create your Ceph directory storage

In the same view (Datacenter > Storage):

  1. Click Add and select CephFS
  2. Give it a memorable ID (same rules as in the previous step), we called ours ceph-fs
  3. Ensure that all the available content options are selected (VZDump backup file, ISO image, Container Template, Snippets)
  4. Ensure the Use Proxmox VE managed hyper-converged cephFS option is selected

Click Add–you now have a location you can store VM templates/ISOs in that is accessible to every node participating in the CephFS pool! This is important because it radically simplifies configuration via Terraform, which we’ll be writing about in subsequent posts.

Create a Cloud-Init ready VM Template

We use Debian for virtually everything we do, but these steps should work for Ubuntu as well.

  1. Upload a Debian ISO (we use the small installation image) to your CephFS storage
  2. Perform a minimal installation. The things you’ll want to pay attention to here are:
    • Ensure your domain-name is set to whatever domain-name you’ve configured on your domain-controller
    • Only install the following packages: SSH Server and Standard System Utilities
  3. Finish the Debian installation wizard and power on your VM. Make a note of its ID (probably 100 or something), which I refer to as VM_ID in subsequent steps
  4. In the Proxmox console for your VM, perform the following steps:
    • apt-get update
    • apt-get install cloud-init net-tools sudo
  5. Once APT is finished installing those packages, edit your sshd_config file at /etc/ssh/sshd_config using your favorite editor (e.g. vi /etc/ssh/sshd_config) and ensure that PermitRootLogin is set to yes from its default of prohibit-password. These are all air-gapped in our environment, so I’m not too worried about the security ramifications of this. If you are, be sure to adjust subsequent configurations to use certificate-based auth.
  6. Save the file and shut down the virtual machine

Now, let’s enable CloudInit!

On any of your nodes joined to the CephFS filesystem:

  1. Open a console
  2. Configure a CloudInit drive:
    • Using the ID of the Ceph RBD/block store we configured above (referred to as <BLOCK_STORE_ID>; ours is ceph-block-storage) and the VM_ID from the previous section, create a drive by entering qm set <VM_ID> --ide2 <BLOCK_STORE_ID>:cloudinit. For instance, in our configuration, with VM_ID = 100, this is qm set 100 --ide2 ceph-block-storage:cloudinit. This should complete without errors.
  3. Verify that your VM has a cloud-init drive by navigating to the VM, then selecting the Cloud-Init node. You should see some values there instead of a message indicating Cloud-Init isn’t installed.

You should also verify that this machine is installed on the correct block-storage by attempting to clone it to another node. If everything’s configured properly, cloning a VM to a different node should work seamlessly. At this point, you can convert your VM to a template for subsequent use.
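
If you’re preparing more than one template, a tiny dry-run helper can keep the two IDs straight. This sketch only prints the commands rather than running them. The function name cloudinit_commands is hypothetical, and it assumes you convert to a template with qm template once the clone test has passed.

```shell
# Print (not run) the qm commands for adding a Cloud-Init drive and
# converting the VM to a template.
# Usage: cloudinit_commands <VM_ID> <BLOCK_STORE_ID>
cloudinit_commands() {
    vm_id=$1
    store=$2
    # VM IDs are numeric; refuse anything else
    case $vm_id in
        ''|*[!0-9]*) echo "VM_ID must be numeric" >&2; return 1 ;;
    esac
    echo "qm set ${vm_id} --ide2 ${store}:cloudinit"
    echo "qm template ${vm_id}"
}
```

Running cloudinit_commands 100 ceph-block-storage prints the same qm set line shown above, followed by the template conversion command, so you can review both before pasting them into a node’s console.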


In this post we installed and configured Ceph block and file stores, and also created a Cloud-Init capable VM template. At this point, we’re ready to begin configuring our HA Kubernetes cluster using Terraform from HashiCorp. Our Terraform files will be stored in our public devops GitHub repository.

Building a Home Cloud with Proxmox Part 2: Domain Configuration


Last time I went over how to configure a router equipped with DD-WRT for managing a home cloud, as well as installing the awesome Proxmox virtualization environment. This time, we’ll go over how to configure a Windows domain controller to manage ActiveDirectory profiles, as well as DNS.

General configuration

The end goal is to provision a cluster that’s running a Windows domain controller (for DNS and ActiveDirectory), as well as a Kubernetes cluster. Once we have that we’ll deploy most of our services into the Kubernetes cluster, and we’ll be able to grant uniform access to every object via the domain controller.

Note that multitenancy isn’t a goal for this home cloud, although the domain controller is where we would add that. We could also partition our subnet into more subnets, providing functionality analogous to public cloud providers’ notions of Virtual Private Clouds.

Step 1: Create a Windows Server VM

  1. Ensure you have a suitable storage location:
    1. Open a shell to one of your nodes
    2. Type lsblk–it should show you a tree of your available hard drives and partitions
    3. If you don’t see a partition under your drives, run fdisk <drive>, where <drive> is the drive you want (probably not /dev/sda, which is likely where Proxmox is installed). Create a GPT partition table by typing g, then write the changes by entering w
    4. Now you should be able to provision a VM on that drive/partition in the subsequent steps.
  2. Download the Windows Server ISO
  3. Upload the Windows Server ISO to Proxmox:
    1. The Windows Server ISO is too large to upload via the GUI, so pick a node and scp it up: scp <iso file> root@<node>:/var/lib/vz/template/iso (for example, with windows_server_2019.iso as the <iso file>)
    2. Create a new VM
      • Select Windows as the type, the default version should be correct, but verify it’s something like 10/2019/
      • Give it at least 2 cores, 1 socket is fine. The disk should have at least 50GB available.
      • I provided mine with 16 GB RAM, but we have pretty big machines. 4 GB works fine, and you can change this later.
    3. One of the prompts will be to assign your server to a domain. You should enter the domain that you want to control.

You should be able to access your Windows Server VM now, but it’s not quite ready.

Configure your Domain Controller’s DNS

The first thing we’ll want to do is ensure that our domain controller is at a deterministic location. For this, we’ll want to:
  1. Assign it a static DHCP lease in our router
  2. Assign it a static IP locally
  3. Assign it a DNS entry in our router

Assign the DHCP lease

  1. In the Windows VM, open an elevated console and run ipconfig /all. This will show all of your IP configurations, as well as the MAC addresses they’re associated with. Locate the entry for your subnet, and locate the associated MAC address (format AA:AA:AA:AA:AA:AA, where each AA is a two-digit hexadecimal value).
  2. Once you have your MAC address, log into your router, navigate to the Services/Services subtab, find the Static Leases section, and create an entry whose MAC address is the one you just obtained, and whose IP address is towards the low end of your subnet. Choose a memorable hostname and enter that, too.

Save and apply your changes.

Assign the static IP locally

  1. In your Windows VM, navigate to Control Panel > Network and Internet > Network Connections. You’ll probably only have one.
  2. Right-click on it and select Properties. Navigate to Internet Protocol Version 4 (TCP/IPv4)
  3. Select Use the following IP address
    • For IP Address, enter the IP you set in the previous section (Assign the DHCP lease)
    • For Subnet mask (assuming a /24 network), enter 255.255.255.0. You can look these up here
    • For Default gateway, select your router IP

For the DNS Server Entries, you can set anything you like (that is a valid DNS server). We like Google’s DNS servers (8.8.8.8 and 8.8.4.4). Notice that we’re not actually setting our router to be a DNS server, even though it is configured to be one. That’s because we generally want the domain controller to be our DNS server.

Click OK and restart just to be sure.

Configure your Domain Controller

Your Windows Server VM will need to function as a domain controller (and DNS server) for this to work. After you’ve created the Windows Server VM, log into it as an administrator. You should be presented with the Server Manager Dialog.

  1. From the top-right, click the Manage button and select Add Roles and Features. This will bring you to the Roles & Features wizard.
  2. Select Role-based or feature-based installation. Click Next
  3. Select your server. Click Next
  4. Select the following roles
    • Active Directory Domain Services
    • Active Directory Lightweight Directory Services
    • Active Directory Rights Management Services -> Active Directory Rights Management Server
    • DHCP Server
    • DNS Server

Click Install–grab some coffee while Windows does its stuff. It’ll restart at least once.

Configure DNS

Almost done! We’re going to add all the entries that we had previously added in the router to the Domain Controller so that we only need to reference one DNS server: our domain controller. Finally, we’ll join a home computer to the domain.

Add the DNS records

  1. From the Server Manager window, select Tools and click DNS from the dropdown.
  2. Expand the Forward Lookup Zones node in the tree. You should see your domain under there, possibly along with other entries such as _msdcs.<your domain>.
  3. Select <your domain>.
  4. Right-click <your domain> and select New Host (A or AAAA)
  5. For each of your Proxmox nodes, add the name (e.g. athena). The fully-qualified domain name should automatically populate.
    • Provide the IP address of the associated node. If you followed along closely to the previous entry, they’ll be <subnet prefix>.2 up to <subnet prefix>.(1 + number of nodes)
  6. Create an entry for your router and point it at your gateway IP.
  7. Create an entry for your controller and point it at your controller’s static IP.

Save and restart your controller.

Create an ActiveDirectory User

  1. Log back into your domain controller and, from the server manager select the Tools menu, then Active Directory Users and Computers.
  2. From the left-hand navigator tree, expand your domain node and right-click on the Users Sub-node.
  3. Select New, and then User. Fill in your information, and make a note of the username and password. We’ll use them in the next step.

Join a computer to your domain!

On your local computer:
  1. Navigate to Control Panel > System and Security > System
  2. Under the section Computer name, domain, and workgroup settings, select Change settings. The System Properties dialog should appear.
  3. Select the Change button to the right of To rename this computer or change its domain or workgroup, click Change
  4. Set your computer name (kitchen, workstation, etc.)
  5. Set Member of to Domain with your domain value

Click OK–this can take a bit.

Once that’s done, restart your computer. You should now be able to log in as <DOMAIN>/<username> (the user configured in the ActiveDirectory section above)!

Your profile should automatically use your domain controller as its DNS server, so you should be able to ping all of your entries.


Windows Server makes it really easy to install and manage domains, almost enough to justify its steep price-tag. A domain server allows you to control users, groups, permissions, accounts, applications, DNS, etc. in a centralized fashion. Later on, when we configure Kubernetes, we’ll take advantage of Windows Server’s powerful DNS capabilities to automatically provision DNS entries associated with IPs allocated from our cloud subnet and use them to front services deployed into our own high-availability Kubernetes cluster.

If you don’t want to/can’t use Windows Server, you can replicate this functionality using Apache Directory and PowerDNS. I may do a post on how to set up and configure these, but my primary goal is to move on to the Kubernetes cluster configuration, and managing DNS and LDAP are two relatively complex topics that are greatly simplified by Windows Server.

Building a Home Cloud with Proxmox: Part 1

We’re back!

Whew! As you’re probably already aware, 2020 was an incredibly weird year–but we’re getting back on the wagon and going leaner in 2021! Last year, we’d installed solar power to take advantage of Colorado’s 300+ days of sunshine, and we had some relatively beefy workstations left over from our defense work. At the same time, we noticed that our R&D AWS spend is a little high for our budget right now, so we decided to build a local cloud on top of the wonderful Proxmox virtualization management software. Let’s get started!


To begin with, we reviewed which services from AWS we needed, and came up with the following list:

  1. RDS
  2. Jenkins CI/CD
  3. Kubernetes (We hosted our own with KOPS on AWS–highly recommend it)
  4. Route53
  5. AWS Directory Service
  6. Nexus
  7. Minio

(I haven’t really decided if I want to use OpenShift or not for this cluster yet).

We’re also probably going to end up using MetalLB to allocate load-balancers for our bare-metal(ish) cluster.

Let’s get to it!

Step 0: Hardware configuration

Our setup is 3 HP Z840s with the following configuration:
  1. 2TB RAM
  2. 24 logical CPUs on 2 sockets
  3. Four 4TB Samsung PRO 860 SSDs

For ease-of-use, we also recommend the BYTECC 4 HDMI port KVM switch (for these machines, anyway, YMMV)

Step 1: Configure your Network

  1. Check your network configuration. We have each box hooked up directly to a Linksys AC32000 running DD-WRT.
    1. (Note that these instructions are for DD-WRT–if you don’t use that, you’ll have to find equivalent configurations for your router).
    2. IMPORTANT: Ensure that your network is at least a /24 (256 total IPs)–you’ll end up using more than you expect.
    3. Useful tip: Install your nodes towards the low end of your subnet. Ours are at 192.168.1.[2, 3, 4].
  2. Next, configure your DD-WRT installation to supply DNS over your LAN by:
    1. Navigating to your router’s admin page and logging in
    2. Navigate to services from the top level, then ensure the services sub-tab is selected
    3. Select Use NVRAM for client lease DB (this will allow your configuration to survive power-outages and restarts)
    4. Set Used Domain to LAN & WAN
    5. Set your LAN Domain to whatever you want it to be. This is purely local, so it can be anything. We own a domain, so we just used that. home.local or whatever should work just fine as well.
    6. Ensure DNSMasq is enabled
    7. Add bash

      to your Additional DNSMasq Options Text box
    8. Save & Apply your configuration
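
The /24 sizing from the network check above is just arithmetic on the prefix length, which is handy if you’re weighing a different subnet size. A minimal sketch, with hypothetical function names:

```shell
# subnet_size <prefix-length>: total addresses in an IPv4 subnet
subnet_size() {
    prefix=$1
    echo $(( 1 << (32 - prefix) ))
}

# usable_hosts <prefix-length>: total minus the network and broadcast addresses
usable_hosts() {
    echo $(( $(subnet_size "$1") - 2 ))
}
```

subnet_size 24 prints 256, matching the figure above, and usable_hosts 24 prints 254 once the network and broadcast addresses are set aside.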

Step 2: Install Proxmox

Proxmox is an open-source alternative to (approximately?) vSphere from VMWare, and we’ve used it for a variety of internal workloads. It’s quite robust and stable, so I encourage you to check it out!

The first thing is to burn a USB drive with the Proxmox ISO. UEFI hasn’t worked well for me on these machines (they’re a little old), so the Rufus configuration I used was:

Partition scheme: MBR
File System: FAT32
Cluster Size: 4096 bytes (default)

I went ahead and checked the Add fixes for old BIOSes option since these are pretty old BIOSes. Hitting start will ask you if you want to burn the image in ISO image mode or DD image mode. With these 840s, ISO mode didn’t work for me, but DD did. Hit START and grab a coffee!

Once your USB drive is gtg, install Proxmox VE on each of the machines:

  1. Plug your USB drive into the first node in your cluster
  2. Restart/power on your node
  3. Select your USB drive from the boot options. If you followed our instructions for Rufus, it’ll probably be under Legacy instead of UEFI or whatever.
  4. IMPORTANT: Click through the wizard until it asks you which drive/partition you want to install Proxmox into. Select Options and reduce the installation size to about 30GB. I don’t know what the minimum is, but 30GB works great for this setup. This even gives you some space to upload ISOs to the default storage location. (Note that if you don’t select this the installation will consume the entire drive).
  5. Continue on the installation until you see your network configuration.
    1. Select your network (yours may differ from ours, depending on how you configured your router above)
    2. Input your node IP (we went with .2 for the first, .3 for the second, etc.)
    3. Add a cool hostname. If you configured your router as we did above, you should be able to input <name>.<your LAN domain>.
  6. Make a note of the username and password you chose for root. Slap that into Lastpass; you’ll thank yourself later.

Repeat this process for each node you have. Once that’s complete, navigate to any of the nodes at https://<ip>:8006 where <ip> is the IP that you configured in the Proxmox network installation step. Your browser will yell at you that this isn’t a trusted site since it uses a self-signed certificate by default, but that’s OK. Accept the certificate in your browser, then log in with the username and password you provided. In the left-hand pane of your Proxmox server, select the top-level Datacenter node, then click Create Cluster. This will take a second, at which point you’ll be able to close the dialog and select Join Information. Copy the contents of the Join Information text-area.

Once you have your Join Information, navigate to each of the other nodes in your cluster. Log in, then select the top level Datacenter node once again. This time, click on the Join Cluster button. Paste the Join Information into the text area, then enter the root password of the first node in the root password text field. In a second or two, you should see 2 cluster nodes under the Datacenter configuration. Repeat this process with all of the nodes you set Proxmox up on!

Configure DNS

You may need to configure your DNS within your router at this point. Click on the Shell button for each node, and run:
  1. apt-get update
  2. apt-get install net-tools

Then, run ifconfig. You should see something like:

enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether d8:9d:67:f4:5e:a0  txqueuelen 1000  (Ethernet)
        RX packets 838880  bytes 173824493 (165.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 907011  bytes 170727177 (162.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 17  memory 0xef500000-ef520000  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet  netmask
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 669  bytes 161151 (157.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 669  bytes 161151 (157.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet  netmask  broadcast
        inet6 fe80::da9d:67ff:fef4:5ea0  prefixlen 64  scopeid 0x20<link>
        ether d8:9d:67:f4:5e:a1  txqueuelen 1000  (Ethernet)
        RX packets 834450  bytes 158495280 (151.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 902499  bytes 166814715 (159.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Your list might look different. Note that under the vmbr0 entry there’s a set of letters/digits to the right of ether (in my case, d8:9d:67:f4:5e:a1). Make a note of these for each node, then go back to your gateway, log in, and navigate to Services once again. The default subtab Services should also be selected. If it’s not, select that.
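
Rather than picking the address out by eye on every node, you can filter the ifconfig output. The following is a sketch under the assumption that the output has the layout shown above (an unindented interface line followed by indented detail lines). The function name iface_mac is hypothetical.

```shell
# Print the MAC address of one interface from ifconfig-style output on stdin.
# Usage: ifconfig | iface_mac vmbr0
iface_mac() {
    awk -v want="$1:" '
        # an unindented line starts a new interface block, e.g. "vmbr0: flags=..."
        /^[^ \t]/ { iface = $1 }
        # inside the wanted block, the ether line carries the MAC in field 2
        $1 == "ether" && iface == want { print $2 }
    '
}
```

Against the output above, ifconfig | iface_mac vmbr0 would print the vmbr0 MAC address, which is the value to enter in the router’s static lease.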

You’ll see a field called Static Leases. Click Add. You’ll see a set of 4 fields:
  1. MAC Address
  2. Hostname
  3. IP Address
  4. Client Lease Time

For each of the host/MAC address pairs you found previously (e.g. <hostname> -> d8:9d:67:f4:5e:a1), fill the MAC address field with the MAC address, the hostname with the corresponding hostname, and the IP address with the IP address of the node. Save & Apply Settings. You should be able to visit any of the nodes at their corresponding DNS name. For instance, I can access the Proxmox cluster through athena at its DNS name.


At the end of this exercise, you should have a Proxmox cluster with at least 1 node that’s accessible via DNS on your local cluster!
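
If you have more than a couple of nodes, copying the vmbr0 MAC address by eye gets error-prone. The following is a minimal sketch, assuming the ifconfig output format shown above; the helper name `bridge_mac` and the trimmed `SAMPLE` text are illustrative and not part of any tool.

```python
import re

# Trimmed, illustrative copy of the ifconfig output shown earlier.
SAMPLE = """\
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        loop  txqueuelen 1000  (Local Loopback)

vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether d8:9d:67:f4:5e:a1  txqueuelen 1000  (Ethernet)
"""

MAC_PATTERN = r"\bether\s+([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"


def bridge_mac(ifconfig_output, interface="vmbr0"):
    """Return the MAC address listed after `ether` for `interface`, or None."""
    current = None
    for line in ifconfig_output.splitlines():
        # An interface stanza starts at column 0, e.g. "vmbr0: flags=..."
        header = re.match(r"^(\S+):\s", line)
        if header:
            current = header.group(1)
            continue
        if current == interface:
            ether = re.search(MAC_PATTERN, line)
            if ether:
                return ether.group(1).lower()
    return None


if __name__ == "__main__":
    print(bridge_mac(SAMPLE))
```

You could feed it the captured output of `ifconfig` from each node and paste the result into the MAC Address field of the static lease.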

Cloud-Agnostic Infrastructure-As-Code Trials and Tribulations

Opinions around cloud-agnosticism vary, but it is generally considered a truism that infrastructure-as-code (IaC) is the best, and perhaps the only, way to achieve cloud-agnostic deployments. Cloud-agnostic deployments are obviously not a goal for every organization, but we’ve found that as organizations become more sophisticated it frequently becomes a need. This is most often due to customers’ on-premises needs or their own hybrid-cloud requirements. Yet despite their sophistication, these organizations almost universally struggle to implement cloud-agnostic deployments, even with infrastructure-as-code tools.

Tools like Terraform and Kubernetes are the most often used for cloud-agnostic infrastructure-as-code, and while we love and use them as much as the next DevOps nerd, we’ve found that this solution can run into a variety of problems. Broadly, in our experience, these cloud deployment and management problems fall into the following categories:

  1. Visibility (what infrastructure am I renting, and where?)
  2. Infrastructure design and architecture
  3. Infrastructure deployment
  4. Cost analysis and optimization, especially as it pertains to infrastructure utilization
  5. Parameterization (secrets, credentials, etc.)

There aren’t very many cloud management tools that capture all of the categories. Those that exist do have issues with “common denominator” problems — that is, concerns around what functionality remains consistent across clouds. We find that folks overestimate the degree to which common denominator problems exist and how problematic they actually are. While the names and APIs for services vary wildly across cloud platforms, once you get past the syntax, they’re functionally the same. Indeed, we find that abstracting out sensible defaults in our abstraction layer solves most of those problems. Usually, the problems arise when mapping individual capability concepts — e.g. porting function-as-a-service-based code from AWS Lambda to another cloud. Even more typically, it comes from providing a common view of infrastructure elements between clouds, tenants, regions, etc.

We’ve sought to alleviate these problems with Troposphere, our drag-and-drop orchestration engine that can take whatever your current infrastructure-as-code solution is and make it cloud-agnostic. Over the coming posts, we’ll address the issues we’ve seen in implementing cloud-agnostic infrastructure-as-code solutions, and how to address them.

One Configurable, Cross-Platform Self-Extracting Executable Generator to Rule Them All

Generating configurable, cross-platform self-extracting executables has been the purview of commercial software for, well, forever. We wanted folks of all operating systems to be able to download and run our software locally. We also didn’t want to break the bank. After much deliberation, we decided to build the solution ourselves, making cross-platform installer generation easy and accessible.

Introducing the Zephyr Installer Plugin Ecosystem

With Zephyr Build Plugins, you can generate any self-extracting executable targeting any supported platform from any supported platform, as well as executables for every supported platform on any supported platform from the same build. Even better? Generated executables do not require any JVM to be installed on the end user’s system to function.

You can create self-extracting executables with code-signing, as well as modifiable icons, metadata, and run permissions for Windows, Mac, and Linux. Additionally, Zephyr Build Plugins can generate ICO and ICNS icons from PNG, SVG, and other formats. We currently support Maven for all platforms, but plan to support Gradle as well. Bazel and Ant support will be available with a commercial license.

Use Cases

Create a self-extracting executable targeting (Windows, Mac, or Linux)

Executables can be built on your current operating system (Mac, Windows, or Linux) with no modifications or third-party requirements.

Sign generated executables

Executables can be signed for any supported platform with the same configuration. Sign Windows executables with Authenticode on Mac, Linux, or Windows, or sign Mac app packages with CodeSign on Mac, Linux, or Windows.

Generate ICO/ICNS files

Traditionally, having ICO or ICNS icon files on hand has been a prerequisite, requiring third-party commercial tools, online icon generators, etc. The Zephyr build ecosystem allows you to generate ICO/ICNS files from standard raster formats like PNG and SVG with a variety of sizes.

Attach ICO/ICNS files to your executable

Inserting branding icons into executables has been a platform-dependent chore, but Zephyr allows you to brand your generated executable in a platform-independent way.

Create installers for JVM-based programs

Previously, installers for JVM-based programs required the installer to download the JVM, or forced the end-user to install it. Zephyr allows you to launch IzPack installers using a JVM bundled with your application.

Automate everything!

Since these tools are included as build plugins for the most popular build systems, you can completely eliminate any manual steps in your installer generation process!

You can find the Maven plugin documentation at

Reflecting on 2019

Like most folks coming into this new year and new decade, we’ve been reflecting on the past. Our company is far less than a decade old (we turn two in March!), but even a year is a long time in the life of a small business. Despite our small size, it was a banner year.

First Six Months

The first six months of the year were largely occupied by releasing the Sunshower platform to the public. Our web application plays host to Stratosphere, allowing you to visualize your AWS EC2 infrastructure around the globe, and Anvil, allowing you to save 66% on your AWS EC2 bill with right-sizing.

We also spent a lot of the first half of the year doing the “startup circuit” — taking second in a startup pitch competition, doing lots of networking events with investors, and interviewing at Y Combinator.

Last Six Months

From there, we abruptly got pulled into the land of government contracting. A large defense contractor contacted us about doing a white paper together for the Air Force, which we submitted in July. We spent August waiting to hear if we would receive an RFP, and September writing the proposal. Our contacts at the Air Force think our offering is incredibly valuable, and last we heard, the proposal was in the technology evaluation phase. Interested in what we pitched them? Download our marketing white paper.

This fall, we’ve renewed our commitment to open-source software. Most notably, we refactored our plugin framework into Zephyr, so other organizations can reap the benefit of a non-OSGi system for lifecycle and dependency management. We also renewed our work on Aire, a UI framework built on Aurelia and UIKit.

Last but not least, we finished out the year with a bit of a rebrand. We have a new logo and doubled down on our bright colors. We also changed our tagline to reflect our movement away from optimization and towards application and infrastructure management and deployment.

We’re looking forward to what 2020 has in store!

Why We Wrote Zephyr

Since releasing Zephyr, we’ve been asked by numerous people why we wrote Zephyr instead of sticking to OSGi. Our goal was pretty simple: create an extensible system suitable for SaaS or on-prem.  We looked in our toolbox and knew that we could do this using OSGi, Java, and Spring, and so that’s how it started.

How We Started

First, we wrote our extensible distributed graph reduction machine: Gyre.  This allowed us to describe computations as graphs. It generated a maximally parallel schedule, did its best to figure out whether to ship (a) a computation to data, (b) data to a computation, or (c) both to an underutilized node, and then executed the schedule.

Then we wrote Anvil, our general-purpose optimization engine that efficiently solved linear and non-linear optimization problems. These were described as Gyre graphs (including how the Gyre could better execute tasks based off of its internal metrics). We deployed Anvil and Gyre together as bundles into an OSGi runtime.  Obviously, Anvil couldn’t operate without Gyre, and so we referenced Gyre services in Anvil.  But Anvil and Gyre themselves were extensible.  We wrote additional solvers and dynamically installed them into Anvil, or wrote different concurrency/distribution/serialization strategies and deployed them into Gyre, and gradually added more and more references.

Then we wrote Troposphere, our deployment engine. Troposphere would execute its tasks on Gyre, and Anvil would optimize them. Troposphere would define types of tasks, and we exported them as requirements to be satisfied by capabilities. (For example, Troposphere would define a “discovery” task, and an AWS EC2 plugin would fulfill that capability.)
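
The requirement/capability relationship described above can be pictured as a small registry. This is only a sketch of the idea and not Troposphere’s or OSGi’s actual API: the `CapabilityRegistry` class, its method names, and the `discover_ec2` handler are all hypothetical.

```python
from collections import defaultdict


class CapabilityRegistry:
    """Hypothetical registry matching required task types to providing plugins."""

    def __init__(self):
        self._providers = defaultdict(list)

    def provide(self, capability, plugin_name, handler):
        # A plugin announces that it can fulfill a capability.
        self._providers[capability].append((plugin_name, handler))

    def require(self, capability):
        # The core asks for every plugin able to satisfy a requirement.
        providers = self._providers.get(capability)
        if not providers:
            raise LookupError(f"no plugin provides {capability!r}")
        return providers


def discover_ec2():
    # Placeholder handler; a real plugin would query the cloud provider here.
    return []


registry = CapabilityRegistry()
registry.provide("discovery", "aws-ec2", discover_ec2)

if __name__ == "__main__":
    for plugin_name, handler in registry.require("discovery"):
        print(plugin_name, handler())
```

The core only knows the name of the task type ("discovery"); which plugins answer for it is decided at deployment time.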

Handling OSGi with Spring

Being a small team, we pretty much only used one actual framework (Spring), so we deployed yet another bundle containing only the Spring classpath, to be depended on by any bundle that required it.  We initially used bnd to generate our package import/export statements in our manifest, and pulled in the bnd Gradle plugin as part of the build, but the reality was that if a plugin depended on Troposphere, then it pretty much always depended on Gyre, Anvil, and Spring.

If Anvil contains a service-reference to Gyre, and Troposphere contains one to Anvil, you get the correct start-order.  But if you stop Gyre while Troposphere is running?  Well, that’s a stale reference, and Troposphere needs to handle it, which means refactoring Troposphere and Gyre to use service factories, prototype service factories, or whatever else.
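
One way to picture the stale-reference problem: before Gyre can stop safely, everything that directly or transitively references it has to stop first. The sketch below illustrates that ordering under a hypothetical `REFERENCES` map; it is not how OSGi or Zephyr implement it.

```python
# Hypothetical map of plugin -> plugins whose services it references.
REFERENCES = {
    "gyre": set(),
    "anvil": {"gyre"},
    "troposphere": {"anvil"},
}


def dependents_of(references, target):
    """Plugins that directly or transitively reference `target`."""
    found = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for plugin, refs in references.items():
            if current in refs and plugin not in found:
                found.add(plugin)
                frontier.append(plugin)
    return found


def stop_order(references, target):
    """Order in which to stop plugins so that `target` stops last."""
    remaining = dependents_of(references, target) | {target}
    order = []
    while remaining:
        # A plugin may stop once nothing still running references it.
        stoppable = sorted(
            p for p in remaining
            if not any(p in references[q] for q in remaining if q != p)
        )
        if not stoppable:
            raise ValueError("cyclic references; no safe stop order")
        order.extend(stoppable)
        remaining -= set(stoppable)
    return order


if __name__ == "__main__":
    print(stop_order(REFERENCES, "gyre"))
```

With the map above, stopping Gyre means stopping Troposphere, then Anvil, then Gyre itself, so no running plugin is left holding a reference to a stopped one.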

But we just wanted to write Spring and Java.  To really use Spring in an OSGi-friendly way, you have to use Blueprints, and now you’re back to writing XML in addition to all of the OSGi-y things you’re doing in your code. The point isn’t that OSGi’s way doesn’t work; it does. These are solid technologies written by smart people. The point is that it introduces a lot of additional complexity, and you’re forced to really understand both Spring and OSGi to be productive, even though Spring is the only framework that’s actually providing value (in the form of features) to your users, because the extensibility component (OSGi) is a management concern.

What Zephyr gets us that OSGi didn’t


We’re big fans of unit tests, and we write a lot of them.  Ideally, if you’re sure components A and B both work, then the combination of A and B should work.  The reality is that sometimes they don’t for a huge variety of reasons. For example, for us, using any sort of concurrency mechanism outside of Gyre could severely bork Gyre, which could and did bamboozle dozens of plugins. We’re small enough that we could just set a pattern and decree that hey, that is the pattern, and catch violations in reviews or PMD rules. But once again, we just wanted to write integration tests and we wanted to use Spring Test to do it.

With OSGi, you can create projects whose test classpath matches the deployment classpath (although statically), and we did.  We also wrote harnesses and simulations that would set up OSGi and deploy plugins from Maven, etc., and it all worked. But it was still complex, and it wasn’t just Spring Test. This was, and continues to be, a big source of pain for us.  The fact of the matter is that, once again, Spring was providing the developer benefit and OSGi was introducing complexity.

Quick Startup/Shutdown Times

We use a lot of Spring’s features and perform DB migrations in a variety of plugins — not an unusual use case.  A plugin might only take a few seconds to start, but amortized over dozens of plugins, startup time became pretty noticeable.  There are some ways to configure parallel bundle lifecycle, but they’re pretty esoteric, sometimes implementation-dependent, and always require additional metadata or code. With Zephyr, we get parallel deployments out-of-the-box and as the default, reducing startup times from 30+ seconds to 5 or so.

Remote Plugins

One of our requirements is the ability to run plugins whose processes and lifecycles reside outside of Zephyr’s JVM. OSGi (understandably) wasn’t designed to support this, but Zephyr was.

Getting it right with Zephyr

We spent about two years wrangling OSGi and Spring, by turns coping with these and other problems either in code or operations. It was generally successful, but there was always an understanding that we were paying a high price in terms of time and complexity. After the first dozen or so plugins, we’d really come to understand what we wanted from a plugin framework.

To boot, we are pretty good at graph processing, and it had been clear to us for a while that the plugin management issues we were continually encountering were graph problems. Classpath dependency issues could be easily understood through the transitive closure of a plugin, and most of our plugins had the same transitive closure. Even if they didn’t, that was the disjoint-subgraph problem and we could easily cope with that. Correct parallel start schedules were easily found and correctly executed by Coffman-Graham scheduling, and we could tweak all of these subgraphs through subgraph-induction under a property.  Transitive reduction allowed us to easily and transparently avoid problems caused by non-idempotent plugin management operations.
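
To make the graph framing concrete, here is a small sketch of two of the ideas above: the transitive closure of a plugin’s dependencies, and a parallel start schedule. It is not Zephyr’s implementation. In particular, it swaps Coffman-Graham scheduling for a simpler level-by-level topological layering using Python’s standard `graphlib`, and the `DEPENDENCIES` map (including the `aws-ec2` and `azure-compute` plugin names) is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: plugin -> plugins it requires.
DEPENDENCIES = {
    "spring": set(),
    "gyre": {"spring"},
    "anvil": {"gyre", "spring"},
    "troposphere": {"anvil", "gyre", "spring"},
    "aws-ec2": {"troposphere"},
    "azure-compute": {"troposphere"},
}


def transitive_closure(deps, plugin):
    """Every plugin reachable from `plugin` by following dependencies."""
    seen = set()
    stack = list(deps[plugin])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def start_layers(deps):
    """Group plugins into layers; plugins in one layer can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


if __name__ == "__main__":
    print(transitive_closure(DEPENDENCIES, "aws-ec2"))
    for layer in start_layers(DEPENDENCIES):
        print(layer)
```

Here both leaf plugins share the same transitive closure (Troposphere, Anvil, Gyre, and Spring), and they land in the same final layer, so they can be started together once Troposphere is up.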

Once we’d implemented those, we discovered that a lot of the problems we struggled with just went away. Required services could never become stale, and optional services just came and went.  A lot of the OSGi-Spring integration code we’d written became dramatically simpler, and we could provide simple but powerful Spring Test extensions that felt very natural.

What’s Next

But we’re not stopping with Spring: Zephyr can support any platform and any JVM language, and we’re planning on creating support for Clojure, Kotlin, and Scala initially as installable runtimes. We’re investigating NodeJS support via Graal and should have some announcements about that in the new year. Spring is already supported, and we hope to add Quarkus and Dropwizard soon. And keep in mind that these integrations should require little or no knowledge of Zephyr at all.

We’re also in the process of open-sourcing a beautiful management UI, a powerful repository, and a host of other goodies — stay tuned!

Introducing Zephyr: A Java Plugin System for the 21st Century

We write software for people who write software. We’re pleased to announce something new to help folks scale their software: Zephyr, a next-generation plugin framework written in Java. Zephyr is an OSGi alternative, inspired by the best parts of it while dramatically reducing complexity and improving interoperability with existing frameworks and ecosystems.

Zephyr was born from our frustration with existing module systems. We started off using Wildfly and embedding OSGi, but this proved inadequate for the complex dependency graphs we encountered while developing the Sunshower platform. In particular, continually copy/pasting around manifests to import the dozens of packages from various frameworks was tedious and error-prone (and auto-generating them wasn’t much better, in fact). It greatly increased the complexity of our builds and deployments as we’d continually need to rev released versions of modules. This is to say nothing of the complexities of testing module interactions, or the joys of a ClassNotFoundException appearing suddenly after weeks of smooth operation caused by a forgotten Import-Package declaration.

After over 18 months of working around framework limitations, we looked at the “Kernel” that arose from coping with these problems and decided “Hey, this is pretty useful. Let’s get rid of underlying systems and just use that.” And now we’re open-sourcing it.

Small but mighty, Zephyr aggressively and automatically parallelizes management operations while running in less than 512KB of memory. It intelligently manages all aspects of plugin lifecycle, including dependency resolution. Deploying new plugins is quick and painless. And, of course, setting up plugin dependencies for tests is, well, a breeze.

While we wrote it in Java, Zephyr works with whatever languages you normally use by installing language runtimes as plugins. You can have multiple frameworks running side by side, eliminating a lot of overhead associated with rewrites, scaling and transitioning architectures.

Zephyr is available on GitHub under an MIT license. Enterprise support contracts are available. Go check out the website, the docs, or the repository. We’d love to have you involved!

Startup Culture is an American Revolution

Happy Fourth of July!

In case you were wondering, this isn’t just another Independence Day blog post talking about the Sunshower platform and how it will bring you freedom, blah blah blah. Rather, this is a blog post emphasizing that the ideals that led to the American Revolution, both within Great Britain and the colonies themselves, are alive and well in the American startup culture.