Building a Home Cloud with Proxmox Part 3: Configuring Ceph and Cloud-Init

Overview

Last time, we configured a Windows Server domain controller to handle DNS, DHCP, and Active Directory (our LDAP implementation). In this post, we’ll configure the Ceph distributed filesystem, which Proxmox supports out of the box, and then build a Cloud-Init-ready VM template on top of it.

Ceph Storage Types

Ceph supports several storage types, but the two we’re interested in here are

  1. Filesystem
  2. Block

A Ceph filesystem behaves like a standard filesystem: it can contain files and directories. The advantage for our purposes is that anything stored in it can be accessed from anywhere in the cluster, whereas files on local storage (e.g. local-lvm) can only be accessed on the node they reside on.

Similarly, a Ceph block device is suitable for container and virtual machine disks, and those disks can be accessed from anywhere in the cluster. This node-agnosticism is important for two reasons:

  1. We’re able to create base images for all our virtual machines and clone them to any node, from any node
  2. We’re able to use Ceph as a persistent volume provider in Kubernetes

In this scheme, your Ceph I/O performance is likely to be bottlenecked by the network between nodes. Our 10 Gbps inter-node connections perform reasonably well, but Ceph is still noticeably slower than local storage (we may characterize this at some point, but this series isn’t it).

Drive Configuration

In our setup, we have four 4 TB drives in each node. Ceph seems to prefer identical hardware configurations on every node it’s deployed to, so if your nodes differ here, you may want to restrict your Ceph deployment to a set of identical nodes.

Create your Ceph partitions on each node

Careful: ensure that the drives you’re installing Ceph onto aren’t being used by anything else; this process will erase everything on them. You can colocate Ceph partitions with other partitions on a drive, but that’s not the configuration we’re using here (and how well it performs depends on quite a few factors).

We’re going to use /dev/sdb for our Ceph filesystem on each node. To prepare each drive, open a console into its host and run the following:

  1. fdisk /dev/sdb
  2. d to delete all the partitions. If there is more than one, repeat this step until none remain.
  3. g to create a new GPT partition table
  4. n to create a new partition (the defaults will span the whole disk)
  5. w to write the changes

Your disks are now ready for Ceph!
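If you’d rather script this step (handy when preparing several nodes), the sketch below is a rough non-interactive equivalent of the fdisk session above. It’s my own shortcut rather than part of the original walkthrough, and sgdisk comes from the gdisk package, which you may need to install first.

  # Rough scripted equivalent of the interactive fdisk steps (run as root).
  # WARNING: this destroys everything on the target disk.
  DISK=/dev/sdb
  wipefs --all "$DISK"        # clear existing filesystem/RAID signatures (util-linux)
  sgdisk --zap-all "$DISK"    # wipe any GPT/MBR structures (gdisk package)
  sgdisk --new=1:0:0 "$DISK"  # recreate a single partition spanning the whole disk
  partprobe "$DISK"           # ask the kernel to re-read the partition table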

Install Ceph

On each node, navigate to the left-hand configuration panel, then click on the Ceph node. Initially, you’ll see a message indicating that Ceph is not installed. Select the advanced option and click Install to continue through the wizard.

When you’re presented with the options, ensure that the osd_pool_default_size and osd_pool_default_min_size configurations are set to the number of nodes that you intend to install Ceph on. While the min_size can be less than the pool_size, I have not had good luck with this configuration in my tests.

Once you’ve installed Ceph on each node, navigate to the Monitor node under the Ceph configuration node and create at least one monitor and at least one manager; add more of each depending on your resiliency requirements.
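If you prefer a shell, the sketch below shows roughly equivalent pveceph commands; the 10.0.0.0/24 network is just a placeholder for whatever Ceph network you use, and the wizard’s osd_pool_default_size / osd_pool_default_min_size values end up in /etc/pve/ceph.conf, where you can adjust them by hand.

  pveceph install                      # install the Ceph packages on this node
  pveceph init --network 10.0.0.0/24   # run once per cluster; substitute your Ceph network
  pveceph mon create                   # create a monitor on this node
  pveceph mgr create                   # create a manager on this node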

Create an object-storage daemon (OSD)

Our next step is to hand Ceph the drives we previously provisioned (/dev/sdb). Navigate to the OSD node under the Ceph node and click Create OSD. In our configuration, we use the same disk for both the storage area and the Ceph database (DB) disk. Repeat this process on each node you want to participate in this storage cluster.
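The same step from a shell on each node looks roughly like the following; this is a sketch that assumes the whole of /dev/sdb is handed to Ceph with data and DB colocated, which is the default when no separate DB device is specified.

  pveceph osd create /dev/sdb   # data and DB share the disk unless a separate DB device is given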

Create the Ceph pool

We’re almost there! Navigate to the Pools node under the Ceph node and click create. You’ll be presented with some options, but the options important to us are:

  1. size: set to the number of OSDs you created earlier (one per node in our setup)
  2. min_size: set to the same number

Ceph recommends roughly 100 placement groups per OSD, but the number must be a power of 2, and the minimum is 8. More placement groups spread data more evenly across OSDs and let recovery run in parallel after a failure, but each group adds peering and memory overhead, which can slow things down. Since we’re not managing dozens or hundreds of drives, I opted for a small number of placement groups (16).

Click Create–you should now have a Ceph storage pool!
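For reference, a roughly equivalent shell command is sketched below; it assumes one OSD per node on a three-node cluster and the 16 placement groups discussed above, and the pool name is just an example.

  pveceph pool create vm-pool --size 3 --min_size 3 --pg_num 16
  # --add_storages would also register the pool as Proxmox storage,
  # but we do that by hand in the next section.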

Create your Ceph Block Storage (RBD)

You should now be able to navigate up to the datacenter level and click on the Storage configuration node.

  1. Click Add and select RBD.
  2. Give it a memorable ID that’s also volume-friendly (lower case, no spaces, only alphanumerics and dashes). We chose ceph-block-storage.
  3. Select the pool you created in the previous step (ours is called cephfs_data)
  4. Select the monitor node you want to use
  5. Ensure that the Nodes value is All (no restrictions) unless you want to limit which nodes can use this storage.
  6. Click Add!

Congrats! You should now have a Ceph block storage pool!
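The same storage entry can also be created with pvesm; this is a sketch assuming the pool and ID names used above.

  pvesm add rbd ceph-block-storage --pool cephfs_data --content images,rootdir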

Create your Ceph directory storage

In the same view (Datacenter > Storage):

  1. Click Add and select CephFS
  2. Give it a memorable ID (same rules as in the previous step); we called ours ceph-fs
  3. Ensure that Content is set to all the available options (VZDump backup file, ISO image, Container template, Snippets)
  4. Ensure the Use Proxmox VE managed hyper-converged CephFS option is selected

Click Add–you now have a location for VM templates and ISOs that is accessible to every node participating in the CephFS pool! This is important because it radically simplifies configuration via Terraform, which we’ll be writing about in subsequent posts.
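The shell equivalent is roughly the following; it’s a sketch assuming the ID we chose above and a Proxmox-managed CephFS.

  pvesm add cephfs ceph-fs --content backup,iso,vztmpl,snippets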

Create a Cloud-Init ready VM Template

We use Debian for virtually everything we do, but these steps should work for Ubuntu as well.

  1. Upload a Debian ISO (we use the small installation image) to your CephFS storage
  2. Perform a minimal installation. The things you’ll want to pay attention to here are:
    • Ensure your domain name is set to whatever domain name you’ve configured on your domain controller
    • Only install the following packages: SSH server and standard system utilities
  3. Finish the Debian installation wizard and power on your VM. Make a note of the VM ID (probably 100 or so); I refer to it as VM_ID in subsequent steps
  4. In the Proxmox console for your VM, perform the following steps:
    • apt-get update
    • apt-get install cloud-init net-tools sudo
  5. Once apt has finished installing those packages, edit /etc/ssh/sshd_config with your favorite editor (e.g. vi /etc/ssh/sshd_config) and change PermitRootLogin from its default of prohibit-password to yes. These VMs are all air-gapped in our environment, so I’m not too worried about the security ramifications; if you are, adjust the subsequent configurations to use certificate-based auth instead. (A scripted version of steps 4 through 6 follows this list.)
  6. Save the file and shut down the virtual machine
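For repeatability, steps 4 through 6 can be collapsed into a small script run as root inside the guest; this is just a sketch of what we do by hand above.

  apt-get update
  apt-get install -y cloud-init net-tools sudo
  # Allow root SSH logins (acceptable here only because our lab is air-gapped).
  sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
  systemctl restart ssh
  shutdown -h now   # power off so the VM can be templated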

Now, let’s enable Cloud-Init!

On any of your nodes joined to the CephFS filesystem:

  1. Open a console
  2. Configure a Cloud-Init drive:
    • Using the ID of the Ceph RBD/block store we configured above (referred to as <BLOCK_STORE_ID>; ours is ceph-block-storage) and the VM_ID from the previous section, create the drive by running qm set <VM_ID> --ide2 <BLOCK_STORE_ID>:cloudinit. For instance, in our configuration with VM_ID = 100, this is qm set 100 --ide2 ceph-block-storage:cloudinit. It should complete without errors.
  3. Verify that your VM has a cloud-init drive by navigating to the VM, then selecting the Cloud-Init node. You should see some values there instead of a message indicating Cloud-Init isn’t installed.

You should also verify that the VM lives on the correct block storage by cloning it to another node; if everything is configured properly, the clone should complete seamlessly. At that point, you can convert the VM to a template for subsequent use (a sketch of both steps follows).
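A sketch of that verification and the final conversion, assuming VM_ID 100, a hypothetical new ID of 101, and a second node named pve2:

  qm clone 100 101 --name clone-test --target pve2 --full   # only succeeds if the disks are on shared storage
  qm destroy 101 --purge                                    # clean up the test clone (run on pve2)
  qm template 100                                           # convert the original VM into a template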

Conclusion

In this post we installed and configured Ceph block and file storage, and created a Cloud-Init-capable VM template. At this point, we’re ready to begin configuring our HA Kubernetes cluster using HashiCorp’s Terraform. Our Terraform files will be stored in our public devops GitHub repository.
