Last time, we configured a Windows Server domain controller to handle DNS, DHCP, and Active Directory (our LDAP implementation). In this post, we'll configure the Ceph distributed filesystem, which is supported out of the box by Proxmox.
Ceph Storage Types
Ceph supports several storage types, but the two we're interested in for our purposes are the Ceph filesystem (CephFS) and the Ceph block device (RBD).

A Ceph filesystem behaves like a standard filesystem: it can contain files and directories. The advantage of a Ceph filesystem for our purposes is that we can access any files stored within it from any location in the cluster, whereas with local storage (e.g. `local-lvm`), we can only access them on the node they reside on.
Similarly, a Ceph block device is suitable for container and virtual machine hard drives, and these drives can be accessed from anywhere in the cluster. This node-agnosticism is important for several reasons:
- We’re able to create base images for all our virtual machines and clone them to any node, from any node
- We’re able to use Ceph as a persistent volume provider in Kubernetes
In this scheme, your Ceph I/O performance is likely to be bottlenecked by your inter-node network performance. Our 10 Gbps network connections between nodes perform reasonably well, but they are noticeably slower than local storage (we may characterize this at some point, but this series isn't it).
In our setup, we have four 4 TB drives on each node. Ceph seems to prefer identical hardware configurations on each node it's deployed on, so if your nodes differ here, you may want to restrict your Ceph deployment to identical nodes.
Create your Ceph partitions on each node
Careful: ensure that the drives you're installing Ceph onto aren't being used by anything else; this process will erase everything on them. You can colocate Ceph partitions with other partitions on a drive, but that's not the configuration we're using here (and how well it performs depends on quite a few factors).
We're going to use `/dev/sdb` for our Ceph filesystem on each node. To prepare each drive, open a console into its host, run `fdisk /dev/sdb`, and use the following commands:

- `d` to delete a partition. If you have more than one, repeat this until there are no partitions left.
- `g` to create a new GPT partition table
- `n` to create a new partition
- `w` to write the changes
Your disks are now ready for Ceph!
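If you'd rather not step through `fdisk` interactively on every node, the same wipe can be done in one command. This is a sketch, assuming `sgdisk` (from the `gdisk` package) is available and that `/dev/sdb` really is the disk you intend to give to Ceph:

```shell
# Non-interactive equivalent of the fdisk steps above: destroys all
# partition data on /dev/sdb and writes a fresh, empty GPT table.
# DESTRUCTIVE -- double-check the target device with `lsblk` first.
sgdisk --zap-all /dev/sdb
```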
On each node, navigate to the left-hand configuration panel, then click on the `Ceph` node. Initially, you'll see a message indicating that Ceph is not installed. Select the `advanced` option and click `Install` to continue through the wizard.
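The GUI wizard wraps the `pveceph` command-line helper, so the install step can also be scripted. A sketch, where the cluster network CIDR is an assumption you should replace with your own:

```shell
# Install the Ceph packages on this node (the CLI equivalent of the
# GUI install wizard); repeat on every node that will run Ceph.
pveceph install

# On the first node only: initialize the Ceph configuration, binding
# it to the network the nodes share (10.0.0.0/24 is an illustrative
# assumption -- use your cluster network here).
pveceph init --network 10.0.0.0/24
```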
When you're presented with the options, ensure that the `osd_pool_default_size` and `osd_pool_default_min_size` configurations are set to the number of nodes that you intend to install Ceph on. While the `min_size` can be less than the `pool_size`, I have not had good luck with that configuration in my tests.
Once you've installed Ceph on each node, navigate to the `Monitor` node under the `Ceph` configuration node and create at least one monitor and at least one manager, depending on your resiliency requirements.
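Monitors and managers can likewise be created from a node's console; a sketch of the CLI equivalent:

```shell
# Create a monitor and a manager on the current node. Repeat on as
# many nodes as your resiliency requirements dictate (odd monitor
# counts are preferred so elections can reach a majority).
pveceph mon create
pveceph mgr create
```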
Create an object-storage daemon (OSD)
Our next step is to hand Ceph the drives we previously provisioned (`/dev/sdb`). Navigate to the `OSD` node under the `Ceph` node and click `Create OSD`. In our configuration, we use the same disk for both the storage area and the Ceph database (DB) disk. Repeat this process for each node you want to participate in this storage cluster.
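The same step from a node's console, as a sketch:

```shell
# Turn /dev/sdb into an OSD on this node. With no separate --db_dev
# given, the DB lives on the same disk, matching the configuration
# described above. Run once per participating node.
pveceph osd create /dev/sdb
```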
Create the Ceph pool
We're almost there! Navigate to the `Pools` node under the `Ceph` node and click `Create`. You'll be presented with some options, but the ones important to us are:
- size: set to the number of OSDs you created earlier
- min-size: set to the number of OSDs you created earlier
Ceph recommends about 100 placement groups per OSD, but the number must be a power of 2, and the minimum is 8. More placement groups means better reliability in the presence of failures, but it also means replicating more data which may slow things down. Since we’re not managing dozens or hundreds of drives, I opted for fewer placement groups (16).
Click `Create`, and you should now have a Ceph storage pool!
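For reference, the pool can also be created from the CLI. A sketch, assuming three OSDs/nodes and the pool name used later in this post:

```shell
# Create a replicated pool: size and min_size both 3 per the advice
# above, with 16 placement groups for a small cluster. The pool name
# cephfs_data matches the one referenced in the storage steps below.
pveceph pool create cephfs_data --size 3 --min_size 3 --pg_num 16
```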
Create your Ceph Block Storage (RBD)
You should now be able to navigate up to the cluster level and click on the `Storage` configuration node. Add an `RBD` entry and configure it as follows:

- Give it a memorable ID that's also volume-friendly (lower case, no spaces, only alphanumeric + dashes). We chose `ceph-block-storage`
- Select the `cephfs_data` pool (or whatever you called it in the previous step)
- Select the monitor node you want to use
- Leave the nodes setting at `All (no restrictions)` unless you want to configure that
Congrats! You should now have a Ceph block storage pool!
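The equivalent storage definition can be added with `pvesm`. A sketch using the IDs from this post:

```shell
# Register the Ceph pool as cluster-wide RBD (block) storage, usable
# for VM disk images and container root disks.
pvesm add rbd ceph-block-storage --pool cephfs_data --content images,rootdir
```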
Create your Ceph directory storage
In the same view (Datacenter > Storage), add a `CephFS` entry:

- Give it a memorable ID (same rules as in the previous step)
- Ensure that the content is set to all the available options (VZDump backup file, ISO image, Container Template, Snippets)
- Ensure the `Use Proxmox VE managed hyper-converged cephFS` option is selected

Click `Add`, and you now have a location you can store VM templates/ISOs in that is accessible to every node participating in the CephFS pool! This is important because it radically simplifies configuration via Terraform, which we'll be writing about in subsequent posts.
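Again, a `pvesm` sketch of the same step; the storage ID `ceph-fs` is an assumption, so substitute whatever ID you chose:

```shell
# Register the hyper-converged CephFS as directory-style storage for
# ISOs, container templates, backups, and snippets.
pvesm add cephfs ceph-fs --content iso,vztmpl,backup,snippets
```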
Create a Cloud-Init ready VM Template
We use Debian for virtually everything we do, but these steps should work for Ubuntu as well.
- Upload a Debian ISO (we use the small installation image) to your new Ceph directory storage
- Perform a minimal installation. The things you'll want to pay attention to here are:
  - Ensure your domain name is set to whatever domain name you've configured on your domain controller
  - Only install the following packages: `Standard System Utilities`
- Finish the Debian installation wizard and power on your VM. Make a note of the ID (probably 100 or something); I refer to it as `VM_ID` in subsequent steps
- In the Proxmox console for your VM, run `apt-get install cloud-init net-tools sudo`
- Once APT has finished installing those packages, edit `/etc/ssh/sshd_config` with your favorite editor (e.g. `vi /etc/ssh/sshd_config`) and change `PermitRootLogin` from its default of `prohibit-password` to `yes`. These machines are all air-gapped in our environment, so I'm not too worried about the security ramifications of this. If you are, be sure to adjust subsequent configurations to use certificate-based auth.
- Save the file and shut down the virtual machine
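The `sshd_config` edit above can also be scripted, which is handy if you rebuild this template often. A sketch; the helper name is ours, and it handles the directive whether or not it is commented out:

```shell
# permit_root_login FILE: set PermitRootLogin to yes in an sshd
# config file, replacing a commented or uncommented existing line.
permit_root_login() {
  sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' "$1"
}

# On the VM itself you would run:
#   permit_root_login /etc/ssh/sshd_config
```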
Now, let’s enable CloudInit!
On any of your nodes joined to the CephFS filesystem:
- Open a console
- Configure a Cloud-Init drive: using the ID of the Ceph RBD/block store we configured above (referred to as `<BLOCK_STORE_ID>`; ours is `ceph-block-storage`) and the `VM_ID` from the previous section, create the drive by entering `qm set <VM_ID> --ide2 <BLOCK_STORE_ID>:cloudinit`. For instance, in our configuration, with `VM_ID` = 100, this is `qm set 100 --ide2 ceph-block-storage:cloudinit`. This should complete without errors.
- Verify that your VM has a Cloud-Init drive by navigating to the VM, then selecting the `Cloud-Init` node. You should see some values there instead of a message indicating Cloud-Init isn't installed.
You should also verify that this machine is installed on the correct block-storage by attempting to clone it to another node. If everything’s configured properly, cloning a VM to a different node should work seamlessly. At this point, you can convert your VM to a template for subsequent use.
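Both the clone test and the template conversion can be done from any node's console. A sketch; the VM IDs and target node name are illustrative assumptions:

```shell
# Clone VM 100 to a *different* node -- this only succeeds if the
# disk lives on shared (Ceph) storage, which is exactly what we want
# to verify. Replace <other-node> with one of your node names.
qm clone 100 101 --target <other-node>

# Once satisfied, convert the original VM into a template for reuse.
qm template 100
```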
In this post, we installed and configured Ceph block and file stores, and also created a Cloud-Init-capable VM template. At this point, we're ready to begin configuring our HA Kubernetes cluster using Terraform from HashiCorp. Our Terraform files will be stored in our public devops GitHub repository.