Highly available kubernetes cluster with etcd, Longhorn and MetalLB (on Proxmox)
- Techno Tim: https://github.com/techno-tim/k3s-ansible
- Inspired by Hardware Haven video: https://youtu.be/S_pp_nc5QuI
- https://github.com/ehlesp/smallab-k8s-pve-guide
- and multiple other sources.
Why kubernetes?
In my current setup, I'm running multiple services either as Docker containers (inside a VM or LXC on Proxmox) or directly inside an LXC. Traffic to public services like this blog and my Wiki is routed to my containers/VMs via HAProxy on my pfSense machine. My internal services are exposed using an Nginx reverse proxy (LXC), which gets its TLS certificates from that same pfSense machine.
This works fine, but there is one downside. My Proxmox servers need to be rebooted once in a while (more often than most, as I'm always playing around with them). I have also had servers lock up on multiple occasions, which required a reboot as well. While a server reboots, my public services are temporarily unavailable. My sites are not that important, but still, this can be done better.
Another reason to dig into this is professional. My company, jodiBooks, currently runs its services on EC2 instances in the cloud. So far, without major issues. But it would be awesome if those services scaled automatically under load or recovered from issues on their own.
So in comes Kubernetes, or K8s. It comes in many flavors: k0s, k3s and microk8s among others. Then there are also the cloud-based variants like AKS, EKS, GKE, LKE and many more. For my homelab I have initially chosen k3s, as it seems to be the best documented and is relatively lightweight (more on that later). In the future I will explore EKS for production/business purposes.
Goal of this post
There are numerous guides on the internet on how to install k3s. All have their specific use-cases or interpretations. I have watched multiple YouTube videos and explored even more written tutorials and instructions. I still have no idea how Kubernetes fundamentally works and what the commands are, but I learn best by doing. This post explains what I tried and my initial findings. A list of commands and a how-to can be found on the wiki:
- With the Ansible script: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox
- Manually: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox-2
I want to gradually explore what I can do with Kubernetes and how to migrate my existing services. By doing it step by step I hope to find out which services and software I actually need, and which are overkill or simply too resource intensive.
I also want to (initially) keep using an "external" reverse proxy (Nginx in LXC for private sites, HAProxy on pfSense for public sites) and an "external" certificate manager (ACME on pfSense). This is because it works and, I hope, keeps things from getting unnecessarily complicated. But this is where my approach seems to deviate from most guides I found, which all use Traefik, Nginx or HAProxy as a service inside of Kubernetes (I'll call that "internal" here as opposed to my "external" use-case). So I had to figure out which commands to skip or change and where to edit the Ansible script.
Part 1: using the Ansible script
Initially I built a cluster of 3 "control" nodes (called master nodes in the script) and 2 "worker" nodes (simply called node in the script). I have done this across two Proxmox servers linked together in a cluster, but this can also be done on a single Proxmox machine. I then entered my specific data in the Ansible script and ran it.
Open question 1: If I run an HA Kubernetes cluster (3 nodes) with HA storage (Longhorn) on top of an HA Proxmox cluster (3 nodes) with HA storage (Ceph), isn't that duplicating the "HA" part? And do I then have 6 copies of my data?
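A back-of-the-envelope sketch for this question. It assumes the Longhorn volumes live on VM disks that are themselves stored on Ceph, and that both layers keep 3 replicas (which, as far as I know, is the default for both a Longhorn volume and a replicated Ceph pool). Those numbers are assumptions, not measurements from this cluster. Under those assumptions the layers multiply rather than add:

```shell
#!/bin/sh
# Assumed values, not measured: adjust to your own settings.
longhorn_replicas=3   # replicas Longhorn keeps per volume
ceph_pool_size=3      # copies Ceph keeps per object in a replicated pool

# Every Longhorn replica sits on a VM disk, and every VM disk is itself
# stored ceph_pool_size times, so the counts multiply.
physical_copies=$((longhorn_replicas * ceph_pool_size))
echo "physical copies per volume: $physical_copies"
```

With these numbers that would be 9 copies instead of 6, which suggests lowering the replica count in one of the two layers if both end up stacked like this.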
Open question 2: Which services should I run in Kubernetes and which in Proxmox containers? And why?
Finding 1
In the Hardware Haven video, Rancher is installed to get a web UI for the cluster. Rancher however seemed to use a lot of resources out of the box (https://stackoverflow.com/questions/60736392/rancher-high-cpu-utilization-even-with-zero-clusters-under-management). It is also huge and abstracts a lot of details away. This is not a downside per se, but as I am a total noob, I need to start with transparency and I don't want to learn a huge tool and the platform at the same time.
Instead of using Rancher, I opted for a lighter desktop solution: OpenLens. It doesn't run on the cluster, so it doesn't use up cluster resources when I am not using it.
Finding 2
After installing Longhorn, MetalLB, Prometheus-Grafana and Netdata, the k3s_server binary uses around 20-25% of the VM's CPU resources (with 4 cores assigned) at "idle". So I lose about 1 core while I'm not running anything yet. This seems high to me. The fact that netdata and the accompanying go.d.plugin also use 8-10% suggests something is taxing the system "a lot".
I found several posts online about this issue, and it seems to be an active discussion on GitHub.
Finding 3
This one is really "nooby", but I imagined Kubernetes to work differently. You cannot just shut down a node, worker or master. The node needs to be "drained" first, meaning all its pods are evicted and rescheduled on other nodes and the node itself is marked as "unschedulable".
kubectl cordon my-node # Mark my-node as unschedulable
kubectl drain my-node # Drain my-node in preparation for maintenance
kubectl uncordon my-node # Mark my-node as schedulable
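A minimal sketch of how those commands could be wrapped for a planned reboot. The function names and the KUBECTL override are hypothetical helpers for illustration; the two extra drain flags are real kubectl options, but whether you need them depends on what runs on the node (DaemonSet pods, as Longhorn uses, and pods with emptyDir volumes otherwise make the drain refuse to continue). Note that kubectl drain cordons the node itself, so a separate cordon is only needed to stop scheduling ahead of time.

```shell
#!/bin/sh
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands
# without touching a cluster. This override is a convenience of the sketch.
KUBECTL="${KUBECTL:-kubectl}"

# Before maintenance: evict all pods from the node.
# --ignore-daemonsets      skip DaemonSet-managed pods, which cannot be evicted
# --delete-emptydir-data   allow eviction of pods using emptyDir (data is lost)
node_maintenance_start() {
    node="$1"
    $KUBECTL drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# After maintenance: allow the scheduler to place pods on the node again.
node_maintenance_end() {
    node="$1"
    $KUBECTL uncordon "$node"
}
```

Usage would be node_maintenance_start k3sworker01, reboot the VM, then node_maintenance_end k3sworker01.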
Part 2: Installing k3s manually
While looking for a solution for finding 3, I stumbled upon the graceful-node-shutdown feature. This is configured in the KubeletConfiguration, which I didn't know how to do using the Ansible script. Now I was completely lost, needing to learn Kubernetes concepts, configuration and YAML syntax, but also Ansible concepts and scripting at the same time. So I decided to ditch Ansible and install k3s manually. In the end, I only need to install it once, maybe twice in the near future.
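As a rough sketch of what that configuration can look like: the script below writes a KubeletConfiguration with the two graceful-shutdown fields, plus a k3s config fragment that passes the file to the kubelet. The durations, the file names and the scratch directory are assumptions for illustration, not values taken from my setup; on a real node the files would go under /etc/rancher/k3s/.

```shell
#!/bin/sh
# Scratch directory so this sketch never touches a live node.
out_dir="${K3S_EXAMPLE_DIR:-./k3s-example}"
mkdir -p "$out_dir"

# Kubelet settings for graceful node shutdown. Both durations are
# illustrative; the critical-pods period must fit inside the total period.
cat > "$out_dir/kubelet.config" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
EOF

# Fragment for /etc/rancher/k3s/config.yaml that points the kubelet at the
# file above (the path is an assumed location).
cat > "$out_dir/config-fragment.yaml" <<'EOF'
kubelet-arg:
  - "config=/etc/rancher/k3s/kubelet.config"
EOF
```

With this in place the kubelet delays a system shutdown for up to 30 seconds so pods can terminate cleanly, reserving the last 10 seconds for critical pods.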
While looking for information on how to configure the graceful shutdown, I stumbled on a Reddit thread with a link to a full Proxmox and k3s installation guide. I used that guide as a template for my second how-to. This is what I changed:
- G001: Obviously I'm using different hardware. The goal is a 3 node HA cluster with Ceph storage in the future.
- G002: I'm using Proxmox 8.1 instead of 7.0.
- G005: Left the storage as configured by the installer: zfs pool on machine 1, lvm pool on machine 2.
- G011: I didn't disable RPC services, zfs, ceph, SPICE proxy and cluster and high availability services as I plan on using them.
- G012: I didn't disable IPv6 on the Proxmox hosts as that would (for some reason) prevent the Ghost container from starting inside my Docker VM.
- G014: I have some additional firewall rules configured as per this guide.
- G017: I plan on using multiple Proxmox hosts in a HA setup, so I needed to create an SDN instead of an additional Linux Bridge.
- G018: I created 3 master nodes with 2 vCPUs and 2 GB RAM and 3 worker nodes with 4 vCPUs and 8 GB RAM. More on that in G025.
- G019: see remark 3.
- G020: I used Debian 12 instead of 11, machine type q35 and the CPU flags md-clear, pcid, ssbd, pdpe1gb and aes (all supported on my Ivy Bridge and Skylake CPUs).
- G021: In Debian bookworm, the apt configuration non-free should be changed to non-free non-free-firmware.
- G024: Didn't disable swap.
- G025:
  - See remark 8.
  - Used different IP addresses.
  - I also named my servers k3smaster0x and my agents k3sworker0x.
  - Because I wanted to use multiple masters, I had to go to G908 for the modified k3s configuration files (/etc/rancher/k3s/config.yaml) for all the nodes.
  - I don't install Traefik; I'm running an external Nginx proxy.
  - I used a newer k3s version: v1.28.3+k3s2
- G026: I installed kubectl with Snap on my Ubuntu machine: sudo snap install kubectl --classic
- G029: Skipped as I use pfSense for that.
- G030: Skipped as I use OpenLens.
- G031: Skipped, didn't install Traefik.
- G033-G034: Didn't install.
- G035: Don't use node storage; use either NFS or Longhorn (see below).
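To make the G025 changes more concrete, here is a hedged sketch of what the multi-master config.yaml files without Traefik could look like. The keys (cluster-init, server, token, disable) are regular k3s options, but the hostname, the token placeholder and the scratch directory are made up for illustration, and the files in the guide contain more settings than shown here.

```shell
#!/bin/sh
# Scratch directory; on the nodes each file would be saved as
# /etc/rancher/k3s/config.yaml.
out_dir="${K3S_EXAMPLE_DIR:-./k3s-example}"
mkdir -p "$out_dir"

# First server: initialises the embedded etcd cluster.
cat > "$out_dir/config-first-server.yaml" <<'EOF'
cluster-init: true
token: "REPLACE_WITH_SHARED_SECRET"   # placeholder, not a real token
disable:
  - traefik   # ingress is handled by the external Nginx/HAProxy instead
EOF

# Additional servers: join the first one instead of initialising.
cat > "$out_dir/config-other-server.yaml" <<'EOF'
server: "https://k3smaster01.example.lan:6443"   # hypothetical hostname
token: "REPLACE_WITH_SHARED_SECRET"
disable:
  - traefik
EOF

# Installing a pinned version would then look roughly like:
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.28.3+k3s2" sh -s - server
```

Only the first server gets cluster-init; every other server points at it with server and uses the same token.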
Todo
- Use local Nexus repository for docker image proxy
- Use/mount NFS storage
- Migrate Ghost blog to cluster
- Migrate Wiki to cluster
- Migrate Grafana (Proxmox monitoring) to cluster
- Migrate qBittorrent to cluster