Highly available kubernetes cluster with etcd, Longhorn and MetalLB (on Proxmox)

Homelab Nov 23, 2023
Original sources:
- Techno Tim: https://github.com/techno-tim/k3s-ansible
- Inspired by Hardware Haven video: https://youtu.be/S_pp_nc5QuI
- https://github.com/ehlesp/smallab-k8s-pve-guide
- and multiple other sources.

Why kubernetes?

In my current setup, I'm running multiple services either as docker containers (inside a VM or LXC on Proxmox) or directly inside an LXC. Traffic to public services like this blog and my Wiki is routed to my containers/VMs via HAProxy on my pfSense machine. My internal services are exposed using an Nginx reverse proxy (LXC), which gets its TLS certificates from that same pfSense machine.

This works fine, but there is one downside. My Proxmox servers need to be rebooted once in a while (more often than usual, because I'm always playing around with them). I have also had servers lock up on multiple occasions, which likewise required a reboot. While a server reboots, my public services are temporarily unavailable. My sites are not that important, but still, this can be done better.

Another reason to dig into this is professional. My company, jodiBooks, currently runs its services on EC2 instances in the cloud. So far, without major issues. But it would be awesome if those services could automatically scale under load or recover from issues.

So in comes Kubernetes or K8s. It comes in many flavors: k0s, k3s and microk8s among others. Then there are also the cloud-based variants like AKS, EKS, GKE, LKE and many more. For my homelab I have initially chosen k3s, as it seems to be the best documented and relatively lightweight (more on that later). In the future I will explore EKS for production/business purposes.

Goal of this post

There are numerous guides on the internet on how to install k3s. All have their specific use-cases or interpretations. I have watched multiple YouTube videos and explored even more written tutorials and instructions. I still have no idea how Kubernetes fundamentally works and what the commands are, but I learn best by doing. This post explains what I tried and my initial findings. A list of commands and a how-to can be found on the wiki:

  1. With the Ansible script: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox
  2. Manually: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox-2

I want to gradually explore what I can do with Kubernetes and how to migrate my existing services. By doing it step by step I hope to find out which services and software I actually need, and which are overkill or simply too resource intensive.

I also want to (initially) keep using an "external" reverse proxy (Nginx in LXC for private sites, HAProxy on pfSense for public sites) and an "external" certificate manager (ACME on pfSense). I'm doing this because it works and I don't want to make things unnecessarily complicated, I hope. However, this is where my approach seems to deviate from most guides I found, which all use Traefik, Nginx or HAProxy as a service inside of Kubernetes (I'll call that "internal" here as opposed to my "external" use-case). So I had to figure out which commands to skip or change and where to edit the Ansible script.

Part 1: using the Ansible script

Initially I built a cluster of 3 "control" nodes (called master nodes in the script) and 2 "worker" nodes (simply called node in the script). I have done this across two Proxmox servers linked together in a cluster, but this can also be done on a single Proxmox machine. I then entered my specific data in the Ansible script and ran it.
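A short aside on why three control nodes is the usual minimum: etcd only stays writable while a majority (quorum) of its members is reachable. The sketch below just does that arithmetic; the function names are my own and not part of k3s or etcd.

```shell
#!/bin/sh
# Quorum arithmetic for an etcd cluster with N members.
# Function names are illustrative only; they are not part of k3s or etcd.

# Smallest majority of N members: floor(N/2) + 1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority is still reachable
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 control nodes the cluster tolerates no failures at all, with 3 it tolerates one, which is why an odd number of control nodes is normally recommended.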

As a side note: I'm also playing with the idea of making my Proxmox cluster highly available in combination with Ceph storage for my VM disks. The idea here is that if a Proxmox node goes offline, the VMs and containers running on that node are restarted on the others.

Open question 1: If I run a HA Kubernetes cluster (3 nodes) with HA storage (Longhorn) on top of a HA Proxmox cluster (3 nodes) with HA storage (Ceph), isn't that duplicating the "HA" part? And do I then have 6 copies of my data?
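A back-of-the-envelope calculation for open question 1. It assumes that both layers keep their default of three replicas (as far as I know, both the Longhorn replica count and the Ceph replicated pool size default to 3) and that the Longhorn data lives on VM disks stored in Ceph. Under those assumptions the copies multiply instead of adding up. The variable and function names are illustrative.

```shell
#!/bin/sh
# Assumed values: both storage layers use 3 replicas.
LONGHORN_REPLICAS=3   # replicas Longhorn keeps of each volume
CEPH_POOL_SIZE=3      # copies Ceph keeps of each VM disk block

# Every Longhorn replica sits on a VM disk, and Ceph stores
# each of those disks CEPH_POOL_SIZE times.
physical_copies() {
    echo $(( $1 * $2 ))
}

echo "physical copies: $(physical_copies "$LONGHORN_REPLICAS" "$CEPH_POOL_SIZE")"
```

So stacking the two would give 9 physical copies rather than 6, unless one of the layers is configured with fewer replicas.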

Open question 2: Which services should I run in Kubernetes and which in Proxmox containers? And why?

Finding 1

In the Hardware Haven video he installed Rancher to get a webUI for the cluster. Rancher however seemed to use a lot of resources out of the box (https://stackoverflow.com/questions/60736392/rancher-high-cpu-utilization-even-with-zero-clusters-under-management). It is also huge and abstracted a lot of details away. This is not a downside per se, but as I am a total noob, I need to start with transparency and I don't want to learn a huge tool and the platform at the same time.

Instead of using Rancher, I opted for a lighter desktop solution: OpenLens. It doesn't run on the cluster, so it doesn't use up cluster resources when I am not using it.

Finding 2

After installing Longhorn, MetalLB, Prometheus-Grafana and Netdata, the k3s_server binary uses around 20-25% of the VM's CPU resources (with 4 cores assigned) at "idle". So I lose about 1 core while I'm not running anything yet. This seems high to me. The fact that netdata and the accompanying go.d.plugin also use 8-10% suggests something is taxing the system "a lot".

I found several posts online about this issue and this seems to be an active discussion on GitHub:

- Idle CPU usage of k3s is high and continuously increases over time: "Over time I have regularly noticed resource usage slowly creeps up and up until this idle cluster is using obnoxiously high levels of CPU."
- k3s causes a high load average (Issue #294, k3s-io/k3s): "When running k3s on any computer, it causes a very high load average."
- High CPU and Memory Load on single node k3s cluster (Discussion #5769, k3s-io/k3s): "After installing rancher, the memory and cpu consumption of the k3s service is skyrocketing."

Finding 3

This one is really "nooby", but I imagined Kubernetes to work differently. You cannot just shut down a node, whether worker or master. The node needs to be "drained" first, meaning it is marked as "unschedulable" and its pods are evicted so they get rescheduled on other nodes.

The Kubernetes documentation page "Safely Drain a Node" shows how to safely drain a node, optionally respecting the PodDisruptionBudget you have defined. The relevant kubectl commands are:
kubectl cordon my-node                                                # Mark my-node as unschedulable
kubectl drain my-node                                                 # Drain my-node in preparation for maintenance
kubectl uncordon my-node                                              # Mark my-node as schedulable
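A minimal sketch of how these commands could be wrapped for node maintenance. The helper names, the node name k3sworker01 and the 120 second timeout are assumptions for illustration. By default the script only prints the commands; set DRY_RUN=0 to actually run them against a cluster.

```shell
#!/bin/sh
# Sketch of a maintenance helper around kubectl drain/uncordon.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Evict pods from a node before shutting it down (drain also cordons the node).
# --ignore-daemonsets:     DaemonSet pods cannot be evicted, so skip them
# --delete-emptydir-data:  allow evicting pods using emptyDir volumes (that data is lost)
drain_node() {
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=120s
}

# Allow scheduling on the node again once it is back online.
uncordon_node() {
    run kubectl uncordon "$1"
}

# Example: prints the two commands, because DRY_RUN defaults to 1
drain_node k3sworker01
uncordon_node k3sworker01
```

Without --ignore-daemonsets a drain typically stops as soon as it meets a DaemonSet-managed pod, which Longhorn and MetalLB both deploy.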

Part 2: Installing k3s manually

While looking for a solution for finding 3, I stumbled upon the graceful-node-shutdown feature. This is configured in the KubeletConfiguration, which I didn't know how to do using the Ansible script. Now I was completely lost, needing to learn Kubernetes concepts, configuration and YAML syntax, but also Ansible concepts and scripting at the same time. So I decided to ditch Ansible and install k3s manually. In the end, I only need to install it once, maybe twice in the near future.
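To give an idea of what such a KubeletConfiguration looks like, here is a minimal sketch that writes one with graceful node shutdown enabled. The output directory, the file name kubelet.config and the grace periods (30s and 10s) are assumptions, not values taken from my cluster.

```shell
#!/bin/sh
# Sketch: write a KubeletConfiguration with graceful node shutdown enabled.
# Directory, file name and grace periods are assumptions; adjust as needed.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

write_kubelet_config() {
    # $1 = directory to write kubelet.config into
    mkdir -p "$1"
    cat > "$1/kubelet.config" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the kubelet delays node shutdown so pods can terminate
shutdownGracePeriod: 30s
# Part of that time reserved for critical pods
shutdownGracePeriodCriticalPods: 10s
EOF
}

write_kubelet_config "$CONFIG_DIR"
cat "$CONFIG_DIR/kubelet.config"
```

As far as I understand, k3s can be pointed at such a file by passing the kubelet's config flag through kubelet-arg in /etc/rancher/k3s/config.yaml, for example kubelet-arg: "config=/etc/rancher/k3s/kubelet.config" (the path is again an assumption).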

While looking for information on how to configure the graceful shutdown, I stumbled on a Reddit thread with a link to a full Proxmox and k3s installation guide. I used that guide as a template for my second how-to. This is what I changed:

  1. G001: Obviously I'm using different hardware. The goal is a 3-node HA cluster with Ceph storage in the future.
  2. G002: I'm using Proxmox 8.1 instead of 7.0.
  3. G005: Left the storage as configured by the installer: zfs pool on machine 1, lvm pool on machine 2.
  4. G011: I didn't disable RPC services, zfs, ceph, SPICE proxy and cluster and high availability services as I plan on using them.
  5. G012: I didn't disable ipv6 on the Proxmox hosts as that would (for some reason) prevent the Ghost container from starting inside my docker VM.
  6. G014: I have some additional firewall rules configured as per this guide.
  7. G017: I plan on using multiple Proxmox hosts in a HA setup, so I needed to create an SDN instead of an additional Linux Bridge.
  8. G018: I created three master nodes with 2 vCPUs and 2 GB RAM and three worker nodes with 4 vCPUs and 8 GB RAM. More on that in G025.
  9. G019: see remark 3.
  10. G020: I used Debian 12 instead of 11, used machine type q35 and CPU flags md-clear, pcid, ssbd, pdpe1gb and aes (all supported on my Ivy Bridge and Skylake CPUs).
  11. G021: In Debian bookworm, the apt configuration non-free should be changed to non-free non-free-firmware.
  12. G024: Didn't disable swap.
  13. G025:
    1. See remark 8.
    2. Used different IP addresses.
    3. I also named my servers k3smaster0x and my agents k3sworker0x.
    4. Because I wanted to use multiple masters, I had to go to G908 for the modified k3s configuration files for all the nodes /etc/rancher/k3s/config.yaml.
    5. I don't install Traefik; I'm running an external Nginx proxy.
    6. I used a newer k3s version: v1.28.3+k3s2
  14. G026: I installed kubectl with Snap on my Ubuntu machine: sudo snap install kubectl --classic
  15. G029: Skipped as I use pfSense for that.
  16. G030: Skipped as I use OpenLens.
  17. G031: Skipped, didn't install Traefik.
  18. G033-G034: Didn't install
  19. G035: Don't use node storage; use either NFS or Longhorn (see below).
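For the multi-master changes under G025, a rough sketch of what the /etc/rancher/k3s/config.yaml for the first server node could look like. The IP address and token are placeholders, and disabling servicelb is my assumption for a setup that uses MetalLB instead; this is not a copy of the guide's files.

```shell
#!/bin/sh
# Sketch: generate a k3s config.yaml for the first server (master) node.
# The IP address and token used below are placeholders.
write_first_server_config() {
    # $1 = output file, $2 = node IP, $3 = cluster token
    cat > "$1" <<EOF
# Start a new cluster with embedded etcd (first server only)
cluster-init: true
token: "$3"
node-ip: "$2"
tls-san:
  - "$2"
# Skip bundled components that are replaced by external Nginx and MetalLB
disable:
  - traefik
  - servicelb
EOF
}

OUT="${OUT:-$(mktemp)}"
write_first_server_config "$OUT" "192.0.2.11" "replace-with-a-long-random-token"
cat "$OUT"
```

The additional server nodes would not use cluster-init, but join the first one with a server: https://<first-server-ip>:6443 entry and the same token.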


  • Use local Nexus repository for docker image proxy
  • Use/mount NFS storage
  • Migrate Ghost blog to cluster
  • Migrate Wiki to cluster
  • Migrate Grafana (Proxmox monitoring) to cluster
  • Migrate Qbittorrent to cluster