Highly available kubernetes cluster with etcd, Longhorn and MetalLB (on Proxmox)

Homelab Nov 23, 2023

🙏

Original sources:
- Techno Tim: https://github.com/techno-tim/k3s-ansible
- Inspired by a Hardware Haven video: https://youtu.be/S_pp_nc5QuI
- https://github.com/ehlesp/smallab-k8s-pve-guide
- and multiple other sources.

Why kubernetes?

In my current setup, I'm running multiple services either as Docker containers (inside a VM or LXC on Proxmox) or directly inside an LXC. Traffic to public services like this blog and my Wiki is routed to my containers/VMs via HAProxy on my pfSense machine. My internal services are exposed through an Nginx reverse proxy (LXC), which gets its TLS certificates from that same pfSense machine.

This works fine, but there is one downside. My Proxmox servers need to be rebooted once in a while (more often than necessary, as I'm always playing around with them). I have also had servers lock up on multiple occasions, which also required a reboot. While a server reboots, my public services are unavailable. My sites are not that important, but still, this can be done better.

Another reason to dig into this is professional. My company, jodiBooks, currently runs its services on EC2 instances in the cloud, so far without major issues. But it would be awesome if those services could automatically scale under load or recover from failures.

So in comes Kubernetes, or K8s. It comes in many flavors: k0s, k3s and MicroK8s, among others. Then there are also the cloud-based variants like AKS, EKS, GKE, LKE and many more. For my homelab I initially chose k3s, as it seems to be the best documented and is relatively lightweight (more on that later). In the future I will explore EKS for production/business purposes.

Goal of this post

There are numerous guides on the internet on how to install k3s, each with its own use case or interpretation. I have watched multiple YouTube videos and read even more written tutorials and instructions. I still don't really understand how Kubernetes fundamentally works or what all the commands do, but I learn best by doing. This post explains what I tried and my initial findings. A list of commands and a how-to can be found on the wiki:

With the Ansible script: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox
Manually: https://wiki.joeplaa.com/en/tutorials/highly-available-kubernetes-cluster-on-proxmox-2

I want to gradually explore what I can do with Kubernetes and how to migrate my existing services. By doing it step by step, I hope to find out which services and software I actually need, and which are overkill or simply too resource-intensive.

I also want to (initially) keep using an "external" reverse proxy (Nginx in an LXC for private sites, HAProxy on pfSense for public sites) and an "external" certificate manager (ACME on pfSense). That setup works, and I hope that keeping it avoids making things unnecessarily complicated. However, this is where my approach deviates from most guides I found: they all run Traefik, Nginx or HAProxy as an ingress controller inside Kubernetes (I'll call that "internal" here, as opposed to my "external" use case). So I had to figure out which commands to skip or change and where to edit the Ansible script.
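In the "external" setup, the cluster only needs to expose services on fixed LAN IPs that Nginx or HAProxy can forward to. That is where MetalLB comes in. Below is a minimal sketch under these assumptions:
- MetalLB 0.13 or newer is installed in the metallb-system namespace and runs in L2 mode.
- The pool name, address range, app name and ports are hypothetical placeholders.
- The IP-pinning annotation is the one documented for those MetalLB versions, so check the docs for yours.

The functions only print YAML, so the output can be reviewed before it is applied.

```shell
# Hypothetical helpers that print MetalLB manifests (metallb.io/v1beta1 CRDs,
# MetalLB >= 0.13) for the "external proxy" setup. Pipe the output to
# kubectl apply -f - once it looks right.

# IP pool on the LAN plus an L2Advertisement, so MetalLB answers ARP for those
# IPs and the external Nginx/HAProxy can reach them directly.
metallb_pool_manifest() {
  local pool_name="$1" address_range="$2"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ${pool_name}
  namespace: metallb-system
spec:
  addresses:
    - ${address_range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ${pool_name}-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ${pool_name}
EOF
}

# LoadBalancer Service pinned to one IP from the pool, so the backend address
# configured in Nginx/HAProxy never changes.
lb_service_manifest() {
  local app="$1" ip="$2" port="$3" target_port="$4"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${app}
  annotations:
    metallb.universe.tf/loadBalancerIPs: ${ip}
spec:
  type: LoadBalancer
  selector:
    app: ${app}
  ports:
    - port: ${port}
      targetPort: ${target_port}
EOF
}

# Usage (range, IP and ports are placeholders; 2368 is Ghost's default port):
#   metallb_pool_manifest lan-pool 192.168.1.200-192.168.1.220 | kubectl apply -f -
#   lb_service_manifest ghost 192.168.1.201 80 2368 | kubectl apply -f -
```

The proxy then simply points at the pinned IP, just like it points at an LXC or VM today.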

Part 1: using the Ansible script

Initially, I built a cluster of 3 "control" nodes (called master nodes in the script) and 2 "worker" nodes (simply called node in the script). I did this across two Proxmox servers linked together in a cluster, but it can also be done on a single Proxmox machine. I then entered my specific data in the Ansible script and ran it.

⁉️

As a side note: I'm also toying with the idea of making my Proxmox cluster highly available, in combination with Ceph storage for my VM disks. The idea is that if a Proxmox node goes offline, the VMs and containers running on that node are restarted on the others.

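Open question 1: If I run a HA Kubernetes cluster (3 nodes) with HA storage (Longhorn) on top of a HA Proxmox cluster (3 nodes) with HA storage (Ceph), isn't that duplicating the "HA" part? And do I then end up with copies of copies of my data? With Longhorn's default of 3 replicas on a Ceph pool that itself keeps 3 replicas, that would be 3 × 3 = 9 copies.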

Open question 2: Which services should I run in Kubernetes and which in Proxmox containers? And why?

Finding 1

In the Hardware Haven video, he installs Rancher to get a web UI for the cluster. However, Rancher seems to use a lot of resources out of the box (https://stackoverflow.com/questions/60736392/rancher-high-cpu-utilization-even-with-zero-clusters-under-management). It is also huge and abstracts a lot of details away. This is not a downside per se, but as a total noob I need transparency to start with, and I don't want to learn a huge tool and the platform at the same time.

Instead of Rancher, I opted for a lighter desktop application: OpenLens. It doesn't run on the cluster, so it doesn't use any cluster resources while I'm not using it.

Finding 2

After installing Longhorn, MetalLB, Prometheus-Grafana and Netdata, the k3s server process uses around 20-25% of the VM's CPU (4 cores) at "idle". So I lose about one core before running any workloads of my own. This seems high to me. Netdata and its accompanying go.d.plugin also use 8-10%, which suggests that something is taxing the system quite a bit.
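To see where that idle CPU goes, it helps to compare node-level and pod-level usage. kubectl top nodes counts everything running on the node, while kubectl top pods only covers pods. The gap between them is roughly k3s itself (API server, embedded etcd, kubelet, containerd) plus the OS.

This is a sketch that assumes:
- The metrics-server that k3s bundles by default is still enabled.
- procps top and pgrep are available on the master.
- The function names are hypothetical.

```shell
# Cluster-wide view: node totals vs. per-pod usage (needs metrics-server).
# Node CPU minus the sum of pod CPU is roughly k3s itself plus the OS.
cluster_cpu_report() {
  kubectl top nodes
  kubectl top pods --all-namespaces --sort-by=cpu
}

# Node-local view of the k3s server process; run this on a master node.
# Two top iterations 5 s apart: the second reflects current usage rather
# than the average since the process started.
k3s_server_cpu() {
  local pids
  pids=$(pgrep -d, -f 'k3s[ -]server') || {
    echo "no k3s server process found" >&2
    return 1
  }
  top -b -n 2 -d 5 -p "$pids"
}
```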

I found several posts online about this issue, and it seems to be an active discussion on GitHub.

Finding 3

This one is really "nooby", but I imagined Kubernetes to work differently. You cannot just shut down a node, whether worker or master. The node needs to be "drained" first: it is marked as "unschedulable" (cordoned) and its pods are evicted, so their controllers recreate them on other nodes.

kubectl cordon my-node                       # Mark my-node as unschedulable
kubectl drain my-node --ignore-daemonsets    # Cordon my-node (if needed) and evict its pods; DaemonSet pods (Longhorn, MetalLB) are skipped
kubectl uncordon my-node                     # Mark my-node as schedulable again
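The maintenance round trip (drain, reboot, bring back) could be wrapped in two small helpers. This is a sketch under assumptions. The function names and the 10-minute timeouts are placeholders. Also, --delete-emptydir-data is only safe when nothing important lives in emptyDir volumes, because their contents are deleted when the pod is evicted.

```shell
# Hypothetical helper: prepare one node for maintenance (e.g. a Proxmox reboot).
drain_node() {
  local node="$1"
  # drain cordons the node, then evicts its pods so their controllers recreate
  # them elsewhere. DaemonSet pods (MetalLB speaker, Longhorn manager, ...)
  # cannot be moved, hence --ignore-daemonsets.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m
}

# Hypothetical helper: bring the node back after it has rebooted.
restore_node() {
  local node="$1"
  # Only allow new pods once the kubelet reports the node as Ready again.
  kubectl wait --for=condition=Ready "node/${node}" --timeout=10m &&
    kubectl uncordon "$node"
}

# Usage:
#   drain_node k3sworker01
#   (reboot the VM or its Proxmox host)
#   restore_node k3sworker01
```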

Part 2: Installing k3s manually

While looking for a solution to finding 3, I stumbled upon the graceful node shutdown feature. It is configured in the KubeletConfiguration, which I didn't know how to do with the Ansible script. Now I was completely lost: I needed to learn Kubernetes concepts, configuration and YAML syntax, but also Ansible concepts and scripting, all at the same time. So I decided to ditch Ansible and install k3s manually. After all, I only need to install it once, maybe twice in the near future.
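Here is one possible way to enable graceful node shutdown on k3s, sketched under assumptions:
- A separate KubeletConfiguration file holds the shutdown settings.
- k3s passes that file to the kubelet through a kubelet-arg entry (config=...) in /etc/rancher/k3s/config.yaml.
- The file path and the 60s/20s periods are placeholders.

This has not been verified against every k3s version, and newer releases also manage kubelet config files themselves. The feature relies on systemd-logind inhibitor locks, which Debian 12 provides.

```shell
# Hypothetical helpers that print the two snippets needed for graceful node
# shutdown on a k3s node. Values and the file path are placeholders.

# KubeletConfiguration: on a systemd shutdown the kubelet delays it by up to
# shutdownGracePeriod; the last shutdownGracePeriodCriticalPods of that window
# is reserved for critical pods, the rest goes to regular pods.
kubelet_shutdown_config() {
  local total="${1:-60s}" critical="${2:-20s}"
  cat <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: ${total}
shutdownGracePeriodCriticalPods: ${critical}
EOF
}

# Snippet for /etc/rancher/k3s/config.yaml: k3s turns each kubelet-arg entry
# into a kubelet flag, here --config=<path>.
k3s_kubelet_arg() {
  local path="${1:-/etc/rancher/k3s/kubelet.config}"
  cat <<EOF
kubelet-arg:
  - "config=${path}"
EOF
}

# Usage, as root on every node:
#   kubelet_shutdown_config 60s 20s > /etc/rancher/k3s/kubelet.config
#   k3s_kubelet_arg >> /etc/rancher/k3s/config.yaml  # merge by hand if kubelet-arg already exists
#   systemctl restart k3s                            # on masters; k3s-agent on workers
```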

While looking for information on how to configure the graceful shutdown, I came across a Reddit thread with a link to a full Proxmox and k3s installation guide. I used that guide as a template for my second how-to. This is what I changed:

G001: Obviously I'm using different hardware. The goal is a 3-node HA cluster with Ceph storage in the future.
G002: I'm using Proxmox 8.1 instead of 7.0.
G005: Left the storage as configured by the installer: a ZFS pool on machine 1, an LVM pool on machine 2.
G011: I didn't disable the RPC, ZFS, Ceph, SPICE proxy, cluster and high-availability services, as I plan on using them.
G012: I didn't disable IPv6 on the Proxmox hosts, as that would (for some reason) prevent the Ghost container from starting inside my Docker VM.
G014: I have some additional firewall rules configured as per this guide.
G017: I plan on using multiple Proxmox hosts in a HA setup, so I needed to create an SDN instead of an additional Linux bridge.
G018: I created 3 master nodes with 2 vCPUs and 2 GB RAM and 3 worker nodes with 4 vCPUs and 8 GB RAM. More on that in G025.
G019: See remark 3.
G020: I used Debian 12 instead of 11, machine type q35 and CPU flags md-clear, pcid, ssbd, pdpe1gb and aes (all supported by my Ivy Bridge and Skylake CPUs).
G021: In Debian 12 (bookworm), the apt component non-free should be changed to non-free non-free-firmware.
G024: Didn't disable swap.
G025:
1. See remark 8.
2. Used different IP addresses.
3. I also named my servers k3smaster0x and my agents k3sworker0x.
4. Because I wanted multiple masters, I had to use the modified k3s configuration files (/etc/rancher/k3s/config.yaml) from G908 for all the nodes.
5. I didn't install Traefik, as I'm running an external Nginx proxy.
6. I used a newer k3s version: v1.28.3+k3s2.
G026: I installed kubectl with Snap on my Ubuntu machine: sudo snap install kubectl --classic
G029: Skipped as I use pfSense for that.
G030: Skipped as I use OpenLens.
G031: Skipped, didn't install Traefik.
G033-G034: Didn't install
G035: Don't use node storage; use either NFS or Longhorn.
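For reference, the multi-master setup boils down to one master that initialises the embedded etcd cluster and other nodes that join it. Below is a minimal sketch of the config files, not a copy of the G908 files. It assumes:
- All nodes share one token, and the IPs are placeholders.
- servicelb (k3s's built-in load balancer) is disabled because MetalLB handles LoadBalancer services in this cluster.

```shell
# Hypothetical helpers that print /etc/rancher/k3s/config.yaml for the nodes.

# Master (server) node. $1: "init" on the first master; anything else joins
# the existing cluster through the master at $3.
k3s_server_config() {
  local role="$1" token="$2" join_ip="${3:-}"
  if [ "$role" = "init" ]; then
    echo "cluster-init: true"
  else
    echo "server: https://${join_ip}:6443"
  fi
  cat <<EOF
token: "${token}"
disable:
  - traefik
  - servicelb
EOF
}

# Worker (agent) node: join through any master.
k3s_agent_config() {
  local token="$1" join_ip="$2"
  cat <<EOF
server: https://${join_ip}:6443
token: "${token}"
EOF
}

# Usage, as root (k3s reads /etc/rancher/k3s/config.yaml by default;
# 10.0.0.11 stands in for k3smaster01):
#   mkdir -p /etc/rancher/k3s
#   k3s_server_config init "$TOKEN" > /etc/rancher/k3s/config.yaml             # k3smaster01
#   k3s_server_config join "$TOKEN" 10.0.0.11 > /etc/rancher/k3s/config.yaml   # other masters
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.28.3+k3s2" sh -s - server
#   k3s_agent_config "$TOKEN" 10.0.0.11 > /etc/rancher/k3s/config.yaml        # workers
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.28.3+k3s2" sh -s - agent
```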

Todo

- Use a local Nexus repository as a Docker image proxy
- Use/mount NFS storage
- Migrate Ghost blog to cluster
- Migrate Wiki to cluster
- Migrate Grafana (Proxmox monitoring) to cluster
- Migrate qBittorrent to cluster

Author: Joep van de Laarschot