Proxmox + Docker in LXC containers

Homelab May 1, 2024

The old way

When I started with Proxmox, I installed it on a disk pool with ZFS. The exact disk layout has changed since then, but the filesystem is still ZFS. At the time, Docker didn't natively support ZFS as a backing filesystem: it worked, but it was really slow. In 2024 this shouldn't be the case anymore, as Docker now ships a ZFS storage driver. More on that in the "The new way" section.

The best way to get Docker working, and also the recommended way, is to use a full VM. Such a VM will typically format its virtual disk as ext4, which performs great with Docker. However, a VM uses more resources than an LXC container, while an LXC container uses the filesystem of the host it runs on, in my case ZFS. Luckily there is a workaround:

  1. Create an ext4 partition on top of ZFS.
  2. Create a mount point that stores the Docker data directory /var/lib/docker on this partition instead of on ZFS.
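A minimal sketch of those two steps on the Proxmox host (the pool name rpool, the 32G size and the container ID 101 are assumptions; adjust them to your setup):

```shell
# 1. Create a ZFS volume (zvol) and format it as ext4
zfs create -V 32G rpool/docker-data      # pool name and size are examples
mkfs.ext4 /dev/zvol/rpool/docker-data

# Mount it somewhere on the host
mkdir -p /mnt/docker-data
mount /dev/zvol/rpool/docker-data /mnt/docker-data

# 2. Bind-mount it into container 101 (example ID) as /var/lib/docker
pct set 101 -mp0 /mnt/docker-data,mp=/var/lib/docker
```

An /etc/fstab entry (or a systemd mount unit) for the zvol is needed as well, so the host mount survives a reboot.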

Pros, cons and gotchas

This approach works perfectly fine. However, there are some things to think about:

  • If you migrate an LXC container to another Proxmox host, the container is stopped on the old host, replicated on the new host, started on the new host and destroyed on the old host. This will result in downtime during the replication step. When using a VM, there is virtually no downtime.
  • You can save backup time and space if you uncheck "Backup" on the mount point. Why back up images, containers and volumes, right? That's part of the point of using Docker in the first place: those should all be ephemeral.
  • If, however, you use Docker volumes to store data instead of a mapped host path, all that data is lost if you unchecked "Backup" on the mount point and restore the container from a backup. Yes, that happened to me.
  • If you decide to back up the mount point, you will probably run into exit code 23 (something like: command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/295998/root//./ /var/tmp/vzdumptmp296420_101' failed: exit code 23). This can be solved by setting the acltype to posixacl on the dataset backing the temporary backup directory:
    • Create a backup temp folder /tmp/vzdump: sudo mkdir /tmp/vzdump
    • Create a ZFS dataset rpool/tmp mounted there: sudo zfs create -o mountpoint=/tmp/vzdump rpool/tmp
    • Set acl type to POSIX for mount: sudo zfs set acltype=posixacl rpool/tmp
    • Set tmpdir: /tmp/vzdump in /etc/vzdump.conf: sudo nano /etc/vzdump.conf
# vzdump default settings

tmpdir: /tmp/vzdump
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#performance: [max-workers=N][,pbs-entries-max=N]
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#stdexcludes: BOOLEAN
notification-policy: failure
#prune-backups: keep-INTERVAL=N[,...]
#exclude-path: PATHLIST
#pigz: N
notes-template: {{guestname}}
Besides the storage workaround, the container config on the Proxmox host (e.g. /etc/pve/lxc/xyz.conf, where xyz is the container ID) needs a couple of lines so Docker can run inside LXC at all, disabling the AppArmor profile and allowing access to all devices in the cgroup:

# xyz.conf
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
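To see why an unchecked "Backup" on the mount point takes Docker volumes with it: named volumes live inside /var/lib/docker, i.e. on that very mount point. A quick check inside the container (the volume name mydata is just an example):

```shell
# Create a named volume and ask Docker where it physically stores the data
docker volume create mydata
docker volume inspect --format '{{ .Mountpoint }}' mydata
# -> /var/lib/docker/volumes/mydata/_data
```

Anything written to that path is gone if the mount point is restored empty; a mapped host path outside /var/lib/docker is not affected.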

The new way (not working)

Use the ZFS storage driver
Docker now ships a ZFS storage driver, so let's try it:

  • Create /etc/docker/daemon.json and add:
  { "storage-driver": "zfs" }
  • Restart docker: systemctl restart docker
  • And it doesn't work: failed to start daemon: error initializing graphdriver: prerequisites for driver not satisfied (wrong filesystem?): zfs
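After removing the storage-driver entry again so the daemon starts, you can check which driver it actually falls back to. This is a generic check with the standard Docker CLI, nothing specific to this setup:

```shell
# Print the storage driver the running daemon is using (e.g. overlay2 or vfs)
docker info --format '{{ .Driver }}'
```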

Right... So I rebooted the container, added a dedicated mount point for /var/lib/docker, and made sure the filesystem really is ZFS with df -T:

root@docker-test:~# df -T
Filesystem                   Type     1K-blocks   Used Available Use% Mounted on
rpool/data/subvol-100-disk-0 zfs        8388608 808320   7580288  10% /
rpool/data/subvol-100-disk-1 zfs        8388608    256   8388352   1% /var/lib/docker
none                         tmpfs          492      4       488   1% /dev
efivarfs                     efivarfs       176     93        79  55% /sys/firmware/efi/efivars
tmpfs                        tmpfs    131892140      0 131892140   0% /dev/shm
tmpfs                        tmpfs     52756856     84  52756772   1% /run
tmpfs                        tmpfs         5120      0      5120   0% /run/lock

None of the above worked. It seems that Docker needs to talk to ZFS directly, but ZFS runs on the host, not inside the LXC.

I tried mapping /dev/zfs into the container with lxc.mount.entry: /dev/zfs dev/zfs none rbind,create=file, but this gave me a new error: failed to start daemon: error initializing graphdriver: Cannot find root filesystem rpool/data/subvol-100-disk-1: exit status 1: "/usr/sbin/zfs list -rHp -t filesystem -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset rpool/data/subvol-100-disk-1" => cannot open 'rpool/data/subvol-100-disk-1': dataset does not exist.

This sounds reasonable, as that dataset is mounted by the host. Telling the LXC about the host's mounts by passing /proc/self/mounts (lxc.mount.entry: /proc/self/mounts proc/self/mounts none rbind,create=file) made the container not boot at all (Too many levels of symbolic links).