Mellanox ConnectX-3 on Proxmox 8.2

Homelab Sep 17, 2024

Enable interfaces in Linux

Disable guests auto-booting on restart

When I'm installing or configuring new things on my server, I don't want to wait for all guests to boot after each reboot. Therefore I temporarily disable this feature:

systemctl disable pve-guests.service

Download and install config software

Debian already ships the correct driver, so we only need to configure the card and enable the right kernel modules. Go to the download page, currently at https://network.nvidia.com/products/adapter-software/firmware-tools/, and download version 4.22 of the MFT tools. The version matters: it is the last one that supports ConnectX-3 cards. Newer versions will not find the device:

root@pve:~# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
Unloading MST PCI configuration module (unused) - Success
root@pve:~# mst status
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module is not loaded

PCI Devices:
------------

No devices were found.
Note: My cards came with the latest firmware installed. You might need to upgrade this before proceeding. See initial source for more info.
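Since the MFT version matters, a small guard before installing can catch a wrong download. This is only a sketch: the helper names `mft_version_from_name` and `supports_connectx3` are made up for this example, the version is read from the tarball file name (assuming it follows the `mft-<version>-<build>-...` pattern of the download used below), and the "4.22 or older" rule is simply the one stated above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MFT.

# Extract "4.22.1" from a name like "mft-4.22.1-417-x86_64-deb.tgz".
mft_version_from_name() {
    v=${1#mft-}        # strip the leading "mft-"
    echo "${v%%-*}"    # keep everything before the first remaining "-"
}

# Succeed if the given version is 4.22.x or older
# (the rule from this post, taken as an assumption).
supports_connectx3() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -le 22 ]; }
}
```

Usage could look like `supports_connectx3 "$(mft_version_from_name mft-4.22.1-417-x86_64-deb.tgz)" && echo "version should work with ConnectX-3"`.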

Unpack the tools and install.

wget https://www.mellanox.com/downloads/MFT/mft-4.22.1-417-x86_64-deb.tgz
tar -xvzf mft-4.22.1-417-x86_64-deb.tgz
cd mft-4.22.1-417-x86_64-deb
./install

You should now be able to start the tools and check the status of the card.

mst start
mst status

If the card shows up, configure it. In this example the card is configured with 8 virtual functions (SR-IOV), 4 on each port. If you don't need virtual functions, set SRIOV_EN=0 and leave out NUM_OF_VFS=8.

mlxconfig -d /dev/mst/mt4099_pciconf0 q
mlxconfig -d /dev/mst/mt4099_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
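To confirm that the values were stored, the query can be run again and filtered for a single parameter. The helper below is a rough sketch: `mlx_param` is a made-up name, and it assumes the query output lists one parameter per line as a name followed by its value, which is how it looked for me but may differ between MFT versions.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one parameter from
# `mlxconfig ... q` output read on stdin.
# Assumes lines of the form "<NAME> <VALUE>" (an assumption about
# the output format, not something guaranteed by MFT).
mlx_param() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example usage on the host (not executed here):
#   mlxconfig -d /dev/mst/mt4099_pciconf0 q | mlx_param NUM_OF_VFS
#   mlxconfig -d /dev/mst/mt4099_pciconf0 q | mlx_param SRIOV_EN
```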

Reboot the machine.

Create a file /etc/modprobe.d/mlx4_core.conf and add these config values:

options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0
options mlx4_core enable_sys_tune=1
options mlx4_en inline_thold=0
options mlx4_core log_num_mgm_entry_size=-7
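As far as I understand the parameter descriptions from `modinfo mlx4_core`, the three `num_vfs` values are the VFs on port 1, the VFs on port 2, and the dual-port VFs, and `port_type_array=2,2` sets both ports to Ethernet. My assumption is that the sum of the three values should not exceed the `NUM_OF_VFS` value written to the firmware earlier. A minimal sketch to check that, with the made-up helper names `sum_vfs` and `vfs_fit`:

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the modprobe options.

# Sum a comma-separated num_vfs value such as "4,4,0".
sum_vfs() {
    total=0
    old_ifs=$IFS
    IFS=,
    for n in $1; do
        total=$((total + n))
    done
    IFS=$old_ifs
    echo "$total"
}

# Succeed when the driver-side total ($1, e.g. "4,4,0") fits within
# the firmware-side NUM_OF_VFS ($2, e.g. 8).
vfs_fit() {
    [ "$(sum_vfs "$1")" -le "$2" ]
}
```

For the values used here, `vfs_fit "4,4,0" 8` succeeds because 4 + 4 + 0 equals the 8 VFs configured with mlxconfig.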

Reload the modules so the new options take effect, and update the initramfs so they persist across reboots.

modprobe -r mlx4_en mlx4_ib
modprobe mlx4_en
update-initramfs -u

Check that all interfaces are initialized.

ip link

Reboot to make sure everything still works, then check that the card is still listed:

lspci -vvv | grep Mellanox
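For a quicker check than reading the verbose output, the virtual functions can be counted. This sketch assumes that the VFs carry "Virtual Function" in their lspci device name, which was the case on my system but is not something I have verified for other cards. `count_vfs` is a made-up helper name; `15b3` is the Mellanox PCI vendor ID.

```shell
#!/bin/sh
# Hypothetical helper: count lspci lines on stdin that describe a
# virtual function (assumes "Virtual Function" appears in the name).
count_vfs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is ignored on purpose.
    grep -c 'Virtual Function' || true
}

# Example usage on the host (not executed here):
#   lspci -d 15b3: | count_vfs
```

With the configuration from this post I would expect the count to be 8.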

Re-enable guests auto-booting on restart

Now that we're finished, we can re-enable guest auto-booting. The command below also starts all guests right away.

systemctl enable pve-guests.service && /usr/bin/pvesh --nooutput create /nodes/localhost/startall

Enable interfaces in Proxmox

By now the interfaces should show up in the GUI. I added comments to make it easier to identify each interface. The first 4 virtual interfaces seem to be on physical port 1, the next 4 on port 2. Unfortunately this is not easily deducible from the interface naming:

As per the guide I configured 4 VFs per port but ip link shows 8 VFs on each port. Apparently this is some known issue because the card only advertises a single PCI device rather than a device per port. Anyway in my case the first 4 VFs seemed to map to port 1 and the last 4 VFs to port 2 but this was just trial and error. If someone knows of a better way of determining which port the VFs map to please let me know.
https://forum.proxmox.com/threads/how-to-configure-mellanox-connectx-3-cards-for-sriov-and-vfs.121927/post-685667

For now I only enabled port 1, ens1 in my case. In the future I might pass a virtual port through to TrueNAS. For this port I created a bridge, vmbr1, with Autostart and VLAN aware enabled. Apply these settings in the GUI.

Because you can't limit the number of VLANs on a bridge in the GUI, we have to edit this setting on the command line. The limit matters because "old Mellanox cards" support at most 128 VLAN IDs. If you add more, you will get an error message like the one below:

ens1 : error: ens1: failed to set vid `{127, 10, 1931, /..all 4k VLANS listed../, 4092, 4093, 4094}` (cmd '/sbin/bridge -force -batch - [vlan add vid 127-4094 dev ens1 ]' failed: returned 1 (RTNETLINK answers: No space left on device 
TASK ERROR: command 'ifreload -a' failed: exit code 1

Open the interfaces config:

nano /etc/network/interfaces

and find the part that starts with auto vmbr1. Change bridge-vids 2-4094 to a smaller range, for example bridge-vids 2-120. This works for me because my VLAN IDs are within this range. You can also list individual IDs, like bridge-vids 100 200 300.
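To check a bridge-vids value against the limit before applying it, the IDs can simply be counted. The following is a small sketch, not a Proxmox tool: `count_vids` and `check_vids` are made-up names, and the limit of 128 is the number mentioned above, which I have not verified independently.

```shell
#!/bin/sh
# Hypothetical helpers for checking a bridge-vids value.

# Count how many VLAN IDs a bridge-vids value expands to.
# Accepts space-separated IDs and ranges, e.g. "2-120" or "100 200 300".
count_vids() {
    total=0
    for item in $1; do
        case "$item" in
            *-*)
                first=${item%-*}
                last=${item#*-}
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
}

# Limit taken from the text above; treat it as an assumption.
MAX_VIDS=128

# Print a verdict and fail when the value exceeds the limit.
check_vids() {
    n=$(count_vids "$1")
    if [ "$n" -le "$MAX_VIDS" ]; then
        echo "ok: $n VLAN IDs"
    else
        echo "too many: $n VLAN IDs (limit $MAX_VIDS)"
        return 1
    fi
}
```

For example, `check_vids "2-120"` reports 119 IDs and passes, while `check_vids "2-4094"` reports 4093 IDs and fails.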

Reload the settings:

ifreload -a
