iSCSI vs NFS vs SMB

Oct 18, 2022

Intro

Having a TrueNAS system gives you the opportunity to use multiple types of network-attached storage. Depending on the use case or OS, you can use iSCSI, NFS, or SMB shares.

I use SMB shares for file sharing between our desktops, laptops, "desktop" VMs, and the server. Why SMB? Mainly because it is easy to set up and is natively supported on Windows.

NFS file sharing is used as a backup target for my Proxmox server. I have also used it as a target for some VMs to store VM-specific data (e.g. the TeamCity server needs an artifact storage location), but I have migrated those to iSCSI drives recently.

So that's where iSCSI comes in. With iSCSI I am able to create dedicated (network) drives for specific VMs, and I can use it for Proxmox as an additional virtual disk location. The latter is currently in the idea phase only, as I want to upgrade my network connections to 10+ Gbit first.

Performance differences

Apart from some practical considerations like OS support, ease of configuration and use case, each share type also performs differently. This obviously depends on your network and storage layout and configuration, so results may vary. For my hardware, see Appendix A at the end of this post.

To test the performance on my setup, I created a VM on my Proxmox server with 4 vCPUs and 4 GB of memory. On this VM I installed fio and iperf3.
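
Nothing fancy here; on Ubuntu both tools come straight from the standard repositories:

sudo apt update
sudo apt install fio iperf3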



Network speed

First I used iperf3 to test the network speed between my TrueNAS server and this test VM. The single threaded performance was as expected, approaching 1 Gbit/s.
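
On the VM side, iperf3 just has to run in server mode; that part is a one-liner:

# on the test VM (10.33.60.33)
iperf3 -s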

JPL-TRUENAS# iperf3 -c 10.33.60.33
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   951 Mbits/sec  111    868 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   936 Mbits/sec  194    427 KBytes       
[  5]   2.00-3.00   sec   111 MBytes   929 Mbits/sec  188    521 KBytes       
[  5]   3.00-4.00   sec   111 MBytes   934 Mbits/sec  256    526 KBytes       
[  5]   4.00-5.00   sec   111 MBytes   933 Mbits/sec  294    476 KBytes       
[  5]   5.00-6.00   sec  82.4 MBytes   692 Mbits/sec   78    418 KBytes       
[  5]   6.00-7.00   sec   110 MBytes   926 Mbits/sec  279    241 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec    0    620 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   939 Mbits/sec    0    844 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   936 Mbits/sec  261    579 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.06 GBytes   912 Mbits/sec  1661             sender
[  5]   0.00-10.04  sec  1.06 GBytes   907 Mbits/sec                  receiver

But as I've set up link aggregation (lagg) of two gigabit connections, the theoretical throughput should be double that. Let's see what happens if I use two parallel threads:

JPL-TRUENAS# iperf3 -c 10.33.60.33 -P 2
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   627 MBytes   526 Mbits/sec  573             sender
[  5]   0.00-10.04  sec   626 MBytes   523 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   492 MBytes   413 Mbits/sec  700             sender
[  7]   0.00-10.04  sec   491 MBytes   410 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec  1273             sender
[SUM]   0.00-10.04  sec  1.09 GBytes   933 Mbits/sec                  receiver

Hmm, not what I expected. What about 4 threads?

JPL-TRUENAS# iperf3 -c 10.33.60.33 -P 4
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   301 MBytes   253 Mbits/sec  517             sender
[  5]   0.00-10.04  sec   300 MBytes   251 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  1.08 GBytes   928 Mbits/sec  1146             sender
[  7]   0.00-10.04  sec  1.08 GBytes   924 Mbits/sec                  receiver
[  9]   0.00-10.00  sec   364 MBytes   306 Mbits/sec  580             sender
[  9]   0.00-10.04  sec   364 MBytes   304 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec   454 MBytes   381 Mbits/sec  153             sender
[ 11]   0.00-10.04  sec   453 MBytes   379 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  2.17 GBytes  1.87 Gbits/sec  2396             sender
[SUM]   0.00-10.04  sec  2.17 GBytes  1.86 Gbits/sec                  receiver

Interesting...
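
This behaviour makes sense once you remember that LACP hashes every flow onto a single physical link: one TCP stream can never exceed 1 Gbit/s, and with only two streams the hash can land both on the same link. For reference, the Proxmox side of such a bond would look roughly like this in /etc/network/interfaces (a sketch; the interface names are placeholders):

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2 eno3 eno4
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100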

Anyway, we now know the limits of the network: ~912 Mbit/s (114 MB/s) single threaded and ~1.86 Gbit/s (232.5 MB/s) multithreaded. While doing some more research, I read that for iSCSI it is better to use multipath I/O (MPIO) instead of a lagg with LACP: "Throw LACP out the window, use MPIO instead."

(Source: TrueNAS forum thread "ESXI 6.7 iSCSI with 4 NICs + FreeNAS with 4 NICs in LACP for iSCSI".)
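
I have not set up MPIO myself (yet), but the idea is to skip the lagg entirely, give each NIC its own IP, and let the iSCSI initiator balance across the paths. A rough sketch on a Linux initiator, assuming the target would answer on a second, hypothetical portal address (10.33.61.10) next to the existing one:

# log in to the same target via both portals (open-iscsi)
sudo iscsiadm -m discovery -t sendtargets -p 10.33.60.10
sudo iscsiadm -m discovery -t sendtargets -p 10.33.61.10   # hypothetical second portal
sudo iscsiadm -m node --login

# multipathd then merges the sessions into a single /dev/mapper device
sudo apt install multipath-tools
sudo multipath -ll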

Network share speed (attempt 1)

I created a benchmarking dataset on my storage pool in TrueNAS. Then I created three sub-datasets/zvols: benchmark-iscsi, benchmark-nfs and benchmark-smb, each with a size quota of 120 GiB. This might seem like a random number, but the idea was that it should fit six test files of 16 GB each, so ~96 GB (the 16 GB itself is fairly arbitrary).
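
For reference, the CLI equivalent of that layout would be roughly as follows (a sketch; the TrueNAS GUI sets a few more properties, especially on the zvol):

zfs create store1/benchmarking
zfs create -o quota=120G store1/benchmarking/benchmark-nfs
zfs create -o quota=120G store1/benchmarking/benchmark-smb
zfs create -s -V 120G store1/benchmarking/benchmark-iscsi   # sparse zvol backing the iSCSI extent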

⚠️
There is one more thing to consider when using a ZFS pool. ZFS makes intense use of memory through its ARC mechanism. Some people therefore argue that you need to create a test file that is at least twice as big as your RAM; in my case that would mean a test file of 256 GB. Now, while running the tests, I learned that you can create a single test file and share it between tests, so I wouldn't have needed to create six. But as I ran into a lot of other issues (man, doing benchmarks is hard), for now, we can forget about this test file.

For each dataset I created a matching file share, which I mounted in the test VM (the iSCSI drive was formatted as ext4).
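
Roughly, the client side of those mounts looks like this (a sketch; the SMB username is a placeholder and the iSCSI disk needs a partition before formatting):

# packages for the three share types
sudo apt install cifs-utils nfs-common open-iscsi

# SMB
sudo mount -t cifs //10.33.60.10/benchmark-smb /mnt/smb -o username=joep

# NFS
sudo mount -t nfs 10.33.60.10:/mnt/store1/benchmarking/benchmark-nfs /mnt/nfs

# iSCSI: discover and log in to the target, then format and mount the new disk
sudo iscsiadm -m discovery -t sendtargets -p 10.33.60.10
sudo iscsiadm -m node --login
sudo mkfs.ext4 /dev/sdb1
sudo mount /dev/sdb1 /mnt/iscsi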

joep@jpl-test4:~$ df -h
Filesystem                                          Size  Used Avail Use% Mounted on
tmpfs                                               393M  1.3M  392M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv                   9.8G  4.9G  4.4G  53% /
tmpfs                                               2.0G     0  2.0G   0% /dev/shm
tmpfs                                               5.0M     0  5.0M   0% /run/lock
/dev/sda2                                           1.7G  127M  1.5G   8% /boot
//10.33.60.10/benchmark-smb                         120G  192K  120G   1% /mnt/smb
10.33.60.10:/mnt/store1/benchmarking/benchmark-nfs  120G  128K  120G   1% /mnt/nfs
tmpfs                                               393M  4.0K  393M   1% /run/user/1000
/dev/sdb1                                           118G   24K  112G   1% /mnt/iscsi

First I tried running tests with fio from the command line, but somehow the results were "horrible". I probably used the wrong settings. After some searching and experimenting I found "W3-Top" (installation instructions can be found in Appendix B).

W3-Top made it really easy to do the benchmarking.

  1. First you open a browser to jpl-test4:5050.
  2. Click the logo on the top-left and select "Disk Benchmark".
  3. Click the "Start Disk Benchmark" button, select the disk (or share) to benchmark.
  4. Check the settings (I left everything at defaults apart from the "Working Set", which I increased to 16384 (16 GB)).
  5. Start.

After running this test for all shares, we can compare them. W3-Top keeps a list of historic test runs, which can be opened to see more details.

Now, which one is fastest? Well, that depends. It really is a mixed bag. The one that really stands out to me is random writes with SMB.

⚠️
NFS writes data in synchronous mode by default. SMB, on the other hand, writes asynchronously, which is much faster but less safe. See "ZFS sync/async + ZIL/SLOG, explained" or "Some insights into SLOG/ZIL with ZFS on FreeNAS".

To get a fair comparison I disabled sync writes on all datasets, i.e. they all use async writes. At least, that's the theory. Looking at the results, I can only explain the high SMB random-write speeds as a consequence of the async nature of SMB. But that's just guessing, as I lack the proper knowledge.
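
Sync behaviour is a per-dataset ZFS property; from the shell, disabling it looks like this (the GUI exposes the same setting as Sync = Disabled):

zfs set sync=disabled store1/benchmarking/benchmark-nfs
zfs set sync=disabled store1/benchmarking/benchmark-smb
zfs set sync=disabled store1/benchmarking/benchmark-iscsi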

When the "Engine" mode is set to auto, W3-Top select io_uring as the engine. This engine wasn't available on FreeBSD (TrueNAS), so I reran the tests in W3-Top using an engine that is: posixaio. Again the results are summarized using Excel and look very similar.

Disk/pool speed (attempt 1)

To verify that we were actually hitting the network limits and not the disk (ZFS pool) limits, we first have to benchmark the disk pool itself. For the network share tests I used W3-Top, but I didn't want to install it on my TrueNAS server (I don't even know if that's possible). So I dug into the source code to reverse engineer the exact fio command it uses for benchmarking.

This gave me the following script to run on my TrueNAS server:

STHREADS=1
MTHREADS=16

BLOCKSIZE=4k
TESTFILESIZE=16G
ENGINE=posixaio

# pool: sequential read/write with a single job, then random read/write with 1 and 16 jobs
fio --name=seq_read_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=seq_read_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${STHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=read --time_based --group_reporting --numjobs=${STHREADS} --filename=/mnt/store1/benchmarking/test.tmp
fio --name=seq_write_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=seq_write_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${STHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=write --time_based --group_reporting --numjobs=${STHREADS} --filename=/mnt/store1/benchmarking/test.tmp
fio --name=rand_read${STHREADS}_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=rand_read${STHREADS}_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${STHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=randread --time_based --group_reporting --numjobs=${STHREADS} --filename=/mnt/store1/benchmarking/test.tmp
fio --name=rand_write${STHREADS}_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=rand_write${STHREADS}_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${STHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=randwrite --time_based --group_reporting --numjobs=${STHREADS} --filename=/mnt/store1/benchmarking/test.tmp
fio --name=rand_read${MTHREADS}_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=rand_read${MTHREADS}_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${MTHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=randread --time_based --group_reporting --numjobs=${MTHREADS} --filename=/mnt/store1/benchmarking/test.tmp
fio --name=rand_write${MTHREADS}_iscsi --ioengine=${ENGINE} --direct=1 --gtod_reduce=1 --output=rand_write${MTHREADS}_iscsi.txt --bs=${BLOCKSIZE} --iodepth=${MTHREADS} --size=${TESTFILESIZE} --runtime=30 --ramp_time=0 --readwrite=randwrite --time_based --group_reporting --numjobs=${MTHREADS} --filename=/mnt/store1/benchmarking/test.tmp

Speeds, and especially IOPS, for reads are way above the results I got from the network tests, so it seems unlikely that the pool is the bottleneck.

Network share speed (attempt 2)

However, to make the comparison completely fair, I now wanted to run the same script I used to benchmark the pool against the network shares. Alas, I ran into the same issue as mentioned earlier: the results were "horrible". I tried many different settings, but all gave me far worse results than W3-Top did.

At first I wanted to throw in the towel at this point. But then I remembered there is another benchmark suite (phoronix-test-suite) that does run on FreeBSD, as shown by Tom (Lawrence Systems) in his TrueNAS Core vs Scale benchmark video.

To get this to work, follow the steps below. This will install version 10.8.4 (the latest as of writing) of the suite, install the fio tests and configure your mounts as benchmark targets.

1. Install the test suite

wget http://phoronix-test-suite.com/releases/repo/pts.debian/files/phoronix-test-suite_10.8.4_all.deb
sudo apt update
sudo apt install --no-install-recommends php8.1 php8.1-zip
sudo dpkg -i phoronix-test-suite_10.8.4_all.deb

2. Install the fio tests and dependencies

phoronix-test-suite install pts/fio

3. Modify the test-definition

nano ~/.phoronix-test-suite/test-profiles/pts/fio-1.15.0/test-definition.xml

Now scroll all the way down and modify to (source: https://github.com/phoronix-test-suite/phoronix-test-suite/issues/202#issuecomment-396281442):

    <Option>
      <DisplayName>Disk Target</DisplayName>
      <Identifier>custom-field</Identifier>
      <ArgumentPrefix></ArgumentPrefix>
      <ArgumentPostfix></ArgumentPostfix>
      <DefaultEntry>0</DefaultEntry>
      <Menu>
        <Entry>
          <Name>iSCSI</Name>
          <Value>/mnt/iscsi</Value>
          <Message></Message>
        </Entry>
        <Entry>
          <Name>NFS</Name>
          <Value>/mnt/nfs</Value>
          <Message></Message>
        </Entry>
        <Entry>
          <Name>SMB</Name>
          <Value>/mnt/smb</Value>
          <Message></Message>
        </Entry>
      </Menu>
    </Option>

4. Run the tests

phoronix-test-suite benchmark pts/fio-1.15.0
  • Disk Test Configuration: 5 --> "Test All Options"
  • Engine: 2 --> "POSIX AIO"
  • Buffered: 2 --> "No"
  • Direct: 2 --> "Yes"
  • Block Size: 1 --> "4KB"
  • Disk Target: 4 --> Test All Options
  • Save results: Y
  • Enter a name for the result file: fio-shares-benchmark
  • Enter a unique name to describe this test run: fio-shares-benchmark-20221018-1230

Disk/pool speed (attempt 2)

1. Create a jail

I used the TrueNAS GUI to create a jail named phoronix-test-suite based on FreeBSD 13.1. I also added a Mount Point from the benchmarking dataset to a local (jail) mount point /mnt/benchmarking.
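
For those who prefer the shell, the iocage equivalent of those GUI steps would be roughly (a sketch; the paths follow my dataset layout):

iocage create -n phoronix-test-suite -r 13.1-RELEASE
iocage fstab -a phoronix-test-suite /mnt/store1/benchmarking /mnt/benchmarking nullfs rw 0 0
iocage console phoronix-test-suite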

2. Enable SSH inside the jail

In the TrueNAS GUI open the shell to the jail. First enable SSH so you can run commands from a local terminal (source: https://manjaro.site/how-to-enable-ssh-on-truenas-jail/).

echo 'sshd_enable="YES"' >> /etc/rc.conf
service sshd start
service sshd status

Now give the root user a password:

passwd root

3. Install the test suite

Log in using PuTTY or SSH and install the suite.

pkg update
pkg install benchmarks/phoronix-test-suite

4. Install the fio tests and dependencies

phoronix-test-suite install pts/fio

5. Modify the test-definition

nano ~/.phoronix-test-suite/test-profiles/pts/fio-1.15.0/test-definition.xml

Now scroll all the way down and modify to (source: https://github.com/phoronix-test-suite/phoronix-test-suite/issues/202#issuecomment-396281442):

    <Option>
      <DisplayName>Disk Target</DisplayName>
      <Identifier>custom-field</Identifier>
      <ArgumentPrefix></ArgumentPrefix>
      <ArgumentPostfix></ArgumentPostfix>
      <DefaultEntry>0</DefaultEntry>
      <Menu>
        <Entry>
          <Name>Mounted-Directory</Name>
          <Value>/mnt/benchmarking</Value>
          <Message></Message>
        </Entry>
        <Entry>
          <Name>Temp-Directory</Name>
          <Value>/tmp</Value>
          <Message></Message>
        </Entry>
      </Menu>
    </Option>

6. Run the tests

phoronix-test-suite benchmark pts/fio-1.15.0
  • Disk Test Configuration: 5 --> "Test All Options"
  • Engine: 2 --> "POSIX AIO"
  • Buffered: 2 --> "No"
  • Direct: 2 --> "Yes"
  • Block Size: 13 --> Test All Options (I was curious and eventually used 1 (4k) to compare with the network share results)
  • Disk Target: 1 --> "Mounted-Directory"
  • Save results: Y
  • Enter a name for the result file: fio-pool-benchmark
  • Enter a unique name to describe this test run: fio-pool-benchmark-20221018-1430

7. Disable cache on the dataset

When the results came back (see below), it was more than clear that I was benchmarking the ARC cache in RAM instead of the actual disk array (pool) performance. So I disabled the cache and reran all the tests.

zfs get primarycache store1/benchmarking
zfs set primarycache=none store1/benchmarking

zfs get secondarycache store1/benchmarking
zfs set secondarycache=none store1/benchmarking
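
Once the benchmarking is done, zfs inherit puts both properties back to their (inherited) defaults:

zfs inherit primarycache store1/benchmarking
zfs inherit secondarycache store1/benchmarking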

8. Rerun the tests

Results (finally)

After running the pts (Phoronix Test Suite) benchmarks we can finally compare the results. Again, mind that I did nothing special to disable or circumvent ARC caching; I only used direct=1 ("Yes" in pts) and buffered=0 ("No" in pts), which in theory "bypasses the page cache and therefore memory is no longer used". And as mentioned earlier, I forced all datasets to async mode.

Also, I ran the tests on a production system, meaning it was being used by other applications at the time. Although the load was very light, this might have influenced the results a bit.

Pool benchmark results

20221018-1224-fio Benchmarks - OpenBenchmarking.org
ZFS pool (actually ARC) benchmark results, different blocksizes

As expected, when the blocksize is increased the throughput goes up while the IOPS drop. Overall the performance is great, but I am really testing my RAM here. What is funny to see is the throughput number for the 128 KB blocksize, which aligns with the recordsize of the dataset.
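
You can check that property quickly; 128K is the ZFS default:

zfs get recordsize store1/benchmarking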

Also, if I understand this correctly, IOPS for sequential reads/writes don't mean that much; in this mode, throughput is the limiting factor. Anyhow, the numbers for sequential vs random match very closely, which is very likely because of caching.

ℹ️
Memory speeds
To put the numbers into perspective, I did a quick memory benchmark (pts/memory). Not all tests of this suite ran on FreeBSD, but here are some results:
- Memory Copy - Array Size: 1024 MiB: 4954 MiB/s
- Memory Copy, Fixed Block Size - Array Size: 1024 MiB: 5227 MiB/s
- CacheBench Read Cache: 6372 MB/s
- CacheBench Write Cache: 16481 MB/s
Benchmarks - OpenBenchmarking.org
ZFS pool (disabled cache) benchmark results, different blocksizes

So what are the actual results when we disable caching? Well, the numbers for small random reads/writes plummet, which is to be expected as my pool geometry (RAIDZ) was chosen for sequential performance. Only later did I start experimenting with virtual machine storage, which is better served by mirror vdevs (https://openzfs.github.io/openzfs-docs/Performance and Tuning/Workload Tuning.html#pool-geometry).
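
If I were to build a pool purely for VM/block storage, that guidance points at striped mirrors instead of RAIDZ; a hypothetical layout (device names are placeholders) would be:

zpool create vmstore mirror da0 da1 mirror da2 da3 mirror da4 da5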

I don't really understand what is going on here. Write performance seems way too high to me; there is still caching going on. To be continued...

Share benchmark results

With the pool results in mind, I decided to do three runs for the share benchmarks: 4 KB, 128 KB and 8 MB blocksizes.

Fio-shares-benchmark Performance - OpenBenchmarking.org
4 KB blocksize iSCSI-NFS-SMB benchmark results
Fio-shares-128k-20221018-1640 Benchmarks - OpenBenchmarking.org
128 KB blocksize iSCSI-NFS-SMB benchmark results
Fio-shares-8mb-20221018-1715 Benchmarks - OpenBenchmarking.org
8 MB blocksize iSCSI-NFS-SMB benchmark results

With a blocksize of 4 KB the throughput is very low. This lines up with the "horrible" results I got when running fio manually. But when I ran the same test with 128 KB blocks, the throughput shot up. When using 8 MB blocks, the tests saturated my line speed at ~125 MB/s. Looking at IOPS, the results are "mirrored". High IOPS with small blocksizes, low IOPS with higher blocksizes.

Conclusion

Benchmarking is hard. Not only finding and using the tools, but also running the right tests that actually tell you something useful. Did I achieve that? I don't know. The results show a mixed message to me. The W3-Top results don't line up with the phoronix-test-suite results, and as I can't reproduce them with a manual fio command, I am going to ignore them for now and focus on the phoronix results, which do line up with my manual runs.

Small files

When your application reads and writes lots of small files, iSCSI is the best option. It has the highest IOPS (~10% higher) and highest throughput when handling small blocksizes.

Large files

If you're working with large files, NFS shines. It has the highest throughput and the lowest CPU utilization (see the W3-Top results). SMB is not bad either, but lags a bit behind in read speeds.

In between

If your application or use case is somewhere in between or you have mixed workloads, which is probably the case in the real world, it really doesn't matter which you choose. Ease of setup is way more important here.

So, which one

I currently use all of them and these test results confirm to me that I made the right choice.

  1. NFS for my Proxmox backups (large files) to a dataset with large recordsize is perfect.
  2. SMB for my Windows shares with mixed file sizes gets the best performance. iSCSI would not work here as the files need to be shared between multiple machines.
  3. iSCSI for virtual disks is more of a mixed bag. One application writes lots of small files, so iSCSI is perfect (they also recommend it). iSCSI for Sonatype blob storage and TeamCity artifact storage might not be the best option (they might need some tuning).

Appendix A: Hardware used

Proxmox server (test VM)

  • CPU: Xeon E5-2650 v2 @ 2.60GHz, Cache 16384 KB, 4 Cores
  • RAM: 4 GB DDR3 1333 MHz
  • NIC: 4-port HP 331i Adapter (Broadcom Gigabit Ethernet BCM5719) (combined in LACP bond with layer 3+4 Hash policy)
  • OS host: Proxmox 7.2-11, Kernel 5.15.60-1-pve (x86_64)
  • OS guest: Ubuntu 22.04.1 LTS, Kernel 5.15.0-50-generic (x86_64)

TrueNAS server

  • CPU: Xeon E5-2420 v2 @ 2.20GHz, Cache 15 MB, 6 Cores
  • RAM: 128 GB DDR3 1066 MHz
  • NIC: 2-port Broadcom Gigabit Ethernet BCM5720 (combined in LACP lagg)
  • OS: TrueNAS Core 13.0-U2
  • Disks: Hitachi HUS723030ALS640 (3 TB). Pool: 7-disk RAID-Z2

Switch

  • HP 2530-24G-PoEP Switch (J9773A)
  • All trunks (bond/lagg) use layer 4 based load balancing

Appendix B: Install W3-Top on Ubuntu 22.04 (as of October 2022)

I ended up installing it as a service after using this workaround to get .NET core (libssl version issue) working in Ubuntu 22.04. Summary of all the commands to install, including the libssl workaround:

wget "http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb"
wget "http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0-dev_1.0.2n-1ubuntu5.10_amd64.deb"
sudo dpkg -i libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo dpkg -i libssl1.0-dev_1.0.2n-1ubuntu5.10_amd64.deb

script=https://raw.githubusercontent.com/devizer/glist/master/install-dotnet-dependencies.sh;
(wget -q -nv --no-check-certificate -O - $script 2>/dev/null || curl -ksSL $script) | bash

export HTTP_HOST=0.0.0.0 HTTP_PORT=5050
export RESPONSE_COMPRESSION=True
export INSTALL_DIR=/opt/w3top
script=https://raw.githubusercontent.com/devizer/w3top-bin/master/install-w3top-service.sh
(wget -q -nv --no-check-certificate -O - $script 2>/dev/null || curl -ksSL $script) | bash

Appendix C: fio results TrueNAS pool

RUN_seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=175MiB/s][r=44.9k IOPS][eta 00m:00s]
RUN_seq_read: (groupid=0, jobs=1): err= 0: pid=2597: Mon Oct 17 13:13:08 2022
  read: IOPS=46.0k, BW=180MiB/s (189MB/s)(5396MiB/30001msec)
   bw (  KiB/s): min=176248, max=342465, per=100.00%, avg=184405.57, stdev=28071.60, samples=58
   iops        : min=44062, max=85616, avg=46101.02, stdev=7017.87, samples=58
  cpu          : usr=22.16%, sys=40.36%, ctx=1381355, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1381310,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=180MiB/s (189MB/s), 180MiB/s-180MiB/s (189MB/s-189MB/s), io=5396MiB (5658MB), run=30001-30001msec
RUN_seq_write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=143MiB/s][w=36.6k IOPS][eta 00m:00s]
RUN_seq_write: (groupid=0, jobs=1): err= 0: pid=2647: Mon Oct 17 13:13:40 2022
  write: IOPS=36.8k, BW=144MiB/s (151MB/s)(4314MiB/30001msec); 0 zone resets
   bw (  KiB/s): min=142474, max=233480, per=100.00%, avg=147387.50, stdev=11605.94, samples=58
   iops        : min=35618, max=58370, avg=36846.62, stdev=2901.53, samples=58
  cpu          : usr=18.19%, sys=31.98%, ctx=1104435, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1104382,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=144MiB/s (151MB/s), 144MiB/s-144MiB/s (151MB/s-151MB/s), io=4314MiB (4524MB), run=30001-30001msec
RUN_rand_read1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=175MiB/s][r=44.8k IOPS][eta 00m:00s]
RUN_rand_read1: (groupid=0, jobs=1): err= 0: pid=2674: Mon Oct 17 13:14:12 2022
  read: IOPS=46.6k, BW=182MiB/s (191MB/s)(5456MiB/30001msec)
   bw (  KiB/s): min=175872, max=358379, per=100.00%, avg=186375.76, stdev=34966.23, samples=59
   iops        : min=43968, max=89594, avg=46593.64, stdev=8741.52, samples=59
  cpu          : usr=24.11%, sys=37.44%, ctx=1396722, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1396670,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=182MiB/s (191MB/s), 182MiB/s-182MiB/s (191MB/s-191MB/s), io=5456MiB (5721MB), run=30001-30001msec
RUN_rand_write1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=141MiB/s][w=36.1k IOPS][eta 00m:00s]
RUN_rand_write1: (groupid=0, jobs=1): err= 0: pid=2723: Mon Oct 17 13:14:44 2022
  write: IOPS=36.1k, BW=141MiB/s (148MB/s)(4232MiB/30002msec); 0 zone resets
   bw (  KiB/s): min=141773, max=193230, per=100.00%, avg=144573.29, stdev=6521.19, samples=59
   iops        : min=35443, max=48307, avg=36143.08, stdev=1630.26, samples=59
  cpu          : usr=19.76%, sys=31.41%, ctx=1083383, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1083326,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=141MiB/s (148MB/s), 141MiB/s-141MiB/s (148MB/s-148MB/s), io=4232MiB (4437MB), run=30002-30002msec
RUN_rand_read16: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=16
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=1069MiB/s][r=274k IOPS][eta 00m:00s]
RUN_rand_read16: (groupid=0, jobs=1): err= 0: pid=2759: Mon Oct 17 13:15:16 2022
  read: IOPS=202k, BW=788MiB/s (826MB/s)(23.1GiB/30001msec)
   bw (  KiB/s): min=513097, max=1100504, per=99.72%, avg=804288.36, stdev=224776.01, samples=59
   iops        : min=128274, max=275126, avg=201072.14, stdev=56194.33, samples=59
  cpu          : usr=24.22%, sys=75.77%, ctx=780, majf=0, minf=1
  IO depths    : 1=0.1%, 2=12.5%, 4=26.9%, 8=53.9%, 16=6.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.7%, 8=0.0%, 16=6.3%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=6049150,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=788MiB/s (826MB/s), 788MiB/s-788MiB/s (826MB/s-826MB/s), io=23.1GiB (24.8GB), run=30001-30001msec
RUN_rand_write16: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=16
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=461MiB/s][w=118k IOPS][eta 00m:00s]
RUN_rand_write16: (groupid=0, jobs=1): err= 0: pid=2806: Mon Oct 17 13:15:48 2022
  write: IOPS=118k, BW=462MiB/s (484MB/s)(13.5GiB/30001msec); 0 zone resets
   bw (  KiB/s): min=441884, max=479584, per=100.00%, avg=473305.83, stdev=6573.66, samples=59
   iops        : min=110471, max=119896, avg=118326.07, stdev=1643.34, samples=59
  cpu          : usr=23.62%, sys=76.23%, ctx=15490, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.7%, 4=22.4%, 8=68.3%, 16=8.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=92.1%, 8=0.2%, 16=7.7%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3546430,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=13.5GiB (14.5GB), run=30001-30001msec