Antonio Paolillo - antonio.paolillo [at] huawei.com
Last updated: 2023-03-01
This page provides benchmark results to support the conversation on LKML about rtmutex_lock (see https://lkml.org/lkml/2023/1/20/702).
We used lock_torture to evaluate the different proposed patches on two different server machines:
We ran the evaluation on a Ubuntu 22.04 distro, with custom kernels based on v6.2-rc6 (6d796c50f84ca79f1722bb131799e5a5710c4700). The different kernels are combination of patches:
The following provides hyperlinks to the code of the actual changes applied on these different kernels:
We ran lock_torture several times, exploring the following parameter space:
torture_type="rtmutex_lock"
,nwriters_stress=[1, 2, 3, 4, 8, 16, 32, 64, 95]
,stat_interval=4
,stutter=0
,shuffle_interval=0
.For each value of nwriters_stress
, we ran the configuration 5 times.
By feeding the lock_torture
kthread pids to taskset -p
, we overruled
the scheduling such that the distribution of kthreads to CPUs is fixed.
We also disabled "irq balance" and "numa balance" daemons, fixed the
frequency to 1.5GHz using the "userspace" cpufreq governor and isolated
all the cores used (using isolcpus=1-95
at boot-time) to avoid any
source of interference.
As a warm-up phase, we ignored the first reported results and only considered the latest 60 seconds of execution (after all kthreads migrated to their final CPU). The reported throughput is computed by dividing the reported number of operations by the duration of the measurement for each dot (60 seconds), so higher is better.
Here follows the results on taishan200-96c (the 'rel' column is the mean relative to the mean of the stock kernel, in percent, and each mean is the average over 5 independent runs):
Kernel: | k0-stock-6.2.0-rc6 | k1-rmacq | k2-readonce | k3-alongor | k4-rmacq+readonce | k5-rmacq+alongor | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Statistic (kops/s): | mean | std | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel |
nwriters_stress: | |||||||||||||||||
1 | 899.91 | 24.95 | 880.10 | 29.62 | -2% | 871.57 | 44.27 | -3% | 888.65 | 37.90 | -1% | 898.63 | 29.82 | -0% | 889.83 | 25.64 | -1% |
2 | 359.30 | 25.92 | 416.83 | 32.77 | +16% | 360.65 | 28.32 | +0% | 404.79 | 42.64 | +13% | 380.65 | 21.29 | +6% | 404.37 | 23.27 | +13% |
3 | 314.97 | 24.32 | 308.41 | 9.68 | -2% | 315.00 | 9.97 | +0% | 313.86 | 13.47 | -0% | 313.47 | 4.01 | -0% | 322.77 | 20.82 | +2% |
4 | 328.02 | 15.09 | 330.65 | 29.33 | +1% | 314.83 | 24.28 | -4% | 305.71 | 12.72 | -7% | 322.95 | 10.39 | -2% | 343.32 | 13.73 | +5% |
8 | 292.16 | 22.03 | 288.85 | 10.50 | -1% | 288.28 | 18.84 | -1% | 285.42 | 24.58 | -2% | 310.23 | 26.08 | +6% | 285.67 | 20.03 | -2% |
16 | 297.03 | 26.89 | 281.89 | 29.22 | -5% | 265.19 | 33.73 | -11% | 279.02 | 22.43 | -6% | 284.40 | 36.21 | -4% | 285.21 | 36.33 | -4% |
32 | 187.36 | 28.59 | 175.71 | 19.77 | -6% | 186.44 | 48.15 | -0% | 206.59 | 14.11 | +10% | 174.08 | 24.30 | -7% | 185.80 | 45.12 | -1% |
64 | 148.13 | 48.65 | 172.48 | 34.29 | +16% | 154.59 | 47.05 | +4% | 164.22 | 29.81 | +11% | 142.13 | 47.40 | -4% | 136.39 | 29.95 | -8% |
95 | 174.35 | 57.89 | 148.59 | 38.03 | -15% | 156.85 | 43.64 | -10% | 132.92 | 32.35 | -24% | 126.44 | 28.24 | -27% | 146.82 | 60.04 | -16% |
Kernel: | k0-stock-6.2.0-rc6 | k1-rmacq | k2-readonce | k3-alongor | k4-rmacq+readonce | k5-rmacq+alongor | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Statistic (kops/s): | mean | std | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel |
nwriters_stress: | |||||||||||||||||
1 | 899.91 | 24.95 | 880.10 | 29.62 | -2% | 871.57 | 44.27 | -3% | 888.65 | 37.90 | -1% | 898.63 | 29.82 | -0% | 889.83 | 25.64 | -1% |
2 | 359.30 | 25.92 | 416.83 | 32.77 | +16% | 360.65 | 28.32 | +0% | 404.79 | 42.64 | +13% | 380.65 | 21.29 | +6% | 404.37 | 23.27 | +13% |
3 | 314.97 | 24.32 | 308.41 | 9.68 | -2% | 315.00 | 9.97 | +0% | 313.86 | 13.47 | -0% | 313.47 | 4.01 | -0% | 322.77 | 20.82 | +2% |
4 | 328.02 | 15.09 | 330.65 | 29.33 | +1% | 314.83 | 24.28 | -4% | 305.71 | 12.72 | -7% | 322.95 | 10.39 | -2% | 343.32 | 13.73 | +5% |
8 | 292.16 | 22.03 | 288.85 | 10.50 | -1% | 288.28 | 18.84 | -1% | 285.42 | 24.58 | -2% | 310.23 | 26.08 | +6% | 285.67 | 20.03 | -2% |
16 | 297.03 | 26.89 | 281.89 | 29.22 | -5% | 265.19 | 33.73 | -11% | 279.02 | 22.43 | -6% | 284.40 | 36.21 | -4% | 285.21 | 36.33 | -4% |
32 | 187.36 | 28.59 | 175.71 | 19.77 | -6% | 186.44 | 48.15 | -0% | 206.59 | 14.11 | +10% | 174.08 | 24.30 | -7% | 185.80 | 45.12 | -1% |
64 | 148.13 | 48.65 | 172.48 | 34.29 | +16% | 154.59 | 47.05 | +4% | 164.22 | 29.81 | +11% | 142.13 | 47.40 | -4% | 136.39 | 29.95 | -8% |
95 | 174.35 | 57.89 | 148.59 | 38.03 | -15% | 156.85 | 43.64 | -10% | 132.92 | 32.35 | -24% | 126.44 | 28.24 | -27% | 146.82 | 60.04 | -16% |
And the results on gigabyte-96c:
Kernel: | k0-stock-6.2.0-rc6 | k1-rmacq | k2-readonce | k3-alongor | k4-rmacq+readonce | k5-rmacq+alongor | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Statistic (kops/s): | mean | std | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel | mean | std | rel |
nwriters_stress: | |||||||||||||||||
1 | 713.72 | 25.68 | 707.32 | 17.73 | -1% | 718.81 | 12.63 | +1% | 712.80 | 13.57 | -0% | 709.17 | 14.10 | -1% | 730.33 | 9.14 | +2% |
2 | 376.25 | 8.19 | 400.09 | 16.24 | +6% | 396.71 | 26.09 | +5% | 412.61 | 17.80 | +10% | 396.48 | 7.02 | +5% | 409.90 | 14.61 | +9% |
3 | 415.07 | 16.83 | 410.19 | 19.82 | -1% | 423.39 | 9.68 | +2% | 417.28 | 10.23 | +1% | 424.94 | 17.48 | +2% | 422.92 | 11.75 | +2% |
4 | 286.77 | 26.63 | 285.13 | 6.80 | -1% | 297.33 | 23.62 | +4% | 296.49 | 16.60 | +3% | 303.99 | 30.38 | +6% | 296.93 | 9.90 | +4% |
8 | 296.56 | 20.45 | 308.97 | 12.53 | +4% | 305.49 | 19.91 | +3% | 294.24 | 17.24 | -1% | 294.71 | 24.03 | -1% | 294.09 | 25.20 | -1% |
16 | 257.34 | 33.94 | 266.03 | 29.60 | +3% | 270.72 | 35.22 | +5% | 252.28 | 50.16 | -2% | 263.83 | 45.84 | +3% | 247.42 | 41.01 | -4% |
32 | 278.78 | 51.45 | 215.35 | 68.40 | -23% | 259.77 | 87.44 | -7% | 217.26 | 79.67 | -22% | 201.23 | 70.46 | -28% | 282.47 | 116.65 | +1% |
64 | 75.82 | 64.87 | 194.52 | 137.19 | +157% | 35.57 | 12.14 | -53% | 74.24 | 72.04 | -2% | 71.29 | 45.55 | -6% | 77.93 | 43.57 | +3% |
95 | 60.37 | 68.13 | 198.38 | 116.93 | +229% | 43.12 | 17.60 | -29% | 58.80 | 36.47 | -3% | 57.78 | 63.00 | -4% | 61.33 | 71.18 | +2% |
We can safely conclude that the patches do not significatively affect the throughput of the lock_torture benchmark for rtmutex_lock. The values for nwriters_stress>=64 can safely be ignored as they are too spread.
[1] https://e.huawei.com/uk/products/servers/taishan-server/taishan-2280-v2
[2] https://en.wikichip.org/wiki/hisilicon/kunpeng/920-4826
[3] https://www.gigabyte.com/Rack-Server/R182-Z91-rev-100
Below you will find results in form of interactive charts, for each kernel and for each machine.
Here are the results for each individual measured dots (since each configuration has been repeated 5 times):
And here are the aggregated results (using the mean as aggregator):