Performance measurement with the Meltdown and Spectre mitigations

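The Meltdown and Spectre mitigations are getting a lot of attention because the memory-space isolation they add is said to be expensive, so, for no particularly deep reason, I compared UnixBench results on my machine at home. Measured on Ubuntu 16.04 on an HP MicroServer Gen8.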

Environment

CPU

$ cat /proc/cpuinfo  | grep 'model name'
model name      : Intel(R) Celeron(R) CPU G1610T @ 2.30GHz
model name      : Intel(R) Celeron(R) CPU G1610T @ 2.30GHz

Kernel version

Both kernels are from the standard Ubuntu repositories.

Environment        Kernel version
After mitigation   4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64
Before mitigation  4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64
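
When comparing a patched and an unpatched kernel like this, it can help to check what the running kernel itself reports. Newer Linux kernels list per-vulnerability status under /sys/devices/system/cpu/vulnerabilities. Whether the two kernels in this post expose that directory is not something I checked, so the sketch below treats its absence as a normal case. The function name `mitigation_status` is made up for illustration.

```python
from pathlib import Path

# Assumption: the running kernel exposes this sysfs directory.
# Older kernels (possibly including the ones in this post) may not have it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir: Path = VULN_DIR) -> dict:
    """Map each vulnerability name to the kernel's one-line status string."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    status = mitigation_status()
    if not status:
        print("kernel does not report mitigation status via sysfs")
    for name, text in status.items():
        print(f"{name}: {text}")
```

Running this once under each kernel, next to `uname -a`, would record which mitigations were actually active for each benchmark run.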

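Results

As has been widely pointed out, the increase in system call overhead shows up very clearly. The values below are the UnixBench index values from the run with 2 parallel copies of the tests; the last column is the after value as a percentage of the before value.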

Test                                    Before    After  After/Before
System Call Overhead                    3237.3    965.2    29.8%
Pipe Throughput                         2913.0   1509.0    51.8%
Pipe-based Context Switching            1447.4    809.6    55.9%
Process Creation                        1825.5   1570.2    86.0%
Execl Throughput                        2016.8   1806.5    89.6%
Shell Scripts (1 concurrent)            3159.3   2960.0    93.7%
Shell Scripts (8 concurrent)            2907.6   2739.7    94.2%
File Copy 1024 bufsize 2000 maxblocks   2592.4   2519.9    97.2%
File Copy 4096 bufsize 8000 maxblocks   4940.0   4831.9    97.8%
File Copy 256 bufsize 500 maxblocks     1712.1   1687.9    98.6%
Double-Precision Whetstone              1229.2   1228.9   100.0%
Dhrystone 2 using register variables    4475.4   4592.7   102.6%
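
The relative values in the table are just the after index divided by the before index. A minimal sketch that recomputes a few rows, with the index values copied from the 2-parallel-copy runs in the log; the name `relative_percent` is made up for illustration.

```python
# (before, after) index values copied from the 2-parallel-copy runs in the log.
INDEX = {
    "System Call Overhead": (3237.3, 965.2),
    "Pipe Throughput": (2913.0, 1509.0),
    "Double-Precision Whetstone": (1229.2, 1228.9),
    "Dhrystone 2 using register variables": (4475.4, 4592.7),
}


def relative_percent(before: float, after: float) -> float:
    """Return the after value as a percentage of the before value."""
    return after / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in INDEX.items():
        print(f"{name:40s} {relative_percent(before, after):6.1f}%")
```

Tests that spend most of their time in user space (Whetstone, Dhrystone) stay near 100%, while the ones dominated by kernel entry and exit drop the most.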

Log

$ uname -a
Linux storage 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ ./Run
make all
make[1]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
make distr
make[2]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
Checking distribution of files
./pgms  exists
./src  exists
./testdir  exists
./tmp  exists
./results  exists
make[2]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
make programs
make[2]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
make[2]: Nothing to be done for 'programs'.
make[2]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
make[1]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
sh: 1: 3dinfo: not found

   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.3                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   January 13, 2011                   johantheghost at yahoo period com


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

2 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

2 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

2 x Execl Throughput  1 2 3

2 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

2 x File Copy 256 bufsize 500 maxblocks  1 2 3

2 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

2 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

2 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

2 x Process Creation  1 2 3

2 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

2 x Shell Scripts (1 concurrent)  1 2 3

2 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: storage: GNU/Linux
   OS: GNU/Linux -- 4.4.0-104-generic -- #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Celeron(R) CPU G1610T @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 1: Intel(R) Celeron(R) CPU G1610T @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   16:04:12 up 9 days, 16:38,  1 user,  load average: 1.91, 2.93, 2.06; runlevel 2017-12-29

------------------------------------------------------------------------
Benchmark Run: Mon Jan 08 2018 16:04:12 - 16:32:59
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       26624684.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3379.6 MWIPS (9.9 s, 7 samples)
Execl Throughput                               3837.8 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        800558.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          238528.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1921655.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1811369.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 162381.4 lps   (10.0 s, 7 samples)
Process Creation                               9848.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   8493.7 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                   1618.4 lpm   (60.1 s, 2 samples)
System Call Overhead                        3039305.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   26624684.7   2281.5
Double-Precision Whetstone                       55.0       3379.6    614.5
Execl Throughput                                 43.0       3837.8    892.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     800558.6   2021.6
File Copy 256 bufsize 500 maxblocks            1655.0     238528.1   1441.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    1921655.9   3313.2
Pipe Throughput                               12440.0    1811369.4   1456.1
Pipe-based Context Switching                   4000.0     162381.4    406.0
Process Creation                                126.0       9848.2    781.6
Shell Scripts (1 concurrent)                     42.4       8493.7   2003.2
Shell Scripts (8 concurrent)                      6.0       1618.4   2697.3
System Call Overhead                          15000.0    3039305.9   2026.2
                                                                   ========
System Benchmarks Index Score                                        1408.8

------------------------------------------------------------------------
Benchmark Run: Mon Jan 08 2018 16:32:59 - 17:01:45
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       52227790.6 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     6760.5 MWIPS (9.9 s, 7 samples)
Execl Throughput                               8672.1 lps   (29.5 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1026579.6 KBps  (30.1 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          283351.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2865226.5 KBps  (30.1 s, 2 samples)
Pipe Throughput                             3623761.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 578965.5 lps   (10.0 s, 7 samples)
Process Creation                              23001.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  13395.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1744.5 lpm   (60.1 s, 2 samples)
System Call Overhead                        4856018.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   52227790.6   4475.4
Double-Precision Whetstone                       55.0       6760.5   1229.2
Execl Throughput                                 43.0       8672.1   2016.8
File Copy 1024 bufsize 2000 maxblocks          3960.0    1026579.6   2592.4
File Copy 256 bufsize 500 maxblocks            1655.0     283351.3   1712.1
File Copy 4096 bufsize 8000 maxblocks          5800.0    2865226.5   4940.0
Pipe Throughput                               12440.0    3623761.5   2913.0
Pipe-based Context Switching                   4000.0     578965.5   1447.4
Process Creation                                126.0      23001.7   1825.5
Shell Scripts (1 concurrent)                     42.4      13395.6   3159.3
Shell Scripts (8 concurrent)                      6.0       1744.5   2907.6
System Call Overhead                          15000.0    4856018.0   3237.3
                                                                   ========
System Benchmarks Index Score                                        2485.8

$ uname -a
Linux storage 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ ./Run
make all
make[1]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
make distr
make[2]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
Checking distribution of files
./pgms  exists
./src  exists
./testdir  exists
./tmp  exists
./results  exists
make[2]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
make programs
make[2]: Entering directory '/home/nhirokinet/byte-unixbench/UnixBench'
make[2]: Nothing to be done for 'programs'.
make[2]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
make[1]: Leaving directory '/home/nhirokinet/byte-unixbench/UnixBench'
sh: 1: 3dinfo: not found

   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.3                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   January 13, 2011                   johantheghost at yahoo period com


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

2 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

2 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

2 x Execl Throughput  1 2 3

2 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

2 x File Copy 256 bufsize 500 maxblocks  1 2 3

2 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

2 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

2 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

2 x Process Creation  1 2 3

2 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

2 x Shell Scripts (1 concurrent)  1 2 3

2 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: storage: GNU/Linux
   OS: GNU/Linux -- 4.4.0-109-generic -- #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Celeron(R) CPU G1610T @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 1: Intel(R) Celeron(R) CPU G1610T @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   23:30:18 up 1 min,  1 user,  load average: 0.19, 0.11, 0.04; runlevel 2018-01-16

------------------------------------------------------------------------
Benchmark Run: Tue Jan 16 2018 23:30:18 - 23:59:05
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       26767435.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3379.3 MWIPS (9.9 s, 7 samples)
Execl Throughput                               3471.2 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        554695.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          148162.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1609334.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                              914803.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 145211.3 lps   (10.0 s, 7 samples)
Process Creation                               8671.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   8052.1 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1534.6 lpm   (60.1 s, 2 samples)
System Call Overhead                         768182.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   26767435.5   2293.7
Double-Precision Whetstone                       55.0       3379.3    614.4
Execl Throughput                                 43.0       3471.2    807.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     554695.5   1400.7
File Copy 256 bufsize 500 maxblocks            1655.0     148162.7    895.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1609334.7   2774.7
Pipe Throughput                               12440.0     914803.6    735.4
Pipe-based Context Switching                   4000.0     145211.3    363.0
Process Creation                                126.0       8671.7    688.2
Shell Scripts (1 concurrent)                     42.4       8052.1   1899.1
Shell Scripts (8 concurrent)                      6.0       1534.6   2557.7
System Call Overhead                          15000.0     768182.9    512.1
                                                                   ========
System Benchmarks Index Score                                        1050.6

------------------------------------------------------------------------
Benchmark Run: Tue Jan 16 2018 23:59:05 - 00:27:51
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       53596515.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     6759.1 MWIPS (9.9 s, 7 samples)
Execl Throughput                               7767.9 lps   (29.5 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        997866.7 KBps  (30.1 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          279350.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2802513.5 KBps  (30.1 s, 2 samples)
Pipe Throughput                             1877145.8 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 323835.7 lps   (10.0 s, 7 samples)
Process Creation                              19784.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  12550.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1643.8 lpm   (60.1 s, 2 samples)
System Call Overhead                        1447857.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   53596515.9   4592.7
Double-Precision Whetstone                       55.0       6759.1   1228.9
Execl Throughput                                 43.0       7767.9   1806.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     997866.7   2519.9
File Copy 256 bufsize 500 maxblocks            1655.0     279350.0   1687.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    2802513.5   4831.9
Pipe Throughput                               12440.0    1877145.8   1509.0
Pipe-based Context Switching                   4000.0     323835.7    809.6
Process Creation                                126.0      19784.7   1570.2
Shell Scripts (1 concurrent)                     42.4      12550.5   2960.0
Shell Scripts (8 concurrent)                      6.0       1643.8   2739.7
System Call Overhead                          15000.0    1447857.1    965.2
                                                                   ========
System Benchmarks Index Score                                        1956.5
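
For reading the logs above: the INDEX column is consistent with RESULT / BASELINE x 10, and the final "System Benchmarks Index Score" with the geometric mean of the per-test indices. A minimal sketch that checks this against the before-mitigation run with 2 parallel copies. The helper names are hypothetical and this is not UnixBench's own code.

```python
import math

# (baseline, result) pairs copied from the before-mitigation log,
# "running 2 parallel copies of tests".
RUN = {
    "Dhrystone 2 using register variables": (116700.0, 52227790.6),
    "Double-Precision Whetstone": (55.0, 6760.5),
    "Execl Throughput": (43.0, 8672.1),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 1026579.6),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 283351.3),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 2865226.5),
    "Pipe Throughput": (12440.0, 3623761.5),
    "Pipe-based Context Switching": (4000.0, 578965.5),
    "Process Creation": (126.0, 23001.7),
    "Shell Scripts (1 concurrent)": (42.4, 13395.6),
    "Shell Scripts (8 concurrent)": (6.0, 1744.5),
    "System Call Overhead": (15000.0, 4856018.0),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: result relative to the baseline, scaled by 10."""
    return result / baseline * 10.0


def overall_score(run: dict) -> float:
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in run.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in RUN.items():
        print(f"{name:40s} {index(baseline, result):8.1f}")
    print(f"{'Overall':40s} {overall_score(RUN):8.1f}")
```

The overall value should come out close to the 2485.8 reported in the log. Because the score is a geometric mean, one test dropping to about 30% pulls the total down less than it would with an arithmetic mean.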