| Platform | 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores) | 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores) |
| --- | --- | --- |
| Hyper-Threading | HT disabled | HT enabled |
| Turbo | Turbo disabled | Turbo disabled |
| Driver | Scaling governor set to “performance” via intel_pstate driver | Scaling governor set to “performance” via acpi-cpufreq driver |
| Memory | 384GB DDR4-2666 ECC RAM | 256GB DDR4-2133 ECC RAM |
| OS | CentOS* Linux release 7.3.1611 (Core) | CentOS* Linux release 7.3.1611 (Core) |
| Kernel | Linux kernel 3.10.0-514.10.2.el7.x86_64 | Linux kernel 3.10.0-514.10.2.el7.x86_64 |
| SSD | Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC) | Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC) |
| Performance Measurement Command Variables | Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56. CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance | Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44. CPU frequency set with cpupower frequency-set -d 2.2G -u 2.2G -g performance |

| Caffe | 2S Intel® Xeon® Platinum 8180 processor | 2S Intel® Xeon® processor E5-2699v4 |
| --- | --- | --- |
| Revision | http://github.com/intel/caffe/, revision f96b759f71b2281835f690af267158b82b150b5c | http://github.com/intel/caffe/, revision f96b759f71b2281835f690af267158b82b150b5c |
| Other Arguments | Inference measured with the “caffe time --forward_only” command, training measured with the “caffe time” command. Caffe run with “numactl -l”. | Inference measured with the “caffe time --forward_only” command, training measured with the “caffe time” command. |
| Dataset | For “ConvNet” topologies, a dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. | For “ConvNet” topologies, a dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. |
| Topologies | Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use the newer Caffe prototxt format but are functionally equivalent). | Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use the newer Caffe prototxt format but are functionally equivalent). |
| Compiler | Intel C++ compiler ver. 17.0.2 20170213 | GCC 4.8.5 |
| Library | Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2017.0.2.20170110 |

| TensorFlow | 2S Intel® Xeon® Platinum 8180 processor | 2S Intel® Xeon® processor E5-2699v4 |
| --- | --- | --- |
| Revision | https://github.com/tensorflow/tensorflow, commit id 207203253b6f8ea5e938a512798429f91d5b4e7e | https://github.com/tensorflow/tensorflow, commit id 207203253b6f8ea5e938a512798429f91d5b4e7e |
| Other Arguments | Inter-op parallelism threads set to 1 for the AlexNet and VGG benchmarks and to 2 for the GoogLeNet benchmarks; intra-op parallelism threads set to 56; data format is NCHW; KMP_BLOCKTIME set to 1 for the GoogLeNet and VGG benchmarks and to 30 for the AlexNet benchmark. Inference measured with the --caffe time -forward_only -engine MKL2017 option, training measured with the --forward_backward_only option. | Inter-op parallelism threads set to 1 for the AlexNet and VGG benchmarks and to 2 for the GoogLeNet benchmarks; intra-op parallelism threads set to 44; data format is NCHW; KMP_BLOCKTIME set to 1 for the GoogLeNet and VGG benchmarks and to 30 for the AlexNet benchmark. Inference measured with the --caffe time -forward_only -engine MKL2017 option, training measured with the --forward_backward_only option. |
| Dataset | Dummy data was used. | Dummy data was used. |
| Topologies | Performance numbers were obtained for three ConvNet benchmarks, AlexNet, GoogLeNet v1, and VGG (https://github.com/soumith/convnet-benchmarks/tree/master/tensorflow), using dummy data. | Performance numbers were obtained for three ConvNet benchmarks, AlexNet, GoogLeNet v1, and VGG (https://github.com/soumith/convnet-benchmarks/tree/master/tensorflow). |
| Compiler | GCC 4.8.5 | GCC 4.8.5 |
| Library | Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2018.0.20170425 |

| MXNet | 2S Intel® Xeon® Platinum 8180 processor | 2S Intel® Xeon® processor E5-2699v4 |
| --- | --- | --- |
| Revision | https://github.com/dmlc/mxnet/, revision 5efd91a71f36fea483e882b0358c8d46b5a7aa20 | https://github.com/dmlc/mxnet/, revision e9f281a27584cdb78db8ce6b66e648b3dbc10d37 |
| Other Arguments | Inference was measured with “benchmark_score.py”; training was measured with a modified version of benchmark_score.py that also runs backward propagation. | Inference was measured with “benchmark_score.py”; training was measured with a modified version of benchmark_score.py that also runs backward propagation. |
| Dataset | Dummy data was used. | Dummy data was used. |
| Topologies | Topology specs from https://github.com/dmlc/mxnet/tree/master/example/image-classification/symbols. | Topology specs from https://github.com/dmlc/mxnet/tree/master/example/image-classification/symbols. |
| Compiler | GCC 4.8.5 | GCC 4.8.5 |
| Library | Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2017.0.2.20170110 |

| Neon | 2S Intel® Xeon® Platinum 8180 processor | 2S Intel® Xeon® processor E5-2699v4 |
| --- | --- | --- |
| Revision | ZP/MKL_CHWN branch, commit id 52bd02acb947a2adabb8a227166a7da5d9123b6d | ZP/MKL_CHWN branch, commit id 52bd02acb947a2adabb8a227166a7da5d9123b6d |
| Other Arguments | The main.py script was used for benchmarking, in mkl mode. | The main.py script was used for benchmarking, in mkl mode. |
| Dataset | Dummy data was used. | Dummy data was used. |
| Topologies | | |
| Compiler | ICC version 17.0.3 20170404 | ICC version 17.0.3 20170404 |
| Library | Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2018.0.20170425 |

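The Caffe measurement settings listed in the configuration can be collected into a small helper that assembles the environment and command line for one run. This is a minimal sketch rather than the script used for the published numbers: the platform keys, the helper names, and the `model.prototxt` path are hypothetical, the `--model` flag is the standard Caffe way to name a network definition and is not quoted in the configuration, and the space printed inside the KMP_AFFINITY value is assumed to be a typesetting artifact and is dropped. The `cpupower` call is listed separately because it is normally run once, with root privileges, before benchmarking.

```python
import shlex

# Settings transcribed from the configuration table. The table prints a space
# inside the KMP_AFFINITY value; dropping it here is an assumption.
PLATFORMS = {
    "platinum-8180": {
        "env": {"KMP_AFFINITY": "granularity=fine,compact", "OMP_NUM_THREADS": "56"},
        "cpupower": ["cpupower", "frequency-set", "-d", "2.5G", "-u", "3.8G",
                     "-g", "performance"],
        "numactl_local": True,   # "numactl -l" is listed only for this platform
    },
    "e5-2699v4": {
        "env": {"KMP_AFFINITY": "granularity=fine,compact,1,0", "OMP_NUM_THREADS": "44"},
        "cpupower": ["cpupower", "frequency-set", "-d", "2.2G", "-u", "2.2G",
                     "-g", "performance"],
        "numactl_local": False,
    },
}


def caffe_time_command(platform, model_path, forward_only):
    """Return (env, argv) for one `caffe time` measurement on `platform`."""
    cfg = PLATFORMS[platform]
    # --model is an assumption: the table quotes only "caffe time [--forward_only]".
    argv = ["caffe", "time", "--model=" + model_path]
    if forward_only:             # inference run; training omits the flag
        argv.append("--forward_only")
    if cfg["numactl_local"]:     # prefer memory local to the running CPU
        argv = ["numactl", "-l"] + argv
    return dict(cfg["env"]), argv


def as_shell(env, argv):
    """Render the environment and command as a single shell line."""
    assignments = [f"{name}={shlex.quote(value)}" for name, value in sorted(env.items())]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])


# Hypothetical model path, used for illustration only.
env, argv = caffe_time_command("platinum-8180", "model.prototxt", forward_only=True)
print(as_shell(env, argv))
```

Returning the environment and argument list separately keeps the sketch usable with `subprocess.run(argv, env={**os.environ, **env})` as well as for printing a shell line.
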
Inference Throughput Performance Measured in Images/Second; BS refers to Batch Size

| Caffe | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet, BS = 1 | 235 | 152 |
| AlexNet, BS = 1024 | 2656 | 1146 |
| GoogLeNet v1, BS = 1 | 117 | 103 |
| GoogLeNet v1, BS = 1024 | 814 | 405 |
| ResNet-50, BS = 1 | 69 | 45 |
| ResNet-50, BS = 1024 | 226 | 118 |
| VGG-19, BS = 1 | 73 | 37 |
| VGG-19, BS = 256 | 136 | 62 |
| AlexNet ConvNet, BS = 1 | 582 | 282 |

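The generation-over-generation gain is easier to read as a ratio than as two raw columns. The sketch below divides the Platinum 8180 throughput by the E5-2699v4 throughput for each Caffe inference row; the figures are copied from the table, while the dictionary layout and function name are illustrative choices.

```python
# (topology, batch size) -> (Platinum 8180, E5-2699v4) throughput in images/second,
# copied from the Caffe inference table.
CAFFE_INFERENCE = {
    ("AlexNet", 1): (235, 152),
    ("AlexNet", 1024): (2656, 1146),
    ("GoogLeNet v1", 1): (117, 103),
    ("GoogLeNet v1", 1024): (814, 405),
    ("ResNet-50", 1): (69, 45),
    ("ResNet-50", 1024): (226, 118),
    ("VGG-19", 1): (73, 37),
    ("VGG-19", 256): (136, 62),
    ("AlexNet ConvNet", 1): (582, 282),
}


def speedup(newer, older):
    """Throughput ratio of the newer platform over the older one."""
    return newer / older


speedups = {key: speedup(*pair) for key, pair in CAFFE_INFERENCE.items()}

# Print rows from the largest gain to the smallest.
for (topology, batch_size), ratio in sorted(
        speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{topology:<16} BS={batch_size:<5} {ratio:.2f}x")
```

Note that the two platforms also differ in compiler and Intel® MKL version for Caffe, so the ratio reflects the whole software and hardware stack rather than the processor alone.
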
Inference Throughput Performance Measured in Images/Second; BS refers to Batch Size

| TensorFlow | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet ConvNet, BS = 1 | 144 | 126 |
| AlexNet ConvNet, BS = 1024 | 3382 | 2135 |
| GoogLeNet ConvNet, BS = 256 | 533 | 411 |
| GoogLeNet ConvNet, BS = 1024 | 658 | 427 |
| VGG ConvNet, BS = 32 | 236 | 129 |
| VGG ConvNet, BS = 256 | 248 | 140 |

Inference Throughput Performance Measured in Images/Second; BS refers to Batch Size

| MXNet | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet, BS = 1 | 428 | 251 |
| AlexNet, BS = 1024 | 2439 | 1093 |
| VGG-19, BS = 1 | 121 | 71 |
| VGG-19, BS = 256 | 333 | 155 |
| Inception V3, BS = 16 | 170 | 121 |
| Inception V3, BS = 1024 | 250 | 164 |
| ResNet-50, BS = 1 | 47 | 41 |
| ResNet-50, BS = 256 | 115 | 79 |

Inference Throughput Performance Measured in Images/Second; BS refers to Batch Size

| Neon | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet ConvNet, BS = 1 | 138 | 86 |
| AlexNet ConvNet, BS = 1024 | 2889 | 1305 |
| GoogLeNet v1 ConvNet, BS = 4 | 153 | 80 |
| GoogLeNet v1 ConvNet, BS = 1024 | 1036 | 445 |
| ResNet-18, BS = 4 | 224 | 133 |
| ResNet-18, BS = 1024 | 672 | 286 |

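Throughput rises with batch size, but so does the time a single batch takes to complete. As a rough illustration, the sketch below converts the Neon inference figures for the Platinum 8180 into time per batch, assuming steady-state throughput so that batch time equals batch size divided by images per second. That assumption, and the names used, are not part of the published methodology; the numbers are taken from the Neon inference table.

```python
# topology -> {batch size: images/second} for the Platinum 8180,
# copied from the Neon inference table.
NEON_INFERENCE_8180 = {
    "AlexNet ConvNet": {1: 138, 1024: 2889},
    "GoogLeNet v1 ConvNet": {4: 153, 1024: 1036},
    "ResNet-18": {4: 224, 1024: 672},
}


def batch_time_ms(batch_size, images_per_second):
    """Time to finish one batch in milliseconds, assuming steady-state throughput."""
    return 1000.0 * batch_size / images_per_second


for topology, runs in NEON_INFERENCE_8180.items():
    small, large = sorted(runs)                  # the two measured batch sizes
    gain = runs[large] / runs[small]             # throughput gain from batching
    print(f"{topology}: BS={small} takes {batch_time_ms(small, runs[small]):.1f} ms/batch, "
          f"BS={large} takes {batch_time_ms(large, runs[large]):.1f} ms/batch, "
          f"throughput gain {gain:.1f}x")
```

Read this way, the large-batch rows describe offline or bulk scoring, while the small-batch rows are closer to what a latency-sensitive service would see.
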
Training Throughput Performance Measured in Images/Second; BS refers to Batch Size

| Caffe | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet, BS = 256 | 947 | 453.9007092 |
| GoogLeNet v1, BS = 96 | 268 | 145.2344932 |
| ResNet-50, BS = 50 | 85 | 45.41326067 |
| VGG-19, BS = 64 | 40 | 18.93491124 |
| AlexNet ConvNet, BS = 256 | 1089 | 495.1644101 |
| GoogLeNet v1 ConvNet, BS = 96 | 288 | 146.3414634 |
| VGG ConvNet, BS = 64 | 89 | 44.41360167 |

Training Throughput Performance Measured in Images/Second; BS refers to Batch Size

| TensorFlow | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet ConvNet, BS = 256 | 969.69 | 387.737 |

Training Throughput Performance Measured in Images/Second; BS refers to Batch Size

| MXNet | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| AlexNet, BS = 256 | 672.420169 | 335.351575 |
| VGG, BS = 256 | 94.388501 | 51.650352 |
| Inception-BN, BS = 256 | 134.497456 | 86.149216 |
| Inception-v3, BS = 256 | 61.802955 | 41.057106 |
| ResNet-50, BS = 256 | 44.340751 | 30.331338 |

Training Throughput Performance Measured in Images/Second; BS refers to Batch Size

| Neon | 2S Intel® Xeon® Platinum 8180 processor, 28C, 2.5GHz | 2S Intel® Xeon® processor E5-2699v4, 22C, 2.2GHz |
| --- | --- | --- |
| GoogLeNet v1 ConvNet, BS = 128 | 220.62 | 129 |
| ResNet-18, BS = 128 | 196.967 | 90.2427 |

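To summarise the training tables in one number per framework, one option is the geometric mean of the per-topology speedups, which is the usual way to average ratios. The sketch below applies it to the training figures above; choosing the geometric mean is a suggestion made here, not something the published results state, and the names are illustrative.

```python
import math

# framework -> list of (Platinum 8180, E5-2699v4) training throughput pairs in
# images/second, copied row by row from the training tables.
TRAINING = {
    "Caffe": [
        (947, 453.9007092), (268, 145.2344932), (85, 45.41326067),
        (40, 18.93491124), (1089, 495.1644101), (288, 146.3414634),
        (89, 44.41360167),
    ],
    "TensorFlow": [(969.69, 387.737)],
    "MXNet": [
        (672.420169, 335.351575), (94.388501, 51.650352),
        (134.497456, 86.149216), (61.802955, 41.057106),
        (44.340751, 30.331338),
    ],
    "Neon": [(220.62, 129), (196.967, 90.2427)],
}


def geometric_mean_speedup(pairs):
    """Geometric mean of newer/older throughput ratios."""
    logs = [math.log(newer / older) for newer, older in pairs]
    return math.exp(sum(logs) / len(logs))


summary = {framework: geometric_mean_speedup(pairs)
           for framework, pairs in TRAINING.items()}

for framework, value in summary.items():
    print(f"{framework:<11} {value:.2f}x over {len(TRAINING[framework])} topologies")
```

The per-framework averages are not directly comparable with each other, because each framework was measured on a different set of topologies and batch sizes.
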