Intel Instances for HPC Workloads
The tests below were conducted on AWS instances based on several generations of Intel® Xeon® processors in a hyper-threaded configuration. The custom processor in the latest generation reaches an all-core turbo frequency of up to 3.5 GHz and features Intel® Turbo Boost Technology 2.0, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel® Deep Learning Boost. Compared to the prior generation, these instances deliver better performance and scalability, and therefore a better value proposition for general-purpose and memory-intensive workloads.
What Is GROMACS?
GROMACS is a molecular dynamics package whose performance is compute-bound (FLOPS) and latency sensitive for all communication paths (socket-to-socket, CPU-GPU, and multi-node). It benefits from Intel AVX-512, from Turbo Boost, and from hyper-threading (HT/SMT). With the exception of the ionchannel workload, which becomes MPI-bound at 8-16 nodes, the workloads are compute-bound.
The workloads that we have considered for our benchmarking are publicly available:
- lignocellulose (3M atoms, RF type); useful for demonstrating scalability
- water_rf (1.5M atoms, RF type)
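GROMACS reports throughput in ns/day, and when comparing runs across instance types it can help to reproduce that figure from the step count, the integration time step, and the wall-clock time. A minimal sketch (the function name and the numbers below are illustrative, not measured results from these benchmarks):

```python
def ns_per_day(nsteps: int, dt_ps: float, wall_seconds: float) -> float:
    """Convert a completed MD run into the ns/day throughput metric.

    nsteps       -- number of integration steps completed
    dt_ps        -- time step in picoseconds (e.g. 0.002 ps = 2 fs)
    wall_seconds -- elapsed wall-clock time of the run
    """
    simulated_ns = nsteps * dt_ps / 1000.0   # ps -> ns
    return simulated_ns * 86400.0 / wall_seconds

# Illustrative: 500,000 steps at 2 fs simulate 1 ns of physical time;
# if the run took one hour of wall clock, throughput is 24 ns/day.
print(ns_per_day(500_000, 0.002, 3600.0))  # -> 24.0
```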
How to Get Intel Benefits
3rd Gen Intel Xeon Scalable processors provide significant performance gains for the GROMACS workload, accelerated by the Intel AVX-512 and Intel Deep Learning Boost technologies. The gains are largest at lower node counts (greater than 2x) and become more limited at larger node counts, where the run is constrained by the network bandwidth of the c6i.32xlarge and m6i.32xlarge instances. Customers running GROMACS can realize significant performance gains by deploying on 3rd Gen Intel Xeon Scalable instance types at AWS (M6i, C6i) rather than on previous-generation Intel Xeon Scalable instances.
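Whether a given instance actually exposes AVX-512 can be confirmed from the CPU flags, which on Linux appear in /proc/cpuinfo. A minimal sketch, checking only the AVX-512 foundation flag (the helper name and the choice of flag are our own, not part of the benchmark setup):

```python
def has_avx512(cpuinfo_text: str) -> bool:
    """Return True if /proc/cpuinfo-style text lists the AVX-512
    foundation instructions (flag 'avx512f') among the CPU flags."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "avx512f" in line.split(":", 1)[1].split()
    return False

# On a Linux instance, feed it the real flags:
# with open("/proc/cpuinfo") as f:
#     print(has_avx512(f.read()))
print(has_avx512("flags\t\t: fpu sse avx2 avx512f avx512dq"))  # -> True
```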