Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short). We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM) and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS) technique.

Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

Calore, Enrico
Primo
;
Gabbana, Alessandro
Secondo
;
Schifano, Sebastiano
Penultimo
;
Tripiccione, Raffaele
Ultimo
2018

Abstract

Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short). We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM) and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS) technique.
2018
Calore, Enrico; Gabbana, Alessandro; Schifano, Sebastiano; Tripiccione, Raffaele
File in questo prodotto:
File Dimensione Formato  
jlpea-08-00018.pdf

accesso aperto

Descrizione: Full text editoriale
Tipologia: Full text (versione editoriale)
Licenza: Creative commons
Dimensione 1.83 MB
Formato Adobe PDF
1.83 MB Adobe PDF Visualizza/Apri

I documenti in SFERA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11392/2390348
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? ND
social impact