Performance assessment of FPGAs as HPC accelerators using the FPGA Empirical Roofline

Calore E. (first author); Fabio Schifano S. (last author)
2021

Abstract

Hardware accelerators are now very common in HPC systems, with GPUs playing a major role. More recently, FPGAs have also started to be adopted in a few data centers to accelerate specific workloads, and they are expected to be increasingly used in general-purpose HPC systems in the near future. FPGAs are already known to provide interesting speedups in several application fields, but estimating their expected performance on typical HPC workloads is not straightforward. To ease this task, in this paper we present the FPGA Empirical Roofline (FER), a benchmarking tool that empirically estimates the computing throughput and memory bandwidth of FPGAs used as hardware accelerators for HPC applications. Using the widely known Roofline Model as its theoretical foundation, FER measures upper bounds on FPGA computing throughput and memory bandwidth, which makes it possible to estimate the performance of function kernels developed with high-level synthesis tools according to their arithmetic intensity. We implemented FER using two different high-level paradigms, OmpSs@FPGA and the Xilinx Vitis workflow, two promising approaches for developing HPC applications that exploit FPGAs as hardware accelerators. We describe the theoretical model on which the FER benchmark relies, as well as its implementation details, and we provide performance results measured on the Xilinx Alveo U250 FPGA.
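As a rough illustration of the Roofline Model the abstract refers to, the sketch below computes the attainable throughput of a kernel as the minimum of the peak computing throughput and the product of arithmetic intensity and peak memory bandwidth. This is not code from FER: the function names are hypothetical, and the peak values in the usage lines are placeholders chosen for readability, not measurements of the Alveo U250.

```python
def roofline_bound(arithmetic_intensity, peak_throughput, peak_bandwidth):
    """Upper bound on attainable throughput under the Roofline Model.

    arithmetic_intensity: operations per byte moved to/from memory (FLOP/byte)
    peak_throughput:      computing throughput ceiling (e.g. GFLOP/s)
    peak_bandwidth:       memory bandwidth ceiling (e.g. GB/s)
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited either by memory traffic or by compute, whichever is lower.
    return min(peak_throughput, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_throughput, peak_bandwidth):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_throughput / peak_bandwidth


# Placeholder ceilings (assumed for illustration only, not measured values).
PEAK_GFLOPS = 1000.0
PEAK_GBS = 50.0

low_ai = roofline_bound(2.0, PEAK_GFLOPS, PEAK_GBS)     # memory-bound kernel
high_ai = roofline_bound(100.0, PEAK_GFLOPS, PEAK_GBS)  # compute-bound kernel
ridge = ridge_point(PEAK_GFLOPS, PEAK_GBS)
```

With these placeholder ceilings, a kernel with arithmetic intensity below the ridge point is bounded by memory bandwidth, while one above it is bounded by computing throughput. An empirical tool such as FER would supply measured ceilings in place of the assumed constants.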
ISBN: 978-1-6654-3759-2
Keywords: Acceleration, Bandwidth, Benchmarking, High level synthesis, Program processors
Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2482338