Massively parallel lattice–Boltzmann codes on large GPU clusters

Calore, Enrico; Gabbana, Alessandro; Pellegrini, Elisa; Schifano, Sebastiano Fabio; Tripiccione, Raffaele
2016

Abstract

This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on a single GPU and for good scaling behavior on large numbers of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures and the overheads of node-to-node communication must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and by experimental benchmarks. We present an extensive set of performance measurements and identify the main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be applied to the development of other high-performance applications for computational physics.
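One common way the node-to-node communication overheads mentioned in the abstract are handled in multi-GPU lattice–Boltzmann codes is to overlap the halo exchange with the update of bulk lattice sites, using separate CUDA streams for the two tasks. The sketch below illustrates that general pattern only; it is not code from the paper, and all identifiers, sizes, the data layout and the single-face exchange (NPOP, LX, LY, HALO, propagateCollide, ...) are assumptions made for the example.

```cuda
// Minimal sketch (not the authors' code): a 1-D domain-decomposed
// lattice-Boltzmann step that overlaps MPI halo exchange with the update
// of bulk sites, using two CUDA streams. All names and sizes are
// illustrative assumptions.
#include <mpi.h>
#include <cuda_runtime.h>

#define NPOP 37        /* populations per site (e.g. a D2Q37 model)      */
#define LX   1024      /* local lattice extent along the split direction */
#define LY   2048      /* lattice extent along the other direction       */
#define HALO 3         /* halo width in sites                            */

/* Stand-in kernel: updates sites with x in [x0, x1). A production code
 * would implement the actual propagate and collide steps here.          */
__global__ void propagateCollide(double *f_dst, const double *f_src,
                                 int x0, int x1) {
    int x = x0 + blockIdx.x;
    int y = blockIdx.y * blockDim.x + threadIdx.x;
    if (x >= x1 || y >= LY) return;
    for (int p = 0; p < NPOP; p++) {
        size_t idx = ((size_t)p * (LX + 2 * HALO) + x) * LY + y;
        f_dst[idx] = f_src[idx];              /* placeholder LB update */
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    int left  = (rank - 1 + nranks) % nranks;     /* periodic neighbours */
    int right = (rank + 1) % nranks;

    size_t latticeBytes = (size_t)NPOP * (LX + 2 * HALO) * LY * sizeof(double);
    size_t haloBytes    = (size_t)NPOP * HALO * LY * sizeof(double);
    double *f1, *f2, *sendBuf, *recvBuf;
    cudaMalloc((void **)&f1, latticeBytes);
    cudaMalloc((void **)&f2, latticeBytes);
    cudaMallocHost((void **)&sendBuf, haloBytes); /* pinned staging buffers */
    cudaMallocHost((void **)&recvBuf, haloBytes);
    cudaMemset(f1, 0, latticeBytes);

    cudaStream_t bulk, border;
    cudaStreamCreate(&bulk);
    cudaStreamCreate(&border);
    dim3 block(256);
    dim3 gBulk(LX - 2 * HALO, (LY + 255) / 256);  /* interior sites only */
    dim3 gBorder(HALO, (LY + 255) / 256);         /* left-face sites     */

    for (int step = 0; step < 100; step++) {
        /* 1. Bulk sites need no remote data: start them immediately.    */
        propagateCollide<<<gBulk, block, 0, bulk>>>(f2, f1, 2 * HALO, LX);

        /* 2. Concurrently exchange halos with the neighbouring ranks.
         *    (Simplified: a real code packs/unpacks the strided halo
         *    columns with dedicated kernels, writes only into the halo
         *    region, and handles both faces.)                           */
        cudaMemcpyAsync(sendBuf, f1, haloBytes, cudaMemcpyDeviceToHost, border);
        cudaStreamSynchronize(border);
        MPI_Sendrecv(sendBuf, (int)haloBytes, MPI_BYTE, left,  0,
                     recvBuf, (int)haloBytes, MPI_BYTE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpyAsync(f1, recvBuf, haloBytes, cudaMemcpyHostToDevice, border);

        /* 3. Update the border sites once the halos have arrived.       */
        propagateCollide<<<gBorder, block, 0, border>>>(f2, f1, HALO, 2 * HALO);

        cudaDeviceSynchronize();               /* join both streams      */
        double *tmp = f1; f1 = f2; f2 = tmp;   /* swap source/destination */
    }

    cudaFree(f1); cudaFree(f2);
    cudaFreeHost(sendBuf); cudaFreeHost(recvBuf);
    cudaStreamDestroy(bulk); cudaStreamDestroy(border);
    MPI_Finalize();
    return 0;
}
```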
Calore, Enrico; Gabbana, Alessandro; Kraus, J.; Pellegrini, Elisa; Schifano, Sebastiano Fabio; Tripiccione, Raffaele
Files in this record:

1703.00185.pdf (open access)
Description: arXiv pre-print
Type: Pre-print
License: PUBLIC - Public with copyright
Size: 5.34 MB
Format: Adobe PDF

massively.2016.pdf (restricted to repository staff)
Description: publisher's version
Type: Full text (publisher's version)
License: NON-PUBLIC - Private/restricted access
Size: 4.98 MB
Format: Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2352035
Citazioni
  • PMC: N/A
  • Scopus: 62
  • Web of Science (ISI): 48