Implementation and Optimization of a Thermal Lattice Boltzmann Algorithm on a multi-GPU cluster

MANTOVANI, Filippo;PIVANTI, Marcello;SCHIFANO, Sebastiano Fabio;TRIPICCIONE, Raffaele
2012

Abstract

Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key advantages of this approach are the relative ease with which complex physical behavior, e.g. that associated with multi-phase flows or irregular boundary conditions, can be modeled and, from a computational perspective, the large degree of available parallelism, which can be easily exploited on massively parallel systems. The advent of multi-core and many-core processors, including General-Purpose Graphics Processing Units (GP-GPUs), has extended the quest for parallelization to the intra-processor level. From this point of view, LB methods may strongly benefit from these new architectures. In this paper we describe the implementation and optimization of a recently proposed thermal LB model, the so-called D2Q37 model, on multi-GPU systems. We describe in detail the optimization techniques that we have used at both the intra-processor and inter-processor levels, present performance and scaling figures, and analyze the bottlenecks associated with this implementation.
ISBN: 9781467326315
ISBN: 9781467326322
Keywords: GPU; LBM; multi-GPU cluster

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/1676484