SFERA: Archive of Research Products of the University of Ferrara

Experience on vectorizing lattice Boltzmann kernels for multi- and many-core architectures

Calore, Enrico; Demo, Nicola; Schifano, Sebastiano Fabio; Tripiccione, Raffaele

2016

Abstract

Current development trends of fast processors call for an increasing number of cores, each core featuring wide vector processing units. Applications must then exploit both directions of parallelism to run efficiently. In this work we focus on the efficient use of vector instructions. These process several data elements in parallel, and the memory data layout plays an important role in making this efficient. An optimal memory layout depends in principle on the access patterns of the algorithm but also on the architectural features of the processor. However, different parts of the application may have different requirements, so the choice of the most efficient data structure for vectorization has to be carefully assessed. We address these problems for a Lattice Boltzmann (LB) code, widely used in computational fluid dynamics. We consider a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid. We write our codes in C and expose vector parallelism using a directive-based programming approach. We consider different data layouts and analyze the corresponding performance. Our results show that, if an appropriate data layout is selected, it is possible to write a code for this class of applications that is automatically vectorized and performance-portable across several architectures. We end up with a single code that runs efficiently on traditional multi-core processors as well as on recent many-core systems such as the Xeon Phi.
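The abstract does not name the specific layouts or directives that were compared, so the following is only a minimal sketch of the general idea, under stated assumptions. It contrasts an array-of-structures (AoS) layout with a structure-of-arrays (SoA) layout for a toy lattice and marks the SoA loop with the OpenMP `simd` directive as one possible directive-based approach. The names `NPOP`, `NSITES`, `site_aos_t`, `lattice_soa_t`, `density_aos` and `density_soa` are hypothetical and do not come from the paper.

```c
#include <assert.h>
#include <stddef.h>

#define NPOP   4     /* hypothetical number of populations per lattice site */
#define NSITES 1024  /* hypothetical number of lattice sites */

/* AoS: all populations of one site are contiguous in memory. */
typedef struct { double p[NPOP]; } site_aos_t;

/* SoA: the same population of consecutive sites is contiguous in memory. */
typedef struct { double p[NPOP][NSITES]; } lattice_soa_t;

/* Per-site density (sum of populations) with the AoS layout.
   Consecutive sites are NPOP doubles apart, so a vectorized site loop
   would need strided loads. */
void density_aos(const site_aos_t *lat, double *rho, size_t n)
{
    assert(n <= NSITES);
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = 0; k < NPOP; k++)
            s += lat[i].p[k];
        rho[i] = s;
    }
}

/* Same computation with the SoA layout. For a fixed k, consecutive sites
   are adjacent in memory, so the site loop maps onto unit-stride vector
   loads. The directive asks the compiler to vectorize the loop over i
   (assumption: OpenMP simd is used as the directive). */
void density_soa(const lattice_soa_t *lat, double *rho, size_t n)
{
    assert(n <= NSITES);
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = 0; k < NPOP; k++)
            s += lat->p[k][i];
        rho[i] = s;
    }
}
```

Both functions return the same densities; only the memory access pattern differs. With GCC or Clang the directive is honored when compiling with `-fopenmp-simd` and is otherwise ignored, so the code stays portable C either way.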

Year of publication: 2016

ISBN: 9783319321486

Keywords: Accelerator processors; Directive based compilation; Memory data layout; Vectorization; Computer Science (all); Theoretical Computer Science

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2352804

Warning: the data displayed have not been validated by the university.
