The embedded and high-performance computing (HPC) sectors, that in the past were completely separated, are now somehow converging under the pressure of two driving forces: the release of less power consuming server processors and the increased performance of the new low power Systems-on-Chip (SoCs) developed to meet the requirements of the demanding mobile market. This convergence allows the porting to low power embedded architectures of applications that were originally confined to traditional HPC systems. In this paper, we present our experience of porting the Filtered Back-projection Algorithm to a low power, low cost system-on-chip, the NVIDIA Tegra K1, which is based on a quad core ARM CPU and on a NVIDIA Kepler GPU. This Filtered Back-projection Algorithm is heavily used in 3D Tomography reconstruction software. The porting has been done exploiting various programming languages (i.e. OpenMP, CUDA) and multiple versions of the application have been developed to exploit both the SoC CPU and GPU. The performances have been measured in terms of 2D slices (of a 3D volume) reconstructed per time unit and per energy unit. The results obtained with all the developed versions are reported and compared with those obtained on a typical x86 HPC node accelerated with a recent NVIDIA GPU. The best performances are achieved combining the OpenMP version and the CUDA version of the algorithm. In particular, we discovered that only three Jetson TK1 boards, equipped with Giga Ethernet interconnections, allow to reconstruct as many images per time unit as a traditional server, using one order of magnitude less energy. The results of this work can be applied for instance to the construction of an energy-efficient computing system of a portable tomographic apparatus.
X-Ray Computed Tomography Applied to Objects of Cultural Heritage: Porting and Testing the Filtered Back-Projection Reconstruction Algorithm on Low Power Systems-on-Chip
MORIGI, MARIA PIA;BRANCACCIO, ROSA;
2016
Abstract
The embedded and high-performance computing (HPC) sectors, that in the past were completely separated, are now somehow converging under the pressure of two driving forces: the release of less power consuming server processors and the increased performance of the new low power Systems-on-Chip (SoCs) developed to meet the requirements of the demanding mobile market. This convergence allows the porting to low power embedded architectures of applications that were originally confined to traditional HPC systems. In this paper, we present our experience of porting the Filtered Back-projection Algorithm to a low power, low cost system-on-chip, the NVIDIA Tegra K1, which is based on a quad core ARM CPU and on a NVIDIA Kepler GPU. This Filtered Back-projection Algorithm is heavily used in 3D Tomography reconstruction software. The porting has been done exploiting various programming languages (i.e. OpenMP, CUDA) and multiple versions of the application have been developed to exploit both the SoC CPU and GPU. The performances have been measured in terms of 2D slices (of a 3D volume) reconstructed per time unit and per energy unit. The results obtained with all the developed versions are reported and compared with those obtained on a typical x86 HPC node accelerated with a recent NVIDIA GPU. The best performances are achieved combining the OpenMP version and the CUDA version of the algorithm. In particular, we discovered that only three Jetson TK1 boards, equipped with Giga Ethernet interconnections, allow to reconstruct as many images per time unit as a traditional server, using one order of magnitude less energy. The results of this work can be applied for instance to the construction of an energy-efficient computing system of a portable tomographic apparatus.I documenti in SFERA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.