Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark
Pennycook, S.J., Hammond, S.D., Mudalige, G.R. and Jarvis, S.A. (2010) Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark. In: 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), held in conjunction with IEEE/ACM Supercomputing 2010 (SC'10), New Orleans, LA, USA.
Abstract
The emergence of Graphics Processing Units (GPUs) as a potential alternative to conventional general-purpose processors has led to significant interest in these architectures from both the academic community and the High Performance Computing (HPC) industry. While GPUs look likely to deliver unparalleled levels of performance, studies claiming performance improvements in excess of 30,000x are misleading.
Significant on-node performance improvements have been demonstrated for code kernels and algorithms amenable to GPU acceleration, but studies demonstrating comparable results for full scientific applications requiring multiple-GPU architectures are rare.
In this paper we present an analysis of a port of the NAS LU benchmark to NVIDIA's Compute Unified Device Architecture (CUDA), the most stable GPU programming model currently available. We also extend our solution to multiple nodes and multiple GPU devices.
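The abstract does not describe the internals of the port. As a hedged illustration only: LU's SSOR solver performs lower and upper triangular sweeps in which each cell depends on already-updated neighbours at (i-1, j, k), (i, j-1, k) and (i, j, k-1). One common way to expose parallelism for a GPU is to process the grid one hyperplane (i + j + k = const) at a time, since cells on the same hyperplane do not depend on each other. The sketch below is not the authors' CUDA code; the function names are hypothetical, and a toy scalar update stands in for LU's 5×5 block solves. It checks that hyperplane ordering reproduces the serial sweep exactly.

```python
def hyperplane(nx, ny, nz, h):
    """Cells (i, j, k) of an nx*ny*nz grid lying on the hyperplane i + j + k == h."""
    return [(i, j, h - i - j)
            for i in range(nx)
            for j in range(ny)
            if 0 <= h - i - j < nz]


def update(v, rhs, i, j, k, omega):
    """Toy lower-triangular update (hypothetical): reads the -i, -j, -k neighbours only."""
    acc = 0.0
    if i > 0:
        acc += v[i - 1][j][k]
    if j > 0:
        acc += v[i][j - 1][k]
    if k > 0:
        acc += v[i][j][k - 1]
    return rhs[i][j][k] + omega * acc


def lower_sweep_sequential(rhs, omega=0.25):
    """Reference: plain lexicographic (i, j, k) sweep, as a serial CPU code would run it."""
    nx, ny, nz = len(rhs), len(rhs[0]), len(rhs[0][0])
    v = [[row[:] for row in plane] for plane in rhs]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                v[i][j][k] = update(v, rhs, i, j, k, omega)
    return v


def lower_sweep_hyperplane(rhs, omega=0.25):
    """The same sweep, ordered hyperplane by hyperplane."""
    nx, ny, nz = len(rhs), len(rhs[0]), len(rhs[0][0])
    v = [[row[:] for row in plane] for plane in rhs]
    for h in range(nx + ny + nz - 2):
        # Every cell on hyperplane h reads only cells from hyperplane h - 1,
        # so all of them could be updated concurrently (e.g. one GPU thread each).
        new_vals = [(c, update(v, rhs, *c, omega)) for c in hyperplane(nx, ny, nz, h)]
        for (i, j, k), val in new_vals:
            v[i][j][k] = val
    return v
```

For example, on a small 5×3×4 grid the two orderings agree cell for cell:

```python
grid = [[[float(i + 2 * j + 3 * k) for k in range(4)] for j in range(3)] for i in range(5)]
assert lower_sweep_hyperplane(grid) == lower_sweep_sequential(grid)
```

The price of this ordering is that the first and last hyperplanes contain very few cells, so a GPU is poorly occupied at the start and end of every sweep.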
Runtime performance on several GPUs is presented, ranging from low-end, consumer-grade cards such as the 8400GS to NVIDIA's flagship Fermi HPC processor found in the recently released C2050. We compare the runtimes of these devices with those of several conventional processors, including parts from Intel, AMD and IBM.
In addition, we utilise a recently developed performance model of LU to predict its runtime performance on large-scale distributed GPU clusters, which are expected to become commonplace in future high-end HPC architectural solutions.
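The abstract does not give the model's equations. As a rough, hedged illustration of why a wavefront code such as LU needs a dedicated model at scale, the sketch below uses a generic textbook pipelined-wavefront estimate, not the authors' model. It assumes a px × py processor grid, the k dimension processed in blocks of h planes, a hypothetical per-cell compute time `t_cell` and a hypothetical per-message cost `t_msg`. The key term is the pipeline fill: the last processor cannot start until px + py - 2 stages after the first.

```python
def wavefront_sweep_time(px, py, nx, ny, nz, h, t_cell, t_msg):
    """Rough time for one pipelined wavefront sweep (simplified, hypothetical model)."""
    if nz % h != 0:
        raise ValueError("block height h must divide nz")
    n_blocks = nz // h
    # One pipeline stage: compute an h-deep block of this rank's (nx/px) x (ny/py)
    # columns, then forward boundary data to (up to) two downstream neighbours.
    # Messages are charged uniformly here for simplicity.
    stage = t_cell * (nx / px) * (ny / py) * h + 2 * t_msg
    # The last rank starts px + py - 2 stages after the first (pipeline fill),
    # then processes its own n_blocks blocks.
    return (px + py - 2 + n_blocks) * stage
```

For instance, `wavefront_sweep_time(1, 1, 8, 8, 8, 2, 1.0, 0.0)` gives 512.0, while the 2×2 grid `wavefront_sweep_time(2, 2, 8, 8, 8, 2, 1.0, 0.0)` gives 192.0 rather than the ideal 128.0, because of the two extra fill stages. A smaller h shortens the fill but increases the number of messages. A model of this shape lets that trade-off be explored for machine sizes that are not yet available.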
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Uncontrolled Keywords: | pcav hpsg gpu lu cuda performance modelling modeling pmbs bluegene |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science Q Science > QA Mathematics > QA76 Computer software Q Science > QA Mathematics > QA76.73 Computer algorithms. Data structures. |
| Divisions: | Faculty of Science > Computer Science |
| Depositing User: | Simon Hammond |
| Date Deposited: | 04 Nov 2010 07:41 |
| Last Modified: | 23 Feb 2012 09:07 |
| URI: | http://eprints.dcs.warwick.ac.uk/id/eprint/270 |