University of Warwick
    Computer Science Repository

    On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures

    Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A. (2011) On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures. The Computer Journal, 55 (2). pp. 138-153. ISSN 0010-4620


      Abstract

      In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C, and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.

      Item Type: Article
      Uncontrolled Keywords: pcav hpsg performance gpu cpu cuda nvidia ati amd intel lu nas parallel benchmark
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Science > Computer Science
      Depositing User: Simon Hammond
      Date Deposited: 02 Aug 2011 11:56
      Last Modified: 23 Feb 2012 09:07
      URI: http://eprints.dcs.warwick.ac.uk/id/eprint/787
