| Authors | L. Burchard, J. Moe, D. T. Schroeder, K. Pogorelov and J. Langguth |
| Editors | B. L. Chamberlain, A. Varbanescu, H. Ltaief and P. Luszczek |
| Title | iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs |
| Affiliation | Scientific Computing |
| Project(s) | Department of High Performance Computing |
| Status | Published |
| Publication Type | Proceedings, refereed |
| Year of Publication | 2021 |
| Conference Name | High Performance Computing. ISC High Performance 2021 |
| Volume | LNCS, volume 12728 |
| Pagination | 291-309 |
| Publisher | Springer International Publishing |
| Place Published | Cham |
| ISBN Number | 978-3-030-78712-7 |
| ISSN Number | 0302-9743 |
| Keywords | BFS, Graph500, IPU, Performance optimization |
| Abstract | The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on the traditional caching hierarchies. Developed to meet the need for more and more data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching. This paper aims to test the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark that is not only a subroutine for a variety of more complex graph algorithms, but also allows comparability across a wide range of architectures. We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances. |
| URL | https://link.springer.com/10.1007/978-3-030-78713-4 |
| DOI | 10.1007/978-3-030-78713-4 |
| Citation Key | 28037 |
