
Programming Primitives for the GPU
In CPU program development, programmers rely on well-established abstractions, libraries, and primitives developed by other programmers. General-purpose GPU applications lack a comparable body of programming primitives, which limits the advancement of GPU computing. We believe that developing programming primitives for the GPU is a key step in making the GPU a first-class participant in computing systems.
Working with Dr. Mark Harris of NVIDIA, we have recently developed best-in-class implementations of one parallel primitive, scan, that we believe will be fundamental to a wide range of GPU computing applications. Scan makes it possible to express in data-parallel form many problems that do not at first appear to fit that model, such as stream compaction, sparse matrix operations, tridiagonal matrix solvers, and quicksort. We implemented both unsegmented and segmented scan libraries and are preparing them for open source release as the CUDPP data-parallel primitive library for GPUs.
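To make the scan primitive concrete, here is a minimal sequential reference sketch in Python. It is not the CUDPP implementation or its API: the function names (`exclusive_scan`, `compact`, `segmented_inclusive_scan`) are hypothetical and chosen only for illustration, and the loops run serially, whereas a GPU implementation would compute the same results in parallel. The sketch shows what unsegmented and segmented scan compute, and how stream compaction reduces to a scan followed by a scatter.

```python
from operator import add


def exclusive_scan(values, op=add, identity=0):
    """Unsegmented exclusive scan: out[i] combines values[0..i-1].

    Hypothetical helper for illustration; out[0] is always the identity.
    """
    out = []
    running = identity
    for v in values:
        out.append(running)
        running = op(running, v)
    return out


def compact(values, keep):
    """Stream compaction built on scan.

    1. Flag each element that should be kept (1) or dropped (0).
    2. Exclusive-scan the flags to get each kept element's output address.
    3. Scatter kept elements to their addresses. Each write is independent
       of the others, which is what makes this step data-parallel.
    """
    if not values:
        return []
    flags = [1 if keep(v) else 0 for v in values]
    addresses = exclusive_scan(flags)
    total = addresses[-1] + flags[-1]  # number of kept elements
    out = [None] * total
    for v, f, a in zip(values, flags, addresses):
        if f:
            out[a] = v
    return out


def segmented_inclusive_scan(values, head_flags, op=add):
    """Segmented inclusive scan: the running result restarts wherever
    head_flags[i] is 1, so each segment is scanned independently."""
    out = []
    for v, h in zip(values, head_flags):
        if h or not out:
            out.append(v)  # start of a new segment
        else:
            out.append(op(out[-1], v))
    return out


if __name__ == "__main__":
    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    print(compact([5, -2, 8, -1, 3], lambda x: x > 0))
    print(segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1]))
```

In the compaction example, the scanned flags give every surviving element a unique destination index without any element needing to know how many others survived before it, which is why a problem that looks inherently sequential maps onto a data-parallel primitive.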
Mark Harris, Shubhabrata Sengupta, and John D. Owens. Parallel Prefix Sum (Scan) with CUDA. In Hubert Nguyen, editor, GPU Gems 3, chapter 39, pages 851-876. Addison-Wesley, August 2007. http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=916.
Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens. Scan Primitives for GPU Computing. In Graphics Hardware 2007, pages 97-106, August 2007. Best Paper Award. http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=915.
