DOI: 10.1145/1854273.1854344 (PACT Conference Proceedings)
poster

Revisiting sorting for GPGPU stream architectures

Published: 11 September 2010

ABSTRACT

This poster presents efficient strategies for sorting large sequences of fixed-length keys (and values) using GPGPU stream processors. Compared to the state of the art, our radix sorting methods exhibit speedups of at least 2x for all generations of NVIDIA GPGPUs, and up to 3.7x for current GT200-based models. Our implementations demonstrate sorting rates of 482 million key-value pairs per second, and 550 million keys per second (32-bit). For this domain of sorting problems, we believe our sorting primitive to be the fastest available for any fully-programmable microarchitecture.

These results motivate a different breed of parallel primitives for GPGPU stream architectures that can better exploit the memory and computational resources while maintaining the flexibility of a reusable component. Our sorting performance is derived from a parallel scan stream primitive that has been generalized in two ways: (1) with local interfaces for producer/consumer operations (visiting logic), and (2) with interfaces for performing multiple related, concurrent prefix scans (multi-scan).
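
As a rough illustration of how a radix sort can be built on prefix scans, the following is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the 4-bit digit size and the names `exclusive_scan`, `multi_scan`, and `radix_sort_pairs` are assumptions made for this example, and it does not attempt to model the visiting-logic interfaces. Each pass extracts one digit per key, runs one exclusive scan per possible digit value (a simple stand-in for the multi-scan idea), and uses the resulting ranks to scatter key-value pairs stably.

```python
from itertools import accumulate

RADIX_BITS = 4            # assumption: 4-bit digits, chosen only for illustration
RADIX = 1 << RADIX_BITS   # number of distinct digit values per pass


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def multi_scan(digits):
    """Run RADIX related exclusive scans over the same digit sequence.

    ranks[d][i] is the number of positions before i whose digit equals d.
    """
    return [
        exclusive_scan([1 if x == d else 0 for x in digits])
        for d in range(RADIX)
    ]


def radix_sort_pairs(keys, values, key_bits=32):
    """Stable least-significant-digit radix sort of unsigned integer keys."""
    keys, values = list(keys), list(values)
    n = len(keys)
    for shift in range(0, key_bits, RADIX_BITS):
        digits = [(k >> shift) & (RADIX - 1) for k in keys]
        ranks = multi_scan(digits)
        # Starting offset of each digit's bucket in the output.
        counts = [digits.count(d) for d in range(RADIX)]
        bases = exclusive_scan(counts)
        out_keys, out_values = [0] * n, [None] * n
        for i, d in enumerate(digits):
            dest = bases[d] + ranks[d][i]   # bucket start + rank within bucket
            out_keys[dest] = keys[i]
            out_values[dest] = values[i]
        keys, values = out_keys, out_values
    return keys, values
```

For example, `radix_sort_pairs([3, 1, 3, 1], ["a", "b", "c", "d"])` returns the keys in ascending order with the values of equal keys kept in their original relative order. On a GPU the per-digit scans and the scatter would run in parallel; this sketch only shows the data flow.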

