DOI: 10.1145/2486159.2486194
research-article

Recursive design of hardware priority queues

Published: 23 July 2013

ABSTRACT

A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, or space exponential in the key size, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k-1 primitive priority queues of size ᵏ√n (the k-th root of n, for k = 2, 3, ...) and a RAM of size n, where the throughput of the construct beats that of a single, size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three primitive hardware priority queues of size √n or five of size ∛n.
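To make the idea of pairing small priority queues with a large RAM concrete, here is a minimal software sketch. It is not the PPQ construction from the paper: it uses only two capacity-limited queues of about √n entries (the paper's k = 2 case uses three), it performs occasional O(√n) or O(n) bursts instead of O(1) steps per operation, and every name in it (`SmallPQ`, `TwoLevelPQ`, `_flush`, `_compact`) is hypothetical. A binary heap stands in for the primitive hardware queue. Capacity n is assumed, not enforced.

```python
import heapq
import math


class SmallPQ:
    """Software stand-in for a primitive hardware PQ with a hard capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []

    def __len__(self):
        return len(self._heap)

    def insert(self, item):
        if len(self._heap) >= self.capacity:
            raise OverflowError("primitive PQ is full")
        heapq.heappush(self._heap, item)

    def peek_min(self):
        return self._heap[0]

    def delete_min(self):
        return heapq.heappop(self._heap)


class TwoLevelPQ:
    """Illustrative n-key PQ built from two ~sqrt(n)-entry SmallPQs and RAM."""

    def __init__(self, n):
        cap = math.isqrt(n - 1) + 1      # ceil(sqrt(n)) for n >= 1
        self.input_pq = SmallPQ(cap)     # buffers newly inserted keys
        self.merge_pq = SmallPQ(cap)     # one (head key, run id) per run
        self.runs = {}                   # RAM: run id -> [sorted list, cursor]
        self.next_run = 0

    def insert(self, key):
        if len(self.input_pq) == self.input_pq.capacity:
            self._flush()
        self.input_pq.insert(key)

    def delete_min(self):
        in_key = self.input_pq.peek_min() if len(self.input_pq) else None
        head = self.merge_pq.peek_min() if len(self.merge_pq) else None
        if in_key is None and head is None:
            raise IndexError("delete_min from an empty queue")
        if head is None or (in_key is not None and in_key <= head[0]):
            return self.input_pq.delete_min()
        key, rid = self.merge_pq.delete_min()
        run, pos = self.runs[rid]
        if pos + 1 < len(run):           # advance this run's head
            self.runs[rid][1] = pos + 1
            self.merge_pq.insert((run[pos + 1], rid))
        else:                            # run exhausted, free its RAM
            del self.runs[rid]
        return key

    def _flush(self):
        # Drain the full input buffer, in sorted order, into a new RAM run.
        if len(self.merge_pq) == self.merge_pq.capacity:
            self._compact()
        run = [self.input_pq.delete_min() for _ in range(len(self.input_pq))]
        self._add_run(run)

    def _compact(self):
        # Fold the unread tails of all runs into a single sorted run.
        rest = list(heapq.merge(*(r[p:] for r, p in self.runs.values())))
        self.runs = {}
        while len(self.merge_pq):
            self.merge_pq.delete_min()
        self._add_run(rest)

    def _add_run(self, run):
        rid = self.next_run
        self.next_run += 1
        self.runs[rid] = [run, 0]
        self.merge_pq.insert((run[0], rid))
```

For instance, with `TwoLevelPQ(16)` each small queue holds 4 entries; inserting 5, 3, 9, 1, 7 flushes the first four keys to RAM as one sorted run, and the next `delete_min()` returns 1 from the head of that run. The compaction step exists only to keep this sketch within the small queues' capacity; how the paper bounds the work per operation is not described in the abstract.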

Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable, perfect line-rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.

As a by-product of our technique we present an O(n) time sorting algorithm in a system equipped with an O(w√n)-entry TCAM, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.

Published in

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
July 2013
348 pages
ISBN: 9781450315722
DOI: 10.1145/2486159

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SPAA '13 paper acceptance rate: 31 of 130 submissions, 24%. Overall acceptance rate: 447 of 1,461 submissions, 31%.
