Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors

Jiang, Yunlian; Tian, Kai; Shen, Xipeng

doi:10.1007/978-3-642-11515-8_16


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers


Abstract

Shared-cache contention on chip multiprocessors degrades application performance and hurts system fairness. Many previously proposed solutions schedule programs according to cache performance sampled at runtime in order to reduce cache contention. This strong dependence on runtime sampling inherently limits the scalability and effectiveness of those techniques. This work explores the combination of program locality analysis with job co-scheduling. The rationale is that program locality analysis typically offers a large-scope view of various facets of an application, including data access patterns and cache requirements. That knowledge complements the local behaviors sampled by runtime systems. The combination offers the key to overcoming the limitations of prior co-scheduling techniques.

Specifically, this work develops a lightweight locality model that enables efficient, proactive prediction of the performance of co-running processes, offering the potential for integration into online scheduling systems. Compared to existing multicore scheduling systems, the technique reduces performance degradation by 34% (a 7% performance improvement) and unfairness by 47%. Its proactivity makes it resilient to the scalability issues that constrain the applicability of previous techniques.
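The abstract does not spell out the locality model, so the following is only a minimal sketch of the general idea of predicting co-run behavior from per-program locality profiles rather than from runtime sampling. It is not the authors' model. It assumes a fully associative LRU cache, uses classic LRU stack (reuse) distances as the locality profile, and assumes, purely for illustration, that two co-runners split the shared cache in proportion to their footprints. All function names are hypothetical.

```python
def reuse_distances(trace):
    """LRU stack distance of each access; None marks a first-time (cold) access."""
    stack = []  # most recently used block is at the end
    dists = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # number of distinct other blocks touched since the last use of addr
            dists.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            dists.append(None)
        stack.append(addr)
    return dists


def miss_rate(dists, capacity):
    """Miss fraction in a fully associative LRU cache holding `capacity` blocks."""
    misses = sum(1 for d in dists if d is None or d >= capacity)
    return misses / len(dists)


def corun_miss_rate(dists, footprint, partner_footprint, cache_blocks):
    """Illustrative assumption: the cache is split in proportion to footprints."""
    share = max(1, cache_blocks * footprint // (footprint + partner_footprint))
    return miss_rate(dists, share)


def pairings(jobs):
    """Yield every way to split an even-length list of jobs into pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_schedule(traces, cache_blocks):
    """Pick the pairing with the smallest total predicted miss-rate increase."""
    profile = {}
    for name, trace in traces.items():
        d = reuse_distances(trace)
        profile[name] = (d, sum(x is None for x in d))  # footprint = cold misses

    def cost(pairing):
        total = 0.0
        for a, b in pairing:
            (da, fa), (db, fb) = profile[a], profile[b]
            total += corun_miss_rate(da, fa, fb, cache_blocks) - miss_rate(da, cache_blocks)
            total += corun_miss_rate(db, fb, fa, cache_blocks) - miss_rate(db, cache_blocks)
        return total

    return min(pairings(sorted(traces)), key=cost)


if __name__ == "__main__":
    # Hypothetical block-address traces for four jobs on two dual-core chips.
    traces = {
        "stream": list(range(8)) * 2,
        "tiny": [0, 1] * 8,
        "medium": [0, 1, 2, 3] * 4,
        "mixed": [0, 1, 0, 2, 0, 3, 0, 4] * 2,
    }
    print(best_schedule(traces, cache_blocks=8))
```

Because the profiles are computed once per program, the scheduler in this sketch can rank candidate co-schedules before running any of them, which is the sense in which such a prediction is proactive. Exhaustive enumeration of pairings is used only for brevity and grows quickly with the number of jobs.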



Author information

Authors and Affiliations

Computer Science Department, The College of William and Mary, Williamsburg, VA 23187, USA
Yunlian Jiang, Kai Tian & Xipeng Shen


Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, Austin, TX 78712-0240, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell


Cite this paper

Jiang, Y., Tian, K., Shen, X. (2010). Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_16


DOI: https://doi.org/10.1007/978-3-642-11515-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)
