Design Space Exploration of Hybrid Ultra Low Power Branch Predictors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7179)

Abstract

Modern branch predictors are often too large and power-hungry to be a viable option for small embedded processors, where die space, power consumption and performance are all at a premium. In embedded processors, the large cache structures required for high-performance branch prediction can easily take up more die space than the rest of the processor combined. Coupled with large leakage energies, which are set to become an increasing issue as technologies advance to 45 nm and beyond, this can often make it appealing not to use a dynamic branch predictor at all. This paper seeks a way of using an ultra-small branch predictor in a hybrid predictor configuration suitable for an embedded processor. We introduce a novel bias parameter into the decision of whether to handle a branch statically or dynamically, further exploring the performance vs energy trade-off. We present a solution that reduces dynamic branch predictor aliasing, improves performance and requires a minimum of extra die space. The results presented relate die space requirements, energy use and performance impacts. We look at how best to optimise this balance in a way that is usually not considered, and on a lower bit budget than has previously been presented. The EEMBC 1.1 benchmark suite [1] was used to explore the energy vs performance trade-off boundary, taking averages of the results across 31 different benchmarks. We evaluate 5 traditional branch predictor configurations and 36 novel ultra-small hybrid branch predictors through the use of 9 sets of our novel bias values, combining GShare dynamic predictions with profiled backwards-taken forwards-not-taken (BTFN) / backwards-not-taken forwards-taken (BNFT) static predictions. The results demonstrate that a static-dynamic hybrid is not only beneficial but necessary for very small predictors to have a positive effect on the cycle count and overall energy use of the processor.
Through our novel bias parameter we explore the performance vs energy trade-off and show that, by accepting a small reduction in peak performance for a given architecture (0.1 seconds at 500 MHz, or 0.35%, of a total runtime in the region of 28.35 seconds), we can gain substantial dynamic energy savings from reduced dynamic predictor accesses (removing up to an additional 16.5%, or 53 million, of the traditional hybrid predictor accesses). Our best-performing architecture showed an average improvement in run time of 2 seconds (6.7%) over a static BTFN baseline (total runtime 30.46 s), at the cost of only an additional 0.01 mm² (or 1%) of die space.
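The static-dynamic split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how the bias parameter is applied, so the sketch assumes it is a threshold on a branch's profiled taken-rate. Branches biased at least that strongly in one direction are predicted statically and never touch the dynamic table; the rest go to a small GShare predictor. The names `GShare`, `BiasedHybrid`, `run` and `bias_threshold`, the 6-bit index and the profile format are all hypothetical, and the static hint is stored as a plain direction rather than as a BTFN/BNFT encoding.

```python
class GShare:
    """Tiny GShare predictor: 2-bit counters indexed by PC XOR global history."""

    def __init__(self, index_bits=6):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every counter starts weakly not-taken
        self.history = 0

    def _index(self, pc):
        # Drop the two low PC bits (assumes 4-byte aligned instructions).
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class BiasedHybrid:
    """Static/dynamic hybrid; bias_threshold is an assumed reading of the bias parameter."""

    def __init__(self, profile, bias_threshold, index_bits=6):
        # profile maps pc -> (taken_count, total_count) from a profiling run,
        # with total_count > 0 (hypothetical format).
        self.gshare = GShare(index_bits)
        self.static = {}
        for pc, (taken, total) in profile.items():
            rate = taken / total
            if max(rate, 1.0 - rate) >= bias_threshold:
                self.static[pc] = rate >= 0.5  # profiled majority direction
        self.dynamic_accesses = 0

    def predict(self, pc):
        if pc in self.static:
            return self.static[pc]  # no dynamic table access
        self.dynamic_accesses += 1
        return self.gshare.predict(pc)

    def update(self, pc, taken):
        # Static branches never train the table, so they cannot alias with others.
        if pc not in self.static:
            self.gshare.update(pc, taken)


def run(predictor, trace):
    """Return (mispredictions, dynamic accesses) for a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses, predictor.dynamic_accesses
```

Under this assumption, lowering `bias_threshold` moves more branches to the static side, which cuts dynamic predictor accesses at the risk of extra mispredictions on weakly biased branches. That is the kind of performance vs energy trade-off the abstract describes, though the paper's actual mechanism may differ.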

References

  1. EEMBC benchmark suite

  2. ARM Ltd. ARM Cortex M3 (2011)

  3. Burcea, I., et al.: Predictor virtualization. In: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIII, pp. 157–167 (2008)

  4. Burcea, I., Moshovos, A.: Phantom-BTB: a virtualized branch target buffer design. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2009, pp. 313–324 (2009)

  5. Chang, P.-Y., et al.: Branch classification: a new mechanism for improving branch predictor performance. In: Proceedings of the 27th Annual International Symposium on Microarchitecture, MICRO, vol. 27, pp. 22–31 (1994)

  6. Hicks, M., et al.: Towards an energy efficient branch prediction scheme using profiling, adaptive bias measurement and delay region scheduling. In: International Conference on Design & Technology of Integrated Systems in Nanoscale Era (2007)

  7. Hu, Z., et al.: Applying decay strategies to branch predictors for leakage energy savings. In: IEEE International Conference on Computer Design (2002)

  8. Jiménez, D.A.: 2nd JILP championship branch prediction CBP-2 (2006)

  9. Monchiero, M., et al.: Power-aware branch prediction techniques: a compiler-hints based approach for VLIW processors. In: Proceedings of the 14th ACM Great Lakes Symposium on VLSI, GLSVLSI 2004, pp. 440–443 (2004)

  10. Parikh, D., et al.: Power issues related to branch prediction. In: Proceedings of the 8th International Symposium on High-Performance Computer Architecture, HPCA 2002, pp. 233–244 (2002)

  11. Patil, H., Emer, J.: Combining static and dynamic branch prediction to reduce destructive aliasing. In: Sixth International Symposium on High-Performance Computer Architecture (2000)

  12. Sendag, R., et al.: Low power-area branch prediction using complementary branch predictors. In: IEEE International Symposium on Parallel and Distributed Processing (2008)

  13. Skadron, K., et al.: Branch prediction, instruction-window size, and cache size: Performance tradeoffs and simulation techniques. IEEE Transactions on Computers 48, 1260–1281 (1999)

  14. Sprangle, E., Carmean, D.: Increasing processor performance by implementing deeper pipelines. In: Proceedings of the 29th Annual International Symposium on Computer Architecture, ISCA 2002 (2002)

  15. Thoziyoor, S., et al.: HP Labs: CACTI (2010)

  16. Topham, N., Jones, D.: High speed CPU simulation using JIT binary translation. In: The 3rd Annual Workshop on Modeling, Benchmarking and Simulation (2007)

  17. Yeh, T.-Y., Patt, Y.N.: Alternative implementations of two-level adaptive branch prediction. In: Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA 1992 (1992)

  18. Yeh, T.-Y., Patt, Y.N.: A comparison of dynamic branch predictors that use two levels of branch history. In: Proceedings of the 20th Annual International Symposium on Computer Architecture, ISCA 1993, pp. 257–266 (1993)

  19. Zhang, R., King, W.K., Guo, M.: A hybrid branch prediction scheme - an integration of software and hardware techniques. In: The Mid-Atlantic Student Workshop on Programming Languages and Systems (2001)

Editor information

Andreas Herkersdorf, Kay Römer, Uwe Brinkschulte

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bielby, M., Gould, M., Topham, N. (2012). Design Space Exploration of Hybrid Ultra Low Power Branch Predictors. In: Herkersdorf, A., Römer, K., Brinkschulte, U. (eds) Architecture of Computing Systems – ARCS 2012. ARCS 2012. Lecture Notes in Computer Science, vol 7179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28293-5_16

  • DOI: https://doi.org/10.1007/978-3-642-28293-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28292-8

  • Online ISBN: 978-3-642-28293-5

  • eBook Packages: Computer Science, Computer Science (R0)
