Combating I-O bottleneck using prefetching: model, algorithms, and ramifications

The Journal of Supercomputing

Abstract

Multiple memory models have been proposed to capture the effects of the memory hierarchy, culminating in the I-O model of Aggarwal and Vitter (Commun. ACM 31(9):1116–1127, 1988). More than a decade of architectural advances has led to new features that the I-O model does not capture, most notably the prefetching capability. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I-O models and show how to design optimal algorithms that attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed, so intelligent use of prefetching can considerably mitigate the I-O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of idealized random access machines under reasonable assumptions. Our work also explains more precisely the significantly superior performance of I-O efficient algorithms on systems that support prefetching compared with those that do not.
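
The latency-versus-bandwidth argument can be made concrete with a toy cost calculation. The sketch below is not the Prefetch model defined in the article; it is a minimal illustration under assumed parameters. The function names and the three per-block costs (`latency`, `transfer`, `compute`, all in the same abstract time unit) are hypothetical. It compares a sequential scan in which every block fetch stalls the processor with one in which fetches are issued ahead of use, so that only the first block pays the latency and later blocks arrive at the bandwidth rate while computation proceeds.

```python
def blocking_scan_time(n_blocks, latency, transfer, compute):
    """Scan cost when nothing overlaps: each block pays latency,
    then transfer time, then compute time, one after another."""
    return n_blocks * (latency + transfer + compute)


def prefetched_scan_time(n_blocks, latency, transfer, compute):
    """Scan cost when fetches are issued ahead of use (illustrative assumption).

    The first block arrives after latency + transfer. Each later block
    arrives `transfer` units after the previous one, and its processing
    overlaps with the next transfer, so the steady-state cost per block
    is max(transfer, compute). The last block still needs its compute time.
    """
    if n_blocks == 0:
        return 0
    steady_state = (n_blocks - 1) * max(transfer, compute)
    return latency + transfer + steady_state + compute


if __name__ == "__main__":
    # Assumed numbers: latency is 100x the per-block transfer and compute cost.
    n, latency, transfer, compute = 1000, 100, 1, 1
    print("blocking  :", blocking_scan_time(n, latency, transfer, compute))
    print("prefetched:", prefetched_scan_time(n, latency, transfer, compute))
```

With these assumed numbers the blocking scan takes 102000 time units and the prefetched scan takes 1101, which is within a small constant of the 1000 units of pure computation. The gap closes only because the access pattern is known in advance; the sketch says nothing about workloads whose next block cannot be predicted.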

References

  1. Adiga NR et al (2002) An overview of the BlueGene/L supercomputer. In: Proceedings of supercomputing (SC)

  2. Aggarwal A, Vitter J (1988) The input/output complexity of sorting and related problems. Commun ACM 31(9):1116–1127

  3. Aggarwal A, Alpern B, Chandra A, Snir M (1987) A model for hierarchical memory. In: Proceedings of ACM symposium on theory of computing

  4. Aggarwal A, Chandra A, Snir M (1987) Hierarchical memory with block transfer. In: Proceedings of IEEE foundations of computer science, pp 204–216

  5. Alpern B, Carter L, Feig E, Selker T (1994) The uniform memory hierarchy model of computation. Algorithmica 12(2):72–109

  6. Brodal GS, Fagerberg R (2003) On the limits of cache-obliviousness. In: Proceedings of STOC, pp 307–315

  7. Chaudhry G, Cormen TH (2002) Getting more for out-of-core columnsort. In: Proceedings of ALENEX

  8. Chen T, Baer J (1995) Effective hardware-based data prefetching for high-performance processors. IEEE Trans Comput 44(5):609–623

  9. Cormen TH, Sundquist T, Wisniewski LF (1999) Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J Comput 28(1):105–136

  10. Dementiev R, Sanders P (2003) Asynchronous parallel disk sorting. In: Proceedings of SPAA

  11. Floyd R (1972) Permuting information in idealized two-level storage. In: Complexity of computer computations, pp 105–109

  12. Frigo M, Leiserson CE, Prokop H, Ramachandran S (1999) Cache-oblivious algorithms. In: Proceedings of FOCS

  13. Hong J-W, Kung HT (1981) I/O complexity: the red–blue pebble game. In: Proceedings of the 13th symposium on the theory of computing, May 1981

  14. Iyer S, Druschel P (2001) Anticipatory scheduling: a disk scheduling framework to overcome deceptive idleness in synchronous I/O. In: Proceedings of SOSP

  15. Kallahalla M, Varman PJ (1999) Optimal read-once parallel disk scheduling. In: Proceedings of IOPADS, pp 68–77

  16. Lund K, Goebel V (2003) Adaptive disk scheduling in a multimedia DBMS. In: Proceedings of ACM multimedia

  17. Meyer U, Zeh N (2003) I-O efficient undirected shortest paths. In: Proceedings of ESA, pp 434–445

  18. Nesbit KJ, Smith JE (2004) Data cache prefetching using a global history buffer. In: Proceedings of HPCA, pp 96–105

  19. Sanders P (1999) Accessing multiple sequences through set associative caches. In: Proceedings of ICALP. A more recent version by Mehlhorn and Sanders was communicated to the authors in Dec 1999

  20. Sen S, Chatterjee S, Dumir N (2002) Towards a theory of cache-efficient algorithms. J ACM

  21. Vishkin U (1996) Can parallel algorithms enhance serial implementation? Commun ACM

  22. Vitter J, Shriver E (1994) Algorithms for parallel memory I: two-level memories. Algorithmica 12(2):110–147

  23. Worthington B, Ganger G, Patt Y. The DiskSim simulation environment (version 2.0). Available at http://www.ece.cmu.edu/ganger/disksim/

Author information

Correspondence to Akshat Verma.

About this article

Cite this article

Verma, A., Sen, S. Combating I-O bottleneck using prefetching: model, algorithms, and ramifications. J Supercomput 45, 205–235 (2008). https://doi.org/10.1007/s11227-007-0170-0

  • DOI: https://doi.org/10.1007/s11227-007-0170-0
