Abstract
Multiple memory models have been proposed to capture the effects of the memory hierarchy, culminating in the I-O model of Aggarwal and Vitter (Commun. ACM 31(9):1116–1127, 1988). More than a decade of architectural advances has since introduced features that the I-O model does not capture, most notably prefetching. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I-O models, and we show how to design optimal algorithms that attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed, so intelligent use of prefetching can considerably mitigate the I-O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of idealized random access machines under reasonable assumptions. Our work also explains more precisely why I-O efficient algorithms perform significantly better on systems that support prefetching than on systems that do not.
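The latency-versus-bandwidth argument above can be illustrated with a toy cost calculation. The sketch below is not the Prefetch model defined in the paper; it is a simplified illustration under stated assumptions: every block fetch pays a fixed `latency` plus a fixed `transfer_time`, prefetch requests can all be issued ahead of use, and there is enough buffer space to hold the blocks in flight. The function names and parameter values are hypothetical.

```python
import math


def blocking_scan_time(n_items, block_size, latency, transfer_time):
    """Scan time when each block fetch waits for the previous one to finish.

    Assumption: every block pays the full latency plus its transfer time.
    """
    blocks = math.ceil(n_items / block_size)
    return blocks * (latency + transfer_time)


def prefetched_scan_time(n_items, block_size, latency, transfer_time):
    """Scan time when all block requests are issued ahead of use.

    Assumption: transfers are pipelined, so the latency is paid once and
    the remaining cost is bounded by bandwidth (one transfer per block).
    """
    blocks = math.ceil(n_items / block_size)
    if blocks == 0:
        return 0.0
    return latency + blocks * transfer_time


if __name__ == "__main__":
    # Hypothetical numbers: latency is 10x the per-block transfer time.
    args = dict(n_items=1000, block_size=100, latency=10.0, transfer_time=1.0)
    print("blocking:  ", blocking_scan_time(**args))
    print("prefetched:", prefetched_scan_time(**args))
```

With these hypothetical values, the blocking scan is dominated by latency, while the prefetched scan approaches the bandwidth-bound cost of one transfer per block. This is the qualitative gap the abstract describes, not a result taken from the paper.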
References
Adiga NR et al (2002) An overview of the BlueGene/L supercomputer. In: Proceedings of supercomputing (SC)
Aggarwal A, Vitter J (1988) The input/output complexity of sorting and related problems. Commun ACM 31(9):1116–1127
Aggarwal A, Alpern B, Chandra A, Snir M (1987) A model for hierarchical memory. In: Proceedings of ACM symposium on theory of computing
Aggarwal A, Chandra A, Snir M (1987) Hierarchical memory with block transfer. In: Proceedings of IEEE foundations of computer science, pp 204–216
Alpern B, Carter L, Feig E, Selker T (1994) The uniform memory hierarchy model of computation. Algorithmica 12(2):72–109
Brodal GS, Fagerberg R (2003) On the limits of cache-obliviousness. In: Proceedings of STOC, pp 307–315
Chaudhry G, Cormen TH (2002) Getting more for out-of-core columnsort. In: Proceedings of ALENEX
Chen T, Baer J (1995) Effective hardware-based data prefetching for high-performance processors. IEEE Trans Comput 44(5):609–623
Cormen TH, Sundquist T, Wisniewski LF (1999) Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J Comput 28(1):105–136
Dementiev R, Sanders P (2003) Asynchronous parallel disk sorting. In: Proceedings of SPAA
Floyd R (1972) Permuting information in idealized two-level storage. In: Complexity of computer computations, pp 105–109
Frigo M, Leiserson CE, Prokop H, Ramachandran S (1999) Cache-oblivious algorithms. In: Proceedings of FOCS
Hong J-W, Kung HT (1981) I/O complexity: the red–blue pebble game. In: Proceedings of the 13th symposium on the theory of computing, May 1981
Iyer S, Druschel P (2001) Anticipatory scheduling: a disk scheduling framework to overcome deceptive idleness in synchronous I/O. In: Proceedings of SOSP
Kallahalla M, Varman PJ (1999) Optimal read-once parallel disk scheduling. In: Proceedings of IOPADS, pp 68–77
Lund K, Goebel V (2003) Adaptive disk scheduling in a multimedia DBMS. In: Proceedings of ACM multimedia
Meyer U, Zeh N (2003) I-O efficient undirected shortest paths. In: Proceedings of ESA, pp 434–445
Nesbit KJ, Smith JE (2004) Data cache prefetching using a global history buffer. In: Proceedings of HPCA, pp 96–105
Sanders P (1999) Accessing multiple sequences through set associative caches. In: Proceedings of ICALP. A more recent version by Mehlhorn and Sanders was communicated to the authors in Dec 1999
Sen S, Chatterjee S, Dumir N (2002) Towards a theory of cache-efficient algorithms. J ACM
Vishkin U (1996) Can parallel algorithms enhance serial implementation? Commun ACM
Vitter J, Shriver E (1994) Algorithms for parallel memory I: two-level memories. Algorithmica 12(2):110–147
Worthington B, Ganger G, Patt Y. The DiskSim simulation environment (version 2.0). Available at http://www.ece.cmu.edu/ganger/disksim/
Cite this article
Verma, A., Sen, S. Combating I-O bottleneck using prefetching: model, algorithms, and ramifications. J Supercomput 45, 205–235 (2008). https://doi.org/10.1007/s11227-007-0170-0
DOI: https://doi.org/10.1007/s11227-007-0170-0