Efficiently extendible mappings for balanced data distribution

Algorithmica

Abstract

In data storage applications, a large collection of consecutively numbered data “buckets” is often mapped to a relatively small collection of consecutively numbered storage “bins.” For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the “goodness” of a mapping method include:

  1. The time (number of operations) needed to compute the mapping.

  2. The storage needed to store a representation of the mapping.

  3. The balance of the mapping, i.e., the extent to which all bins receive the same number of buckets.

  4. The cost of relocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is needed due to an expansion of the number of bins or the number of buckets.
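The balance and relocation measures can be made concrete with a small sketch. It uses a plain modulo (round-robin) mapping, which is not the method proposed in the paper; the function names `round_robin`, `balance_gap`, and `relocation_cost` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def round_robin(num_buckets, num_bins):
    """Naive mapping (an illustrative baseline): bucket b goes to bin b mod num_bins."""
    return [b % num_bins for b in range(num_buckets)]


def balance_gap(mapping, num_bins):
    """Difference between the fullest and emptiest bin; 0 or 1 means balanced."""
    loads = Counter(mapping)
    counts = [loads.get(i, 0) for i in range(num_bins)]
    return max(counts) - min(counts)


def relocation_cost(old, new):
    """Number of buckets whose bin differs between the two mappings."""
    return sum(1 for a, b in zip(old, new) if a != b)


before = round_robin(12, 3)  # 12 buckets on 3 bins
after = round_robin(12, 4)   # the same buckets after adding a fourth bin
print(balance_gap(before, 3), balance_gap(after, 4), relocation_cost(before, after))
```

Under these assumptions both mappings are perfectly balanced, yet 9 of the 12 buckets change bins, although the new bin only needs to receive 3 buckets. Low cost on measures (1) to (3) therefore does not imply a low relocation cost.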

One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the number of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m^2). Another contribution of the paper is to identify the concept of a history-independent mapping, meaning informally that the mapping does not “remember” the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase.
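To illustrate what history independence means when the number of buckets is fixed, here is a minimal sketch of a mapping computed from the current counts alone. It is an assumption made for illustration, not the construction from the paper and not the IRR method: the hypothetical function `assign` replays bin additions one at a time and, at each step, moves buckets only into the newest bin.

```python
def assign(num_buckets: int, num_bins: int) -> list[int]:
    """Return the bin of each bucket, using only the two current counts.

    Illustrative assumption: replay growth from 1 bin to num_bins, one bin
    per step, filling each new bin from the currently fullest bins.
    """
    bins = [list(range(num_buckets))]  # start with one bin holding everything
    for k in range(2, num_bins + 1):
        new_bin = []
        for _ in range(num_buckets // k):  # the new bin receives floor(B / k) buckets
            # Donor is the fullest existing bin; ties go to the highest index.
            donor = max(range(len(bins)), key=lambda i: (len(bins[i]), i))
            new_bin.append(bins[donor].pop())
        bins.append(new_bin)
    result = [0] * num_buckets
    for bin_index, members in enumerate(bins):
        for bucket in members:
            result[bucket] = bin_index
    return result


three_bins = assign(12, 3)
four_bins = assign(12, 4)
moved = sum(1 for a, b in zip(three_bins, four_bins) if a != b)
print(moved)
```

In this sketch, bin loads never differ by more than one, and growing from n to n + 1 bins moves exactly floor(B / (n + 1)) buckets, the fewest a balanced new bin can receive. Replaying every step is slow, so the sketch only shows that nothing beyond the two counts needs to be stored; it says nothing about the time bounds claimed for IRR.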

Communicated by C. K. Wang.

Choy, D.M., Fagin, R. & Stockmeyer, L. Efficiently extendible mappings for balanced data distribution. Algorithmica 16, 215–232 (1996). https://doi.org/10.1007/BF01940647
