The Subset Assignment Problem for Data Placement in Caches

Ghandeharizadeh, Shahram; Irani, Sandy; Lam, Jenny

doi:10.1007/s00453-017-0403-4

Published: 16 January 2018

Volume 80, pages 2201–2220, (2018)

Algorithmica


Abstract

We introduce the subset assignment problem in which items of varying sizes are placed in a set of bins with limited capacity. Items can be replicated and placed in any subset of the bins. Each (item, subset) pair has an associated cost. Not assigning an item to any of the bins is not free in general and can potentially be the most expensive option. The goal is to minimize the total cost of assigning items to subsets without exceeding the bin capacities. The subset assignment problem models the problem of managing a cache composed of banks of memory with varying cost/performance specifications. The ability to replicate a data item in more than one memory bank can benefit the overall performance of the system with a faster recovery time in the event of a memory failure. For this setting, the number n of data objects (items) is very large and the number d of memory banks (bins) is a small constant (on the order of 3 or 4). Therefore, the goal is to determine an optimal assignment in time that minimizes dependence on n. The integral version of this problem is NP-hard since it is a generalization of the knapsack problem. We focus on an efficient solution to the LP relaxation as the number of fractionally assigned items will be at most d. If the data objects are small with respect to the size of the memory banks, the effect of excluding the fractionally assigned data items from the cache will be small. We give an algorithm that solves the LP relaxation and runs in time \(O\left(\binom{3^d}{d+1}\,\mathrm{poly}(d)\, n \log (n) \log (nC) \log (Z)\right)\), where Z is the maximum item size and C the maximum storage cost.
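
To make the problem statement concrete, the following is a minimal Python sketch of the integral version on a toy instance. It is not the algorithm of the paper: it enumerates every assignment of items to subsets of bins, which takes time exponential in n and is only meant to illustrate the objective and the capacity constraints. The function names, the instance, and the cost values are assumptions chosen for illustration.

```python
from itertools import combinations, product


def all_subsets(d):
    """Every subset of bins {0, ..., d-1}, including the empty set."""
    return [frozenset(c) for r in range(d + 1) for c in combinations(range(d), r)]


def solve_brute_force(sizes, capacities, cost):
    """Exhaustively solve a tiny integral instance (illustration only).

    sizes[i]      -- size of item i
    capacities[j] -- capacity of bin j
    cost(i, S)    -- cost of storing item i in exactly the bins in S
                     (S may be empty, meaning the item is not cached)
    Returns (best_cost, assignment), where assignment[i] is the chosen subset.
    """
    d = len(capacities)
    subsets = all_subsets(d)
    best_cost, best_assignment = float("inf"), None
    # Try every way of giving each item one subset of bins: (2^d)^n candidates.
    for assignment in product(subsets, repeat=len(sizes)):
        load = [0] * d
        for i, S in enumerate(assignment):
            for j in S:  # a replicated item consumes space in every bin of S
                load[j] += sizes[i]
        if any(load[j] > capacities[j] for j in range(d)):
            continue  # violates a bin capacity
        total = sum(cost(i, S) for i, S in enumerate(assignment))
        if total < best_cost:
            best_cost, best_assignment = total, assignment
    return best_cost, best_assignment


def example_cost(i, S):
    """Hypothetical costs: leaving an item out is the most expensive option,
    bin 0 is cheaper than bin 1, and replicating in both is cheapest."""
    table = {
        frozenset(): 10,
        frozenset({0}): 2,
        frozenset({1}): 3,
        frozenset({0, 1}): 1,
    }
    return table[S]


sizes = [2, 2, 3]      # three items
capacities = [4, 3]    # d = 2 bins
best_cost, best_assignment = solve_brute_force(sizes, capacities, example_cost)
print(best_cost, [sorted(S) for S in best_assignment])
```

In this toy instance, the two items of size 2 fill bin 0 and the item of size 3 fills bin 1, so every item is cached but none can be replicated. The enumeration visits (2^d)^n candidate assignments, which is why an approach whose running time grows only near-linearly in n, such as the LP relaxation described in the abstract, matters when n is very large.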



Author information

Authors and Affiliations

Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
Shahram Ghandeharizadeh
Department of Computer Science, University of California, Irvine, CA, 92697, USA
Sandy Irani
Department of Computer Science, San José State University, San Jose, CA, 95192, USA
Jenny Lam

Corresponding author

Correspondence to Sandy Irani.

Additional information

Sandy Irani and Jenny Lam were supported in part by NSF Grant CCF-0916181.

About this article

Cite this article

Ghandeharizadeh, S., Irani, S. & Lam, J. The Subset Assignment Problem for Data Placement in Caches. Algorithmica 80, 2201–2220 (2018). https://doi.org/10.1007/s00453-017-0403-4

Received: 24 January 2017
Accepted: 30 December 2017
Published: 16 January 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s00453-017-0403-4

Mathematics Subject Classification

68W40
