The Subset Assignment Problem for Data Placement in Caches

Algorithmica

Abstract

We introduce the subset assignment problem, in which items of varying sizes are placed in a set of bins with limited capacity. Items can be replicated and placed in any subset of the bins, and each (item, subset) pair has an associated cost. Leaving an item out of all bins is not free in general and can even be the most expensive option. The goal is to minimize the total cost of assigning items to subsets without exceeding the bin capacities. The subset assignment problem models the management of a cache composed of memory banks with varying cost/performance specifications. Replicating a data item in more than one memory bank can improve overall system performance by allowing faster recovery in the event of a memory failure. In this setting, the number n of data objects (items) is very large, while the number d of memory banks (bins) is a small constant (on the order of 3 or 4). The goal is therefore to compute an optimal assignment in time whose dependence on n is as small as possible. The integral version of the problem is NP-hard, since it generalizes the knapsack problem. We focus on efficiently solving the LP relaxation, because the relaxation has an optimal solution with at most d fractionally assigned items. If the data objects are small relative to the size of the memory banks, excluding these fractionally assigned items from the cache has only a small effect. We give an algorithm that solves the LP relaxation in time \(O\left(\binom{3^d}{d+1}\,\mathrm{poly}(d)\, n \log n \,\log(nC)\, \log Z\right)\), where Z is the maximum item size and C is the maximum storage cost.
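To make the LP relaxation concrete, one natural formulation is sketched below. The notation is introduced here for illustration only and is an assumption, not necessarily the paper's: \(x_{i,S}\) is the fraction of item \(i\) assigned to the subset \(S \subseteq [d]\) of bins (with \(S = \emptyset\) meaning the item is not cached), \(c_{i,S}\) is the corresponding cost, \(z_i \le Z\) is the size of item \(i\), and \(B_j\) is the capacity of bin \(j\). Each item therefore has \(2^d\) variables, one per subset. The integral version additionally requires \(x_{i,S} \in \{0,1\}\).

```latex
% Hypothetical notation chosen for illustration (not necessarily the paper's):
% x_{i,S}: fraction of item i placed on subset S of the d bins (S = \emptyset: not cached)
% c_{i,S}: cost of that choice; z_i: size of item i; B_j: capacity of bin j
\begin{align*}
\text{minimize}\quad & \sum_{i=1}^{n} \sum_{S \subseteq [d]} c_{i,S}\, x_{i,S} \\
\text{subject to}\quad & \sum_{S \subseteq [d]} x_{i,S} = 1 && i = 1,\dots,n \\
& \sum_{i=1}^{n} \sum_{S \ni j} z_i\, x_{i,S} \le B_j && j = 1,\dots,d \\
& x_{i,S} \ge 0 && \text{for all } i,\ S \subseteq [d]
\end{align*}
```

Apart from nonnegativity, this program has n + d constraints, so a basic optimal solution has at most n + d positive variables \(x_{i,S}\). Each item needs at least one positive variable to satisfy its equality constraint, so at most d items can be split across more than one subset, which matches the bound on fractionally assigned items. For d = 1, setting \(c_{i,\emptyset} = v_i\) (a hypothetical item value) and \(c_{i,\{1\}} = 0\) makes the objective equal to the total value of the items left out, which recovers the 0-1 knapsack problem.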

Fig. 1


Author information

Correspondence to Sandy Irani.

Additional information

Sandy Irani and Jenny Lam were supported in part by NSF Grant CCF-0916181.

About this article

Cite this article

Ghandeharizadeh, S., Irani, S. & Lam, J. The Subset Assignment Problem for Data Placement in Caches. Algorithmica 80, 2201–2220 (2018). https://doi.org/10.1007/s00453-017-0403-4

  • DOI: https://doi.org/10.1007/s00453-017-0403-4
