DOI: 10.1145/2213836.2213862
research-article

bLSM: a general purpose log structured merge tree

Published: 20 May 2012

ABSTRACT

Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: update-in-place systems have unmatched latency but poor write throughput. In contrast, existing log-structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes.

We begin by presenting a new performance metric: read fanout, and argue that, with read and write amplification, it better characterizes real-world indexes than approaches such as asymptotic analysis and price/performance.

We then present bLSM, a Log Structured Merge (LSM) tree with the advantages of B-Trees and log structured approaches: (1) Unlike existing log structured trees, bLSM has near-optimal read and scan performance, and (2) its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring merges at each level of the tree make steady progress without resorting to techniques that degrade read performance.

We use Bloom filters to improve index performance, and find a number of subtleties arise. First, we ensure reads can stop after finding one version of a record. Otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert. Avoiding the seek performed by the check is crucial.
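
The two Bloom filter uses described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `BloomFilter`, `ToyComponent`, and `ToyLSM`, the filter sizing, and the `seeks` counter are all hypothetical, and a Python dict stands in for each on-disk tree component. Bit positions are derived from two base hashes (double hashing), which is one common construction rather than something the abstract specifies. Deletions and tombstones are ignored.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=8192, num_hashes=7):
        # Sizing values are illustrative assumptions, not tuned figures.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive every probe position from two base hashes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class ToyComponent:
    """Hypothetical stand-in for one on-disk tree guarded by a Bloom filter."""

    def __init__(self, records):
        self.records = dict(records)
        self.filter = BloomFilter()
        for key in self.records:
            self.filter.add(key)
        self.seeks = 0  # number of simulated disk lookups

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # definitely absent: no disk lookup needed
        self.seeks += 1  # filter says "maybe": pay for one lookup
        return self.records.get(key)


class ToyLSM:
    """Hypothetical tree: an in-memory table plus components, newest first."""

    def __init__(self, components):
        self.memtable = {}
        self.components = list(components)

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for component in self.components:
            value = component.get(key)
            if value is not None:
                return value  # stop at the newest version found
        return None

    def insert_if_absent(self, key, value):
        # The existence check is usually answered by the filters alone
        # when the key is new, so it rarely costs a seek.
        if self.get(key) is not None:
            return False
        self.memtable[key] = value
        return True


if __name__ == "__main__":
    newer = ToyComponent({"user:1": "v2"})
    older = ToyComponent({"user:1": "v1", "user:2": "a"})
    tree = ToyLSM([newer, older])
    print(tree.get("user:1"), newer.seeks, older.seeks)
    print(tree.insert_if_absent("user:3", "b"), newer.seeks, older.seeks)
```

In this sketch a read of a frequently rewritten key touches only the newest component that holds it, and an insert of a previously unseen key is normally resolved without any simulated seek. A false positive in a filter would still cost one wasted lookup, which is why the filter size and hash count matter in practice.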


        Published in

        SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
        May 2012
        886 pages
        ISBN: 9781450312479
        DOI: 10.1145/2213836

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SIGMOD '12 Paper Acceptance Rate: 48 of 289 submissions, 17%. Overall Acceptance Rate: 785 of 4,003 submissions, 20%.
