research-article

bLSM: a general purpose log structured merge tree

Authors:
Russell Sears, Yahoo!, Santa Clara, CA, USA
Raghu Ramakrishnan, Yahoo!, Santa Clara, CA, USA

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, Pages 217–228, https://doi.org/10.1145/2213836.2213862

Published: 20 May 2012

ABSTRACT

Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: Update in place systems have unmatched latency but poor write throughput. In contrast, existing log structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes.

We begin by presenting a new performance metric: read fanout, and argue that, with read and write amplification, it better characterizes real-world indexes than approaches such as asymptotic analysis and price/performance.

We then present bLSM, a Log Structured Merge (LSM) tree with the advantages of B-Trees and log structured approaches: (1) Unlike existing log structured trees, bLSM has near-optimal read and scan performance, and (2) its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring merges at each level of the tree make steady progress without resorting to techniques that degrade read performance.
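The abstract does not spell out how the scheduler works. As a rough illustration of the general idea of bounding write latency with graduated backpressure instead of long stalls, here is a minimal Python sketch. It is not the paper's "spring and gear" algorithm: the function name, the watermark parameters, the in-memory buffer size, and the linear ramp are all hypothetical.

```python
def write_delay(buffer_bytes, low_water, high_water, max_delay_s=0.010):
    """Return a per-write delay that grows as the in-memory buffer fills.

    Hypothetical policy: no delay below low_water, a capped delay at or
    above high_water, and a linear ramp in between. Each write pays a
    small bounded cost rather than blocking until a merge finishes.
    """
    if buffer_bytes <= low_water:
        return 0.0
    if buffer_bytes >= high_water:
        return max_delay_s
    fraction = (buffer_bytes - low_water) / (high_water - low_water)
    return fraction * max_delay_s
```

In a sketch like this, the worst-case delay seen by any single write is `max_delay_s`, as long as merges drain the buffer fast enough that it never needs to grow past `high_water`.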

We use Bloom filters to improve index performance, and find a number of subtleties arise. First, we ensure reads can stop after finding one version of a record. Otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert. Avoiding the seek performed by the check is crucial.
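The two Bloom filter points above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not bLSM's implementation: the class and function names are hypothetical, a `dict` stands in for each on-disk B-Tree, a counter stands in for disk seeks, and deletions (tombstones) are not modeled.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Component:
    """Stand-in for one on-disk tree component (hypothetical)."""

    def __init__(self, records):
        self.records = dict(records)
        self.bloom = BloomFilter()
        for key in self.records:
            self.bloom.add(key)
        self.seeks = 0  # counts simulated disk lookups

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # filter says "definitely absent": skip the seek
        self.seeks += 1
        return self.records.get(key)


def lookup(components, key):
    """components is ordered newest first; stop at the first version found."""
    for component in components:
        value = component.get(key)
        if value is not None:
            return value
    return None


def insert_if_absent(memtable, components, key, value):
    """Check-then-insert; for new keys the filters usually avoid any seek."""
    if key in memtable or lookup(components, key) is not None:
        return False
    memtable[key] = value
    return True
```

Because `lookup` returns as soon as the newest component yields a version, a frequently rewritten key costs one simulated seek rather than one per component. For a key that exists nowhere, `insert_if_absent` only pays a seek when a filter reports a false positive.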

Index Terms

bLSM: a general purpose log structured merge tree
1. Information systems
  1. Information storage systems
    1. Record storage systems
      1. Record storage alternatives
        Hashed file organization
        Indexed file organization

Published in
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836
General Chairs: K. Selçuk Candan (Arizona State University), Yi Chen (Arizona State University), Richard Snodgrass (University of Arizona)
Program Chair: Luis Gravano (Columbia University)
Publications Chair: Ariel Fuxman (Microsoft Research)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
log structured merge tree
merge scheduling
read amplification
read fanout
write amplification
Acceptance Rates
SIGMOD '12 Paper Acceptance Rate: 48 of 289 submissions, 17%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%