ABSTRACT
Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: update-in-place systems have unmatched latency but poor write throughput. In contrast, existing log-structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes.
We begin by presenting a new performance metric, read fanout, and argue that, together with read and write amplification, it characterizes real-world indexes better than approaches such as asymptotic analysis and price/performance.
We then present bLSM, a Log-Structured Merge (LSM) tree with the advantages of B-Trees and log-structured approaches: (1) unlike existing log-structured trees, bLSM has near-optimal read and scan performance, and (2) its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring that merges at each level of the tree make steady progress, without resorting to techniques that degrade read performance.
We use Bloom filters to improve index performance, and find that a number of subtleties arise. First, we ensure that reads can stop after finding one version of a record. Otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert. Avoiding the seek performed by this check is crucial.
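The two Bloom filter points above can be illustrated with a minimal sketch. This is not bLSM's implementation: the class names (`BloomFilter`, `Component`, `ToyLSM`), the filter sizing, and the use of in-memory dictionaries as stand-ins for on-disk tree components are all assumptions made for illustration. The sketch shows a read that stops at the newest version of a record, and an insert-if-absent check that skips the simulated disk seek for any component whose filter rules the key out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions from two base hashes (double hashing).
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Component:
    """Hypothetical stand-in for one on-disk tree component."""

    def __init__(self, records):
        self.records = dict(records)
        self.bloom = BloomFilter()
        for key in self.records:
            self.bloom.add(key)
        self.seeks = 0  # counts simulated disk lookups

    def lookup(self, key):
        self.seeks += 1
        return self.records.get(key)


class ToyLSM:
    """Toy index: an in-memory table plus components ordered newest first."""

    def __init__(self, components):
        self.memtable = {}
        self.components = components

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for comp in self.components:
            if not comp.bloom.might_contain(key):
                continue  # filter says "definitely absent": no seek needed
            value = comp.lookup(key)
            if value is not None:
                return value  # newest version found: stop searching
        return None

    def insert_if_absent(self, key, value):
        # The existence check seeks only where a filter reports a possible hit.
        if self.get(key) is not None:
            return False
        self.memtable[key] = value
        return True

    def total_seeks(self):
        return sum(comp.seeks for comp in self.components)


newer = Component({"user1": "v2"})
older = Component({"user1": "v1", "user2": "v1"})
db = ToyLSM([newer, older])
latest = db.get("user1")  # "v2"; the older component is never consulted
```

In this sketch a frequently written key such as `user1` costs one simulated seek rather than one per component, because the search returns as soon as the newest version is found. An insert of a previously unseen key usually costs no seeks at all, since every filter is expected to report the key as absent; a false positive would add a seek but never produce a wrong answer.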
Index Terms
- bLSM: a general purpose log structured merge tree