ABSTRACT
Predictable latency on flash storage is a long-pursued goal, yet unpredictability persists because of unavoidable interference from many well-known SSD internal activities. To combat this issue, the recent NVMe I/O Determinism (IOD) interface advocates host-level control over SSD internal management tasks. While promising, challenges remain in how to exploit it for truly predictable performance.
We present IODA, an I/O-deterministic flash array design built on small but powerful extensions to the IOD interface for easy deployment. IODA exploits data redundancy in the context of IOD to provide a strong latency predictability contract. In IODA, SSDs are expected to fail an I/O quickly and on purpose, so that the host can serve it predictably through proactive data reconstruction. To handle concurrent internal operations, IODA introduces busy-remaining-time exposure and a predictable-latency-window formulation that guarantee predictable data reconstruction. Overall, IODA adds only 5 new fields to the NVMe interface and a small modification to the flash firmware, while keeping most of the complexity in the host OS. Our evaluation shows that IODA improves 95th to 99.99th percentile latencies by up to 75x. IODA also comes closest to the ideal, no-disturbance case when compared with 7 state-of-the-art approaches: preemption, suspension, GC coordination, partitioning, tiny-tail flash controller, prediction, and proactive approaches.
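The fast-fail and reconstruction idea described above can be illustrated with a minimal sketch. It is not IODA's implementation: the `Device` class, the `BusyError` exception, the `busy_remaining_us` attribute, and the `predictable_read` helper are all hypothetical names introduced for illustration, and the array is assumed to use single-parity (RAID-5 style) XOR redundancy. The sketch covers only the case where one device is busy; choosing among several busy devices by their remaining busy time is left out.

```python
from functools import reduce


class BusyError(Exception):
    """Hypothetical signal: the device fast-failed a read instead of queuing it."""

    def __init__(self, remaining_us):
        super().__init__(f"device busy for about {remaining_us} us")
        self.remaining_us = remaining_us


class Device:
    """Toy SSD model holding one chunk per stripe (an assumption, not real firmware)."""

    def __init__(self, chunks):
        self.chunks = chunks            # stripe index -> bytes
        self.busy_remaining_us = 0      # > 0 while internal work (e.g. GC) is running

    def read(self, stripe, fast_fail=True):
        # With fast_fail set, a busy device rejects the read immediately and
        # reports how long it expects to stay busy.
        if fast_fail and self.busy_remaining_us > 0:
            raise BusyError(self.busy_remaining_us)
        return self.chunks[stripe]


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def predictable_read(devices, target, stripe):
    """Read one chunk; if the target fast-fails, rebuild it from the other devices."""
    try:
        return devices[target].read(stripe)
    except BusyError:
        peers = [d for i, d in enumerate(devices) if i != target]
        # Reconstruction reads are not allowed to fast-fail in this sketch.
        return xor_chunks([d.read(stripe, fast_fail=False) for d in peers])


# Usage: three data chunks plus one XOR parity chunk in stripe 0.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_chunks(data)
devices = [Device({0: chunk}) for chunk in data + [parity]]
devices[1].busy_remaining_us = 500      # device 1 is busy with internal work
recovered = predictable_read(devices, target=1, stripe=0)
```

In this sketch the host never waits on the busy device: the rejected read is served by XOR-ing the surviving data chunks with the parity chunk, which is the property that single-parity redundancy provides.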
Index Terms
- IODA: A Host/Device Co-Design for Strong Predictability Contract on Modern Flash Storage