Publications

DINOMO: an elastic, scalable, high-performance key-value store for DPM

Published in Proceedings of the VLDB Endowment (pVLDB), 2022

This paper presents Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. To achieve these goals, Dinomo uses a novel combination of techniques: ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing. Dinomo achieves at least 3.8X better throughput than a state-of-the-art DPM key-value store while providing fast reconfiguration.
Paper Slides Talk Citation

WineFS: a hugepage-aware file system for persistent memory that ages gracefully

Published in 28th ACM Symposium on Operating Systems Principles (SOSP), 2021

Modern persistent-memory (PM) file systems degrade in performance with usage due to their inability to use hugepages. This paper introduces WineFS, a novel hugepage-aware PM file system that largely eliminates this effect. WineFS combines a new alignment-aware allocator with fragmentation-avoiding approaches to consistency and concurrency to preserve the ability to use hugepages. Experiments show that WineFS resists the effects of aging and outperforms state-of-the-art PM file systems in both aged and un-aged settings.
Paper Slides Talk Citation

RainBlock: Faster Transaction Processing for Public Blockchains

Published in USENIX Annual Technical Conference (ATC), 2021

This paper presents RAINBLOCK, a public blockchain that achieves high transaction throughput. The chief insight behind RAINBLOCK is that the number of transactions in each block is limited by I/O bottlenecks. By removing these I/O bottlenecks, RAINBLOCK allows miners to process more transactions in the same amount of time. The paper makes two novel contributions: the RAINBLOCK architecture, which removes I/O from the critical path of processing transactions, and the distributed, multi-versioned DSM-TREE data structure, which stores the system state efficiently. A single RAINBLOCK miner processes 27.4K transactions per second (27× higher than an Ethereum miner). In a geo-distributed setting, RAINBLOCK miners process 20K transactions per second.
Paper Slides Talk Citation

Software-defined data protection: Low overhead policy compliance is within reach!

Published in Proceedings of the VLDB Endowment (pVLDB), 2021

This paper presents our novel approach, “Software-Defined Data Protection” (SDP). Its simple yet powerful premise is to decouple frequently changing policies from request-level enforcement, allowing distributed smart storage nodes to implement the latter at line rate. Existing and future data protection frameworks can be translated to the same hardware interface, which allows storage nodes to offload enforcement efficiently for both company-specific rules and regulations such as GDPR or CCPA.
Paper Slides Talk Citation

CrashMonkey and ACE: Systematically testing file-system crash consistency

Published in ACM Transactions on Storage (TOS), 2019

This paper presents CrashMonkey and Ace, a set of tools to systematically find crash-consistency bugs in Linux file systems. CrashMonkey is a record-and-replay framework that simulates power-loss crashes while executing a given workload and checks whether the file system recovers to a consistent state after each crash. Ace automatically generates workloads to be run on the target file system. CrashMonkey and Ace are based on a new approach to testing file-system crash consistency: bounded black-box crash testing (B3), which copes with the infinite set of possible workloads by bounding the space that is tested. CrashMonkey and Ace find 24 of the 26 crash-consistency bugs reported in the last 5 years. These tools also revealed 10 new crash-consistency bugs in widely used, mature Linux file systems, 7 of which have existed in the kernel since 2014. They also found a crash-consistency bug in a verified file system, FSCQ.
Paper Slides Talk Citation

Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing

Published in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018

This paper presents a new approach to testing file-system crash consistency: bounded black-box crash testing (B3). B3 tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B3 bounds this space based on insights from studying recent crash-consistency bugs reported in Linux file systems. Most reported bugs can be reproduced using small workloads with three or fewer operations, and all reported bugs result from crashes after fsync()-related system calls. We build two tools, CrashMonkey and Ace, to demonstrate the effectiveness of this approach. These tools find 24 of the 26 recent crash-consistency bugs and reveal 10 new crash-consistency bugs that result in severe consequences such as broken rename atomicity and loss of persisted files.
Paper Slides Talk Citation

mLSM: Making Authenticated Storage Faster in Ethereum

Published in 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2018

This paper presents a novel data-authenticating structure, Merkelized LSM (mLSM). In authenticated storage, each read returns a value and a proof that allows the client to verify that the returned value is correct. Such authentication leads to high read and write amplification (64x in the worst case). mLSM significantly reduces read and write amplification while still allowing client verification of reads, and thus improves the performance of applications like Ethereum.
Paper Slides Talk Citation