Publication
Co-designing log-structured merge key-value stores with a non-volatile storage hierarchy
| Abstract: | The growing size of datasets in storage-based applications has driven research into new methods and technologies for efficiently storing, processing, and analyzing large amounts of data. As a result, Log-Structured Merge (LSM) Key-Value Stores (KVSs) have been widely adopted, since their design allows high write throughput and enforces sequential disk access patterns. Additionally, with the advent of Non-Volatile Main Memory (NVMM), new storage technologies have emerged that offer faster access times than traditional block-based storage devices, thus accelerating KVSs. However, while NVMM devices offer faster access to data, they are typically limited in capacity and are often more expensive. To address this trade-off, contemporary storage solutions harness the capabilities of heterogeneous storage devices in two fundamental ways: caching and tiering. In this dissertation, we show that, on the one hand, read-dominated workloads benefit from a caching approach, but their performance degrades under tiering. On the other hand, for write-dominated workloads, the tiering approach performs better, while storing the entire dataset on NVMM actually degrades performance. To overcome these challenges, this dissertation proposes KEIGO, a novel storage middleware that allows LSM-based KVSs to efficiently use storage hierarchies composed of NVMM and block-based devices. KEIGO is aware of the different I/O operations performed by the KVS (e.g., foreground requests, and background flushes and compactions) and the characteristics of the underlying devices (e.g., concurrency, read/write asymmetry). This knowledge is a pivotal factor in optimizing KEIGO’s performance in the face of dynamic and mixed production workloads such as those observed at Nutanix and Meta. Moreover, KEIGO requires minimal code modifications to integrate into production-ready LSM KVSs. Experiments show that KEIGO improves the throughput of LSM KVS solutions, including RocksDB, Speedb, and LevelDB, by as much as 12.4×. Furthermore, it reduces tail latency by up to 21.3× over both general-purpose storage solutions and LSM KVSs built from the ground up for hierarchical storage. |
|---|---|
| Main authors: | Adão, Rúben Daniel Almeida |
| Subject: | Key-value store; NVMM; Persistent memory; Hierarchical storage |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |