Publicação

Dependable decentralized storage management for cloud computing

Detalhes bibliográficos
Resumo:	The volume of worldwide digital information is growing and will continue to grow at an impressive rate. Storage deduplication is accepted as valuable technique for handling such data explosion. Namely, by eliminating unnecessary duplicate content from storage systems, both hardware and storage management costs can be improved. Nowadays, this technique is applied to distinct storage types and, it is increasingly desired in cloud computing infrastructures, where a significant portion of worldwide data is stored. However, designing a deduplication system for cloud infrastructures is a complex task, as duplicates must be found and eliminated across a distributed cluster that supports virtual machines and applications with strict storage performance requirements. The core of this dissertation addresses precisely the challenges of cloud infrastructures deduplication. We start by surveying and comparing the existing deduplication systems and the distinct storage environments targeted by them. This discussion is missing in the literature and it is important for understanding the novel issues that must be addressed by cloud deduplication systems. Then, as our main contribution, we introduce our own deduplication system that eliminates duplicates across virtual machine volumes in a distributed cloud infrastructure. Redundant content is found and removed in a cluster-wide fashion while having a negligible impact in the performance of applications using the deduplicated volumes. Our prototype is evaluated in a real distributed setting with a benchmark suited for deduplication systems, which is also a contribution of this dissertation.
Autores principais:	Paulo, João Tiago Medeiros
Assunto:	Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
Ano:	2015
País:	Portugal
Tipo de documento:	tese de doutoramento
Tipo de acesso:	acesso aberto
Instituição associada:	Universidade do Minho
Idioma:	inglês
Origem:	RepositóriUM - Universidade do Minho

Descrição
Resumo:	The volume of worldwide digital information is growing and will continue to grow at an impressive rate. Storage deduplication is accepted as valuable technique for handling such data explosion. Namely, by eliminating unnecessary duplicate content from storage systems, both hardware and storage management costs can be improved. Nowadays, this technique is applied to distinct storage types and, it is increasingly desired in cloud computing infrastructures, where a significant portion of worldwide data is stored. However, designing a deduplication system for cloud infrastructures is a complex task, as duplicates must be found and eliminated across a distributed cluster that supports virtual machines and applications with strict storage performance requirements. The core of this dissertation addresses precisely the challenges of cloud infrastructures deduplication. We start by surveying and comparing the existing deduplication systems and the distinct storage environments targeted by them. This discussion is missing in the literature and it is important for understanding the novel issues that must be addressed by cloud deduplication systems. Then, as our main contribution, we introduce our own deduplication system that eliminates duplicates across virtual machine volumes in a distributed cloud infrastructure. Redundant content is found and removed in a cluster-wide fashion while having a negligible impact in the performance of applications using the deduplicated volumes. Our prototype is evaluated in a real distributed setting with a benchmark suited for deduplication systems, which is also a contribution of this dissertation.