Publication
SafeSpark: a secure data analytics platform using cryptographic techniques and trusted hardware
| Abstract: | Most companies now rely on data analytics frameworks to extract value from growing volumes of digital information. These systems give companies a substantial competitive advantage, since they support tasks such as informing marketing decisions or predicting user behavior. Organizations therefore tend to use the cloud to store data and run analytics over it. Cloud database services offer significant advantages, such as high efficiency and flexibility and lower costs than maintaining and managing private infrastructure. The problem is that these services are frequent targets of malicious attacks, which means that sensitive and private personal information can be compromised. Current secure analytical processing solutions use a limited set of cryptographic techniques or technologies, which makes it impossible to explore different trade-offs among performance, security, and functionality for different applications. Moreover, these systems do not explore combining multiple cryptographic techniques with trusted hardware to protect sensitive data. The work presented here addresses this challenge by using cryptographic schemes and Intel SGX technology to protect confidential information, yielding a practical solution that can be adapted to applications with different requirements. This dissertation begins with a background study of cryptographic schemes and Intel SGX technology, followed by a review of the state of the art in secure data analytics frameworks. It then proposes SafeSpark, a new solution based on the Apache Spark framework. SafeSpark provides a modular and extensible architecture and prototype that protects information and processes analytical queries over encrypted data, using three cryptographic schemes and SGX technology. We validated the prototype with an experimental evaluation that analyzes the solution's performance costs and resource usage. Using the TPC-DS benchmark, the results show that analytical processing on protected data is possible with a performance impact between 1.13x and 4.1x. |
|---|---|
| Main authors: | Carvalho, Hugo Alves |
| Subject: | Apache Spark; Cryptographic Schemes; Databases; Intel SGX |
| Year: | 2019 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |