Publication
Big data framework implemented in cloud Azure
| Abstract: | As the technology boom of recent decades has made data far more available, companies can leverage it to make better business decisions. Tools and techniques exist for working with large amounts of data, but this report focuses on one particular tool: Microsoft Azure. A framework for processing big data was implemented with it, with the objective of exploring the tools of one of the most popular cloud services and developing the most effective architecture while maintaining cost-effectiveness and respecting the constraints of the project. The Microsoft Azure ecosystem is extensive and complex, so only the relevant concepts and tools were explored, namely Databricks, Synapse Analytics, Data Factory, WebJobs and Storage. During this process, a greater understanding of Microsoft Azure components was gained, both of their applications and of their limits. After the exploration phase, the defined architecture was implemented, covering the entire big data processing lifecycle. After the framework entered production, multiple possible improvements were identified that can be either implemented or researched further. |
|---|---|
| Main authors: | Pêga, Sofia Alegre Fernandes |
| Subject: | Microsoft Azure Big Data Framework Data Lake |
| Year: | 2023 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Source: | Repositório da Universidade de Lisboa |