Publication

Big data framework implemented in cloud Azure

Bibliographic details
Abstract: As the technology boom of the last decades has led to a much higher availability of data, companies can leverage it to make better business decisions. There are tools and techniques available to work with large amounts of data, but this report focuses on studying one particular tool: Microsoft Azure. With it, a framework for processing big data was implemented, with the objective of exploring the tools of one of the most popular cloud services and developing the most effective architecture while maintaining cost-effectiveness and respecting the constraints of the project. The digital ecosystem of Microsoft Azure is extremely extensive and complex, so only the relevant concepts and tools were explored, namely Databricks, Synapse Analytics, Data Factory, WebJobs and Storage. During this process, a greater understanding of Microsoft Azure elements was gained, both of their applications and of their limits. After the exploration phase, the defined architecture, which covers the entire big data processing lifecycle, was implemented. Following the entry of the framework into production, multiple possible improvements were found that can be either implemented or further researched.
Main authors: Pêga, Sofia Alegre Fernandes
Subject: Microsoft Azure Big Data Framework Data Lake
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa