Publication

Benchmark of market cloud data warehouse technologies

Bibliographic details
Abstract:Over the past two decades, the way computing resources are developed, deployed, upgraded, and applied has changed dramatically, with more and more software and hardware solutions being transferred to cloud technologies. Data Warehouses (DW), defined as a way of organizing corporate data in an integrated manner over (sequential) time periods, "structured & disposed" in order to generate a "single data source", were also affected by this evolution, giving rise to the concept of the Cloud Data Warehouse (CDW). This technology gives users more technological freedom: they do not need to spend time investing in software and hardware, they pay only for the resources they use, and the infrastructure itself has greater flexibility and scalability. However, selecting the most suitable platform or technology for a CDW can be a complex task because of the large number of factors that can influence the decision and the range of offerings on the market. The objective of this paper is to describe the process of benchmarking a set of CDW platforms, with the goal of analyzing and presenting each one's performance results. These platforms are Snowflake, Google BigQuery, Amazon Redshift, and Azure Synapse. The metrics to be measured are data loading and query running time, and alias running times. The dataset used for this benchmark was the Star Schema Benchmark (SSB), a dataset based on the well-known TPC Benchmark™ H (TPC-H).
Main authors:Oliveira e Sá, Jorge
Other authors:Renata Gonçalves; Kaldeich, Claus
Subject:Data Warehouse; Cloud Computing; Cloud Data Warehouse; Cloud Data Warehouse Technologies
Year:2024
Country:Portugal
Document type:conference paper
Access type:open access
Associated institution:Universidade do Minho
Language:English
Source:RepositóriUM - Universidade do Minho
datacite.date.Accepted.fl_str_mv 2024-01-01T00:00:00Z
datacite.date.available.fl_str_mv 2024-03-14T20:42:17Z
datacite.date.embargoed.fl_str_mv 2024-03-14T20:42:17Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://hdl.handle.net/1822/89565
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Elsevier
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_5794
fulltext.url.fl_str_mv https://repositorium.uminho.pt/bitstreams/983f4eba-c233-43d2-952a-d5ae6e6a6989/download
id rum_a4d43aeb9df75e2bc4b97c7c5ea49345
oai_identifier_str oai:repositorium.uminho.pt:1822/89565
ISSN (is part of):1877-0509
DOI (is part of):10.1016/j.procs.2024.06.289
File size:635896 bytes
Repository contact:repositorium@usdb.uminho.pt