Publication

Configuring and executing ETL tasks on GRID environments - requirements and specificities

Bibliographic details
Abstract: Data warehouses store integrated and consistent data in a subject-oriented repository dedicated to supporting business intelligence processes. To keep a data warehouse up to date, data-intensive tasks regularly retrieve specialized information from preselected information sources, transforming and conforming it according to specific business requirements provided by decision-makers. Such tasks, commonly called Extract-Transform-Load (ETL) processes, must run within a limited time window over an ever-increasing amount of data and involve extremely complex operations. The common way to meet the need for more computational power is to acquire new and more powerful hardware. This expensive approach disregards the unused computational resources available in the desktop computers already present in most enterprises’ computational environments. This paper defines a different approach to ETL processes, taking advantage of parallel processing over a GRID environment and using XML as an effective support for data storage and communication, to demonstrate that GRID environments could be a real alternative for implementing low-cost data warehouses.
Main authors: Santos, Vasco
Other authors: Oliveira, Bruno; Silva, Rui; Belo, O.
Subject: Data Warehousing; ETL; GRID; Parallel processing; Relational algebra
Year: 2011
Country: Portugal
Document type: conference paper
Access type: restricted access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho
Publisher: Elsevier
Handle: https://hdl.handle.net/1822/37443
Full text: https://prod-dspace.uminho.pt/bitstreams/b03b1a26-4b39-43b8-b6b5-4a2987ad17db/download
File format: application/pdf
Date accepted: 2011-01-01T00:00:00Z
Date available: 2015-10-01T15:46:31Z
Embargo date: 2015-10-01T15:46:31Z
Access rights URI: http://purl.org/coar/access_right/c_16ec
Resource type URI: http://purl.org/coar/resource_type/c_5794
Record ID: rum_b1bc29d0d2e552d73e4b5dcdc651e118
OAI identifier: oai:repositorium.uminho.pt:1822/37443
ISSN (IsPartOf): 2212-0173
DOI (IsPartOf): 10.1016/j.protcy.2012.02.022
File size: 493123 bytes
Repository contact: repositorium@usdb.uminho.pt