Publication

Configuring and executing ETL tasks on GRID environments - requirements and specificities


Bibliographic details
Abstract: Data warehouses store integrated and consistent data in a subject-oriented repository dedicated to supporting business intelligence processes. To keep a data warehouse up to date, data-intensive tasks regularly retrieve specialized information from preselected information sources, transforming and conforming it according to specific business requirements provided by decision-makers. Such tasks, commonly known as Extract-Transform-Load (ETL) processes, must run within a limited time window over an ever-increasing amount of data and often involve extremely complex operations. The common response to the need for more computational power is to acquire new and more powerful hardware. This expensive approach disregards the unused computational resources of the desktop computers already present in most enterprises' computing environments. This paper aims to define a different approach to ETL processes, taking advantage of parallel processing over a GRID environment and using XML data as an effective support for data storage and communication, demonstrating that GRID environments could be a real alternative for implementing low-cost data warehouses.
Main author: Santos, Vasco
Other authors: Oliveira, Bruno; Silva, Rui; Belo, O.
Subject: Data Warehousing; ETL; GRID; Parallel processing; Relational algebra
Year: 2011
Country: Portugal
Document type: conference paper
Access type: restricted access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho