Author(s): Carreira, Paulo J.F.
Date: 2008
Persistent ID: http://hdl.handle.net/10451/14295
Origin: Repositório da Universidade de Lisboa
Subject(s): Relational Algebra; Data Transformation; Data Integration; Data Cleaning; Data Warehousing
Data transformations are fundamental operations in legacy data migration, data integration, data cleaning, and data warehousing. These operations are often implemented as relational queries in order to leverage the optimization capabilities of most DBMSs. However, relational query languages like SQL are not expressive enough to specify one-to-many data transformations, an important class of data transformations that produce several output tuples for a single input tuple. These transformations are required for solving several types of data heterogeneities, such as those that occur when the source data represents aggregations of the target data. This thesis proposes a new relational operator, named data mapper, as an extension to the relational algebra to address one-to-many data transformations, and focuses on its optimization. It also provides algebraic rewriting rules for logical optimization and execution algorithms for physical optimization. As a result, queries may be expressed as a combination of standard relational operators and mappers. The proposed optimizations have been experimentally validated, and the key factors that influence the obtained performance gains have been identified.
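To make the notion of a one-to-many transformation concrete, the following is a minimal Python sketch, not the operator as defined in the thesis. It assumes a hypothetical source relation of loans, where each row stores an aggregated total, and a target relation with one row per installment. The names `apply_mapper`, `split_into_installments`, and all field names are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def apply_mapper(rows: Iterable[Row],
                 fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply fn to every input row; fn may emit zero, one, or many output rows."""
    for row in rows:
        yield from fn(row)


def split_into_installments(loan: Row) -> Iterator[Row]:
    """Hypothetical mapper function: expand one aggregated loan row
    into one output row per installment."""
    n = loan["installments"]
    base, remainder = divmod(loan["total"], n)
    for seq in range(1, n + 1):
        # Spread any remainder over the first installments so amounts sum to the total.
        amount = base + (1 if seq <= remainder else 0)
        yield {"loan_id": loan["loan_id"], "seq": seq, "amount": amount}


# Hypothetical source data: each row aggregates what the target stores row by row.
loans = [
    {"loan_id": "A", "total": 100, "installments": 3},
    {"loan_id": "B", "total": 50, "installments": 1},
]

payments = list(apply_mapper(loans, split_into_installments))
for p in payments:
    print(p)
```

The number of output rows depends on a value stored in each input row, which is the kind of data-dependent fan-out that the abstract says SQL cannot express directly.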