Document details

A system’s approach to cache hierarchy-aware decomposition of data-parallel computations

Author(s): Delgado, Nuno Miguel de Brito

Date: 2014

Persistent ID: http://hdl.handle.net/10362/13014

Origin: Repositório Institucional da UNL

Subject(s): Data-parallelism; Hierarchical parallelism; Domain decomposition; Runtime systems


Description

The architecture of today's processors is very complex, comprising several computational cores and an intricate hierarchy of cache memories. The latter, in particular, differs considerably among the many processors currently available on the market, resulting in a wide variety of configurations. Application development is typically oblivious to this complexity and diversity, taking into consideration only the number of available execution cores. This neglect prevents such applications from fully harnessing the computing power available in these architectures. The problem has been recognized by the community, which has proposed languages and models to express and tune applications according to the underlying machine's hierarchy. These, however, lack the desired abstraction level, forcing the programmer to have deep knowledge of computer architecture and parallel programming in order to ensure performance portability across a wide range of architectures. In light of these limitations, the goal of this thesis is to delegate these hierarchy-aware optimizations to the runtime system. Accordingly, the programmer's responsibilities are confined to defining procedures that decompose an application's domain into an arbitrary number of partitions. The programmer thus has to reason only about the application's data representation and manipulation. We prototyped our proposal on top of a Java parallel programming framework and evaluated its performance against cache-neglectful domain decompositions. The results demonstrate that our optimizations deliver significant speedups over decomposition strategies based solely on the number of execution cores, without requiring the programmer to reason about the machine's hardware. These facts allow us to conclude that it is possible to obtain performance gains by transferring hierarchy-aware optimization concerns to the runtime system.
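
The division of responsibilities described above can be illustrated with a minimal Java sketch. It is not the thesis's implementation or the API of the framework it builds on: the class name `CacheAwareDecomposition`, the methods `partitionCount` and `split`, and the sizing rule (the smallest multiple of the core count whose partitions fit in a target cache level) are all assumptions made for illustration. The cache size is passed in as a plain parameter here, whereas a runtime system would discover it from the machine.

```java
import java.util.ArrayList;
import java.util.List;

public class CacheAwareDecomposition {

    // Hypothetical runtime-side policy: choose how many partitions to request
    // so that each partition's data fits in the target cache level, while
    // keeping the count a multiple of the number of cores.
    static int partitionCount(long dataBytes, long cacheBytes, int cores) {
        long needed = (dataBytes + cacheBytes - 1) / cacheBytes;    // ceil(data / cache)
        long roundsPerCore = Math.max(1, (needed + cores - 1) / cores);
        return (int) (roundsPerCore * cores);
    }

    // Programmer-side procedure: split an index range [0, length) into
    // n contiguous partitions, for any n the runtime asks for.
    static List<int[]> split(int length, int n) {
        List<int[]> parts = new ArrayList<>(n);
        int base = length / n;
        int extra = length % n;   // the first 'extra' partitions get one more element
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            parts.add(new int[] {start, start + size});
            start += size;
        }
        return parts;
    }

    public static void main(String[] args) {
        int elements = 1 << 20;                  // assumed workload: 2^20 doubles
        long dataBytes = (long) elements * Double.BYTES;
        long cacheBytes = 256L * 1024;           // assumed per-core cache size
        int cores = 4;                           // assumed core count

        int coreOnly = cores;                    // cache-neglectful choice
        int cacheAware = partitionCount(dataBytes, cacheBytes, cores);

        System.out.println("core-only partitions:   " + coreOnly);
        System.out.println("cache-aware partitions: " + cacheAware);
        System.out.println("first cache-aware range: ["
                + split(elements, cacheAware).get(0)[0] + ", "
                + split(elements, cacheAware).get(0)[1] + ")");
    }
}
```

In this sketch the programmer writes only `split`, which knows about the data layout but nothing about the hardware, while the choice of `n` is left to a policy that sees the cache size. This mirrors the separation the abstract argues for.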

Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)

Document Type: Master thesis

Language: English

Advisor(s): Paulino, Hervé

Contributor(s): Delgado, Nuno Miguel de Brito
