Compiler Support for Machine-Independent Parallelization of Irregular Problems
von Hanxleden, Reinhard
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16894
Data-parallel languages, such as HIGH PERFORMANCE FORTRAN or FORTRAN D, provide a machine-independent data-parallel programming paradigm in which the applications programmer uses a dialect of a sequential language annotated with high-level data-distribution directives. Identifying parallelism in data-parallel applications is typically straightforward, but making efficient use of this parallelism for irregular applications, such as molecular dynamics or unstructured meshes, is a challenge because little is known at compile time about their data access patterns. This dissertation establishes the thesis that spatial locality of the underlying problems can be used as a basis of compiler support for parallelizing such applications. The work done in support of this thesis, and for parallelizing applications in general, can be divided into three parts, which correspond to different aspects of parallelizing compilers for different architectures. Value-based mappings express the spatial locality characteristics of an application and assist the compiler in computing a distribution with both a balanced computational workload and high data access locality. The GIVE-N-TAKE data-flow framework is an extension of Partial Redundancy Elimination that is particularly well suited to advanced code-placement tasks such as communication generation. Loop flattening is a code transformation that overcomes SIMD-specific control flow limitations when executing nested loops with varying inner loop bounds, which are typical for irregular problems. To illustrate this thesis, the FORTRAN 77D compiler at Rice University has been extended with value-based alignments and distributions, a communication placement mechanism based on the GIVE-N-TAKE data-flow framework, and general infrastructure for handling irregular subscripts. This dissertation describes the techniques involved in these extensions and provides experimental results for various irregular applications compiled for a distributed-memory architecture.
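As an illustration of the loop flattening idea mentioned in the abstract, the following is a minimal Python sketch and not the dissertation's actual transformation. It assumes a loop nest whose inner trip count varies per outer iteration; the function names and the `bounds` list are hypothetical, chosen only for this example. The flattened version visits the same (i, j) pairs as the nested version using a single loop that advances both indices explicitly.

```python
def nested_iterations(bounds):
    """Reference loop nest: the inner trip count bounds[i] varies with i."""
    visited = []
    for i, inner_bound in enumerate(bounds):
        for j in range(inner_bound):
            visited.append((i, j))
    return visited


def flattened_iterations(bounds):
    """Single-loop ("flattened") version visiting the same (i, j) pairs.

    Assumes every entry of bounds is a non-negative integer.
    """
    visited = []
    n = len(bounds)
    i, j = 0, 0
    # Skip leading outer iterations whose inner loop is empty.
    while i < n and bounds[i] == 0:
        i += 1
    while i < n:
        visited.append((i, j))  # stands in for the loop body
        j += 1
        if j >= bounds[i]:
            # Inner loop exhausted: move to the next non-empty outer iteration.
            j = 0
            i += 1
            while i < n and bounds[i] == 0:
                i += 1
    return visited


if __name__ == "__main__":
    example_bounds = [3, 0, 1, 2]  # hypothetical irregular inner-loop bounds
    print(flattened_iterations(example_bounds))
```

In a SIMD setting, the intended benefit is that each processor can step through its own flattened loop without waiting for the processor with the longest inner loop at every outer iteration. This sketch only demonstrates that the sequential iteration order is preserved.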