Compiler support for machine-independent parallelization of irregular problems
von Hanxleden, Reinhard
Doctor of Philosophy
Data-parallel languages, such as High Performance Fortran or Fortran D, provide a machine-independent data-parallel programming paradigm in which the applications programmer uses a dialect of a sequential language annotated with high-level data-distribution directives. Identifying parallelism in data-parallel applications is typically straightforward, but making efficient use of this parallelism for irregular applications, such as molecular dynamics or unstructured meshes, is a challenge because little is known at compile time about data access patterns. This dissertation establishes the thesis that the spatial locality of the underlying problems can be used as a basis of compiler support for parallelizing such applications. The work done to support this thesis, and to parallelize applications in general, can be divided into three parts, which correspond to different aspects of parallelizing compilers for different architectures. Value-based mappings express the spatial locality characteristics of an application and assist the compiler in computing a distribution with both a balanced computational workload and high data access locality. The Give-N-Take data-flow framework is an extension of Partial Redundancy Elimination that is particularly well suited to advanced code-placement tasks such as communication generation. Loop flattening is a code transformation that overcomes SIMD-specific control-flow limitations when executing nested loops with varying inner loop bounds, which are typical of irregular problems. To illustrate this thesis, the Fortran 77D compiler at Rice University has been extended with value-based alignments and distributions, a communication placement mechanism based on the Give-N-Take data-flow framework, and general infrastructure for handling irregular subscripts.
This dissertation describes the techniques involved in these extensions and provides experimental results for various irregular applications compiled for a distributed-memory architecture.
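As an illustration of the loop flattening idea mentioned above, the following is a minimal sketch in Python, not the dissertation's Fortran 77D implementation. The function names (`nested`, `flattened`, `lockstep_steps_nested`, `lockstep_steps_flattened`) and the step-count cost model are assumptions made only for this example. The sketch rewrites a loop nest whose inner bound varies with the outer index as a single loop that carries both indices explicitly, and then compares how many lockstep steps a SIMD machine would need under a simplified model in which every lane must wait for the slowest lane.

```python
def nested(bounds):
    """Original loop nest: the inner trip count bounds[i] varies with i."""
    visited = []
    for i in range(len(bounds)):
        for j in range(bounds[i]):
            visited.append((i, j))
    return visited


def flattened(bounds):
    """Single loop carrying (i, j) explicitly; visits the same pairs in order."""
    visited = []
    i, j = 0, 0
    # Skip leading outer iterations whose inner loop is empty.
    while i < len(bounds) and bounds[i] == 0:
        i += 1
    while i < len(bounds):
        visited.append((i, j))          # the original loop body goes here
        j += 1
        if j >= bounds[i]:              # inner loop exhausted: advance outer index
            i, j = i + 1, 0
            while i < len(bounds) and bounds[i] == 0:
                i += 1
    return visited


def lockstep_steps_nested(lanes):
    """Assumed cost model: per outer iteration, all lanes wait for the longest inner loop."""
    n_outer = max(len(b) for b in lanes)
    return sum(
        max(b[i] if i < len(b) else 0 for b in lanes)
        for i in range(n_outer)
    )


def lockstep_steps_flattened(lanes):
    """Assumed cost model: each lane advances its own (i, j); cost is the busiest lane's total."""
    return max(sum(b) for b in lanes)


if __name__ == "__main__":
    lanes = [[3, 0, 1], [0, 3, 1]]      # hypothetical inner bounds for two SIMD lanes
    assert flattened(lanes[0]) == nested(lanes[0])
    print(lockstep_steps_nested(lanes), lockstep_steps_flattened(lanes))
```

Under this model the nested form pays the sum over outer iterations of the per-iteration maximum, while the flattened form pays only the maximum per-lane total, which is why varying inner loop bounds favor the flattened form.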