Communication Generation for Data-Parallel Languages
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19208
Data-parallel languages allow programmers to use a familiar, machine-independent programming style to develop programs for multiprocessor systems. These languages relieve users of the tedious task of inserting interprocessor communication and delegate this crucial and error-prone task to their compilers. Since remote access in hierarchical multiprocessor systems is orders of magnitude slower than access to a processor's local memory, interprocessor communication adds significant overhead to total execution time. The success of data-parallel languages depends heavily on the compiler's ability to reduce this communication overhead. This dissertation describes novel techniques for communication generation. It covers issues related to communication analysis, placement, and optimization. The techniques have been implemented in the Rice Fortran D95 research compiler, a High Performance Fortran (HPF) compiler being developed at Rice University. A major contribution of the dissertation is the development of a data-flow analysis framework for supporting communication placement and optimization in the presence of machine-dependent resource constraints. Examples of resource constraints include in-core memory size, cache size, and the number of physical registers. Communication placement and optimizations that do not take resource constraints into account can lead to incorrect communication placement, performance loss, or both. This work also describes how data-dependence information can be combined with data-flow analysis to improve the scope of some well-known communication optimizations. Finally, the dissertation presents communication generation techniques for the cyclic(k) distributions supported by HPF.
It presents efficient algorithms for computing local addresses as well as for generating communication sets. The innovative techniques described in the dissertation exploit the repetitive pattern exhibited by cyclic(k) accesses.
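The repetitive pattern mentioned above can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: it enumerates a strided access by brute force, whereas the dissertation's contribution is computing such information efficiently. The sketch assumes the usual cyclic(k) mapping with 0-based global indices (Fortran arrays are normally 1-based), block size `k`, and `P` processors; all function names are hypothetical.

```python
def owner(i, k, P):
    """Processor that owns global index i under cyclic(k) over P processors."""
    return (i // k) % P


def local_index(i, k, P):
    """Local address of global index i on its owning processor."""
    return (i // (P * k)) * k + (i % k)


def local_accesses(lo, hi, stride, k, P, p):
    """Local addresses touched on processor p by the section lo:hi:stride.

    Brute-force enumeration, for illustration only.
    """
    return [local_index(i, k, P)
            for i in range(lo, hi + 1, stride)
            if owner(i, k, P) == p]


def gaps(addrs):
    """Differences between consecutive local addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]


def smallest_period(seq):
    """Smallest n such that seq[i] == seq[i - n] for every i >= n."""
    for n in range(1, len(seq) + 1):
        if all(seq[i] == seq[i - n] for i in range(n, len(seq))):
            return n
    return len(seq)


if __name__ == "__main__":
    # Section 0:47:3 on a cyclic(4) distribution over 2 processors.
    addrs = local_accesses(0, 47, 3, k=4, P=2, p=0)
    print(addrs)                         # [0, 3, 5, 10, 12, 15, 17, 22]
    print(gaps(addrs))                   # [3, 2, 5, 2, 3, 2, 5]
    print(smallest_period(gaps(addrs)))  # 4
```

In this example the gaps between consecutive local addresses on processor 0 repeat as 3, 2, 5, 2. Once one such cycle is known, a compiler can generate the remaining local addresses by stepping through a small table of gaps instead of testing every global index.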