Compiling reductions in data parallel programs for distributed memory multiprocessors
Master of Science
Reduction recognition and optimization are crucial techniques in parallelizing compilers. They are used to detect recurrences in a program and transform the originally sequential code into parallel code. Because of the high cost of interprocessor communication, reduction recognition and efficient code generation are even more important on distributed-memory multiprocessors. As part of the dHPF parallel compiler project, we developed reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. Using a combination of dependence analysis and pattern matching, our reduction recognition technique can detect a broad range of reduction operations in complex loop or control-flow structures. It recognizes reductions into a scalar variable as well as reductions into one or more elements of an array. To generate efficient code, we generalize dHPF's computation partitioning algorithm to assign computations for reduction statements and apply various techniques to reduce the number of collective communications. Furthermore, we "factor" some reduction statements into a group of reduction statements to better exploit data locality. We have evaluated the effectiveness of our optimizations on an IBM Scalable PowerParallel System SP2. The results indicate that without reduction optimization, reduction operations are executed sequentially and become an execution bottleneck; with reduction support, we obtain good speedups, and the time spent in reductions is almost negligible compared with that of the non-optimized version. Compared to IBM's xlhpf compiler, dHPF recognizes a larger set of reduction operations and achieves a 111% to 270% increase in efficiency for combined extreme value/location reductions. Our experiments show that factorization is an effective approach that reduces reduction operation time by 50% or more where it is applicable.