Array syntax compilation and performance tuning
Doctor of Philosophy
Array syntax adds expressive power to a language by providing operations on and assignments to array sections, allowing programmers to write clear and concise code. However, state-of-the-art vendor compilers fail to map array statements efficiently to the underlying architectures. The inefficiency stems from three technical problems that these compilers do not solve effectively: (1) reducing the size of allocated temporary arrays; (2) extending solutions to evolving architectures; and (3) applying loop fusion across multiple array statements. Solving these problems is important because otherwise array syntax, though a high-level language feature, may not be widely adopted by application developers. To address them, this research first develops a novel strategy that minimizes allocated temporary arrays using loop alignment and loop skewing on scalar processors, thereby reducing memory traffic and improving cache utilization. It then extends the minimization strategy to exploit the increasing on-chip parallelism of evolving architectures that offer vector (e.g., SSE and AltiVec) and multi-core (e.g., CELL) capabilities. In addition, new techniques boost performance by improving data alignment and managing data movement, both of which are important on these new architectures. Last, this dissertation parameterizes loop fusion for performance tuning and explores the properties of the space of all possible loop fusion configurations, in order to expedite the tuning of loop fusion for increased data reuse across multiple array statements. These transformations and optimizations are implemented in a source-to-source research compiler with extensions that target short vector processors and the CELL processor. Experiments show that array statements compiled with our strategy run up to twice as fast as those compiled directly by vendor compilers.
Our exploration of the loop fusion parameter space identifies good candidates for heuristic search and space pruning, which are essential to making the performance tuning process practical. In summary, this dissertation demonstrates that advanced compilation techniques can significantly improve the performance of programs written in array syntax over current state-of-the-art implementations across a variety of architectures, including the latest multi-core processors with vector capabilities.
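The temporary-array problem named above can be illustrated with a small sketch. An array statement must behave as if its whole right-hand side is evaluated before any element is assigned, so a statement whose target overlaps its operands cannot simply be turned into an in-place loop. The example below is written in Python purely for illustration, and the function names are hypothetical. It is not the dissertation's algorithm (which relies on loop alignment and loop skewing); it only shows, for one assumed three-point statement, why a full temporary is the safe default and how the temporary can shrink to a single scalar.

```python
def shift_add_full_temp(a):
    """Reference semantics for a[1:n-1] = a[0:n-2] + a[2:n] with a full temporary."""
    n = len(a)
    # Evaluate the entire right-hand side before writing anything back.
    tmp = [a[i - 1] + a[i + 1] for i in range(1, n - 1)]
    out = list(a)
    out[1:n - 1] = tmp
    return out


def shift_add_naive(a):
    """Incorrect scalarization: the loop reads a[i-1] after it was overwritten."""
    out = list(a)
    for i in range(1, len(out) - 1):
        out[i] = out[i - 1] + out[i + 1]
    return out


def shift_add_scalar_temp(a):
    """Same result as the full-temporary version, using one scalar of extra storage."""
    out = list(a)
    n = len(out)
    if n < 3:
        return out  # no interior elements to update
    prev = out[0]  # original value of the element to the left
    for i in range(1, n - 1):
        cur = out[i]                # save the original value before overwriting
        out[i] = prev + out[i + 1]  # out[i+1] has not been modified yet
        prev = cur
    return out


data = [1, 2, 3, 4, 5]
reference = shift_add_full_temp(data)
print(reference)
print(shift_add_scalar_temp(data) == reference)
print(shift_add_naive(data) == reference)
```

For this input the scalar-temporary loop matches the reference result while the naive in-place loop does not, which is the kind of gap a temporary-minimizing compiler has to close without falling back to a full-size copy.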