THE PERFORMANCE OF PARALLEL LINEAR ALGEBRA ALGORITHMS ON AN INTERLEAVED ARRAY PROCESSOR
Doctor of Philosophy
This thesis investigates the performance of numerical algorithms that take advantage of the unique properties of a new type of array processor. This processor, called an Interleaved Array Processor (IAP), is characterized by its ability to use multiple Programmable Functional Units (PFU's) with local data and instruction memories. Unlike conventional array processors, which can only execute simple arithmetic vector operations such as addition and multiplication, the IAP can execute complex vector operations defined by the user. These operations are specified by small programs that can contain conditional branching as well as arithmetic and data movement instructions in each processor. We call these programs High-Level Vector Operations (HLVO's). We present ways to partition the algorithms and the data among the processing units in the system, in such a way that we increase the computation time in every processing unit, and at the same time we reduce the data movement on the system bus. In this way the bus can be timeshared among several functional units, allowing several operations on different vector components to be executed simultaneously and overlapped with the transfer of operands and results. Interconnecting multiple functional units in a computer system gives rise to a data transfer bottleneck which is usually referred to as the Von Neumann bottleneck. The general approach to alleviate this bottleneck is to increase the number and the bandwidth of the data transfer paths in the system. This solution, although more general, is also more expensive. The objective of the HLVO's and the data and instruction memories in the PFU's is to alleviate the Von Neumann bottleneck without the need of expensive high-speed data transfer paths. The goal of our approach is to organize the computations and the algorithms on the IAP so that the time required to perform a vector operation on one component of its vector operands is large relative to the time required to transfer operands and results between the functional units. Through a quantitative analysis of the performance of the IAP, we demonstrate that our approach is achievable for an important class of numeric algorithms, namely linear algebra algorithms.
Electronics; Electrical engineering