Performance of synchronous parallel algorithms with regular structures
Sinclair, James B.
Doctor of Philosophy
The ability to model the execution and predict the performance of parallel algorithms is necessary if parallel computer systems are to be designed and utilized effectively. One factor inherent in the structure of an algorithm is the delay due to synchronization requirements, when one task must await the completion of another. This thesis is concerned with the problem of evaluating the performance of parallel algorithms on MIMD computer systems when tasks in the algorithms have non-deterministic execution times. It addresses the following questions: How does non-determinism affect synchronization delays and, ultimately, the performance of parallel algorithms? Is it possible to predict this effect independently of the implementation on a specific architecture? How can such predictions be validated?

The approach adopted is based on the view of a parallel algorithm as a collection of tasks with constraints on their relative order of execution. Important classes of parallel algorithms that exhibit regularity in their task graph structures are identified, and the performance of these structures is predicted using results from order statistics and extreme value theory. These algorithm classes are those whose task graphs have multiphase, partitioning, and pipeline structures. The Rice Parallel Processing Testbed (RPPT), a program-driven simulation tool for the performance evaluation of parallel algorithms on parallel architectures, is used along with distribution-driven simulation to validate the results.

Variations in the execution times of tasks generally increase the length of synchronization delays and adversely affect the performance of a parallel algorithm. In the case of an algorithm with a regular task graph structure, it is possible to quantify the performance degradation due to non-determinism under certain assumptions about the task execution times and the number of processors available to execute the tasks.
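The effect described above can be illustrated with a small distribution-driven simulation (a hypothetical sketch, not taken from the thesis): a barrier-synchronized phase of n parallel tasks completes only when its slowest task finishes, so the phase time is the maximum of the n task execution times. Deterministic tasks incur no synchronization delay, while exponentially distributed tasks with the same mean inflate the expected phase time to roughly the n-th harmonic number times the mean.

```python
import random

def mean_makespan(n_tasks, draw, trials=20000, seed=1):
    """Estimate E[max of n task times]: the mean completion time of
    one barrier-synchronized phase of n parallel tasks, each task's
    execution time drawn by `draw(rng)`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(draw(rng) for _ in range(n_tasks))
    return total / trials

# Deterministic tasks of mean 1.0: the phase always takes exactly 1.0.
det = mean_makespan(16, lambda rng: 1.0)

# Exponential tasks with the same mean 1.0: synchronization delay
# inflates the mean phase time to H_16 = sum(1/k, k=1..16) ~ 3.38.
exp = mean_makespan(16, lambda rng: rng.expovariate(1.0))
```

The gap between `det` and `exp` is precisely the mean synchronization delay attributable to non-determinism in this single-phase example.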
In particular, it is possible to place bounds on and approximate the mean execution time of the multiphase, partitioning, and pipeline structures. Distribution-driven simulations and results from sorting algorithms run on the RPPT indicate that these bounds and approximations, which are based on independence assumptions about task execution time distributions, are robust even in the presence of dependencies among task execution times. (Abstract shortened with permission of author.)
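As an illustration of the kind of bound order statistics provides (a generic example, not one of the thesis's specific results): for a phase of n i.i.d. tasks with mean mu and standard deviation sigma, the Gumbel/Hartley-David inequality gives the distribution-free bound E[max] <= mu + sigma*(n-1)/sqrt(2n-1). For exponential task times the mean of the maximum is exactly mu times the n-th harmonic number, which allows a direct check.

```python
import math

def hartley_david_bound(n, mu, sigma):
    """Distribution-free upper bound on E[max of n i.i.d. variables]
    with mean mu and standard deviation sigma."""
    return mu + sigma * (n - 1) / math.sqrt(2 * n - 1)

def exp_phase_mean(n, mu=1.0):
    """Exact mean makespan of n i.i.d. exponential(mean mu) tasks:
    mu * H_n, where H_n is the n-th harmonic number."""
    return mu * sum(1.0 / k for k in range(1, n + 1))

n = 16
bound = hartley_david_bound(n, mu=1.0, sigma=1.0)  # exponential: sigma == mu
exact = exp_phase_mean(n)
# exact (~3.38) lies below the bound (~3.69), as the inequality guarantees.
```

Bounds of this flavor are useful because they require only the first two moments of the task-time distribution, not its full form.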
Electronics; Electrical engineering; Computer science