The effects of cache coherence on the performance of parallel PDE algorithms in multiprocessor systems
Johnson, Sandra Kay
Briggs, Faye A.
Doctor of Philosophy thesis
The advent of parallel processing systems has resulted in the potential for increased performance over traditional uniprocessor systems. However, while there has been significant advances in developing these systems, designing parallel algorithms to run on them has not kept up with the pace. Although parallel algorithms have been studied in the literature, very little has been done in studying how various architectural features effect the performance of these algorithms. This thesis presents the results of a study conducted to determine how one particular design feature of a parallel processing architecture, cache coherence maintenance, affects the performance of parallel partial differential equations' (PDE) algorithms. A high performance shared-memory multiprocessor architecture with private caches and a single bus or full crossbar interconnection network is assumed. The performance degradation as a result of using a directory based cache coherence protocol is evaluated on specific implementations of three synchronous parallel PDE algorithms (Jacobi's algorithm, red-black successive over-relaxation or SOR and the preconditioned conjugate gradient algorithm or PCG). A trace driven cache simulator determines this degradation. The trace is obtained by symbolically executing the algorithm on the multiprocessor system. Parameters derived to evaluate the performance degradation are used as input into an execution time model used to calculate the time needed to execute one iteration of each algorithm. This facilitates parallel algorithm speedup calculations over the sequential algorithm as well as over the parallel algorithm without cache coherence. The results show that implementing cache coherence degrades the overall performance of the parallel PDE algorithms considered by 10 to 30 percent. Various cache design features such as the cache blocksize, the mapping function and the cache size and the algorithm design feature considered, the PDE grid decomposition strategy, have no appreciable effects on the algorithm performance degradation. In fact, the major factors affecting this degradation are the cache miss ratio, the size of the PDE grid relative to the cache size and the write probability of the parallel algorithm. Finally, for the coherence protocol used in this study, SOR has the best speedup performance, followed by Jacobi's algorithm and PCG, respectively.
Computer Science; Engineering, Electronics and Electrical; Mathematics