
dc.contributor.author Zhang, Kai
dc.date.accessioned 2017-08-02T22:02:47Z
dc.date.available 2017-08-02T22:02:47Z
dc.date.issued 2000-04-03
dc.identifier.uri https://hdl.handle.net/1911/96274
dc.description.abstract In this thesis, we explore the use of software distributed shared memory (SDSM) as a target communication layer for parallelizing compilers. For SDSM to be effective for this purpose, it must efficiently support both regular and irregular communication patterns. Previous studies have demonstrated techniques that enable SDSM to achieve performance competitive with hand-coded message passing for irregular applications. Here, we explore how to exploit compiler-derived knowledge of sharing and communication patterns for regular access patterns to improve their performance on SDSM systems. We introduce two novel optimization techniques: compiler-restricted consistency, which reduces the cost of false sharing, and compiler-managed communication buffers, which, when used together with compiler-restricted consistency, reduce the cost of fragmentation. We focus on regular applications with wavefront computation and tightly-coupled sharing due to loop-carried data dependences. Previous studies of regular applications all focus on loosely-coupled parallelism, for which it is easier to achieve good performance. We describe point-to-point synchronization primitives we have developed that facilitate the parallelization of this type of application on SDSM. Along with other compiler-assisted SDSM optimizations, such as compiler-controlled eager update, our integrated compiler and run-time support provides speedups for wavefront computations on SDSM that rival those achieved previously only for loosely synchronous applications. For example, we achieve a speedup of 11 on 16 processors for the SOR benchmark (a tightly-coupled wavefront computation) at a 4Kx4K problem size, which compares favorably with the speedup of 14 on 16 processors that we obtain for Red-Black SOR (a loosely-coupled computation) at the same problem size under the same hardware and software environment.
With the NAS-BT application benchmark using the Class A problem size, the compiler and runtime optimizations described here improved the speedup on SDSM from 4 to 10 on 16 processors.
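The distinction the abstract draws between tightly-coupled wavefront SOR and loosely-coupled Red-Black SOR can be sketched in sequential form. The sketch below is illustrative only (it is not code from the thesis; function names and the NumPy formulation are assumptions): in the Gauss-Seidel variant, each interior point reads neighbors already updated in the same sweep, so the computation carries a dependence along anti-diagonals (wavefronts) and requires point-to-point synchronization between neighboring processors when parallelized; in the red-black variant, points of one color depend only on points of the other color, so each half-sweep is fully parallel.

```python
import numpy as np

def wavefront_sor(a, omega=1.0, sweeps=1):
    """Gauss-Seidel SOR: a[i, j] reads a[i-1, j] and a[i, j-1] values
    already updated in this sweep, creating a loop-carried dependence
    along anti-diagonals (wavefronts)."""
    n, m = a.shape
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                new = 0.25 * (a[i-1, j] + a[i+1, j] + a[i, j-1] + a[i, j+1])
                a[i, j] += omega * (new - a[i, j])
    return a

def red_black_sor(a, omega=1.0, sweeps=1):
    """Red-Black SOR: each half-sweep updates only one color, and every
    point of that color depends only on the other color, so the
    half-sweep has no intra-sweep dependences (loosely coupled)."""
    n, m = a.shape
    for _ in range(sweeps):
        for color in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == color:
                        new = 0.25 * (a[i-1, j] + a[i+1, j]
                                      + a[i, j-1] + a[i, j+1])
                        a[i, j] += omega * (new - a[i, j])
    return a
```

Both variants converge to the same discrete Laplace solution; the difference is purely in the sharing pattern a parallelizing compiler must manage on SDSM.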
dc.format.extent 80 pp
dc.language.iso eng
dc.rights You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
dc.title Compiling for Software Distributed-Shared Memory Systems
dc.type Technical report
dc.date.note April 3, 2000
dc.identifier.digital TR00-356
dc.type.dcmi Text
dc.identifier.citation Zhang, Kai. "Compiling for Software Distributed-Shared Memory Systems." (2000) https://hdl.handle.net/1911/96274.

