
dc.contributor.author Cox, Alan
dc.contributor.author Dwarkadas, Sandhya
dc.contributor.author Zwaenepoel, Willy
dc.date.accessioned 2017-08-02T22:03:37Z
dc.date.available 2017-08-02T22:03:37Z
dc.date.issued 1997-11-17
dc.identifier.uri https://hdl.handle.net/1911/96479
dc.description.abstract High Performance Fortran (HPF), as well as its predecessor Fortran D, has attracted considerable attention as a promising language for writing portable parallel programs for a wide variety of distributed-memory architectures. Programmers express data parallelism using Fortran90 array operations and use data layout directives to direct the partitioning of the data and computation among the processors of a parallel machine. For HPF to gain acceptance as a vehicle for parallel scientific programming, it must achieve high performance on problems for which it is well suited. To achieve high performance with an HPF program on a distributed-memory parallel machine, an HPF compiler must do a superb job of translating Fortran90 data-parallel array constructs into an efficient sequence of operations that minimizes the overhead associated with data movement and maximizes data locality. This dissertation presents and analyzes a set of advanced optimizations designed to improve the execution performance of HPF programs on distributed-memory architectures. It presents a methodology for performing deep analysis of Fortran90 programs, eliminating the reliance on pattern matching to drive the optimizations, as is done in many Fortran90 compilers. The optimizations address the overhead of data movement, both interprocessor and intraprocessor, that results from the translation of Fortran90 array constructs. Additional optimizations address the issues of scalarizing array assignment statements, loop fusion, and data locality. The combination of these optimizations results in a compiler that is capable of optimizing dense matrix stencil computations more completely than all previous efforts in this area. This work is distinguished by advanced compile-time analysis and optimizations performed at the whole-array level, as opposed to analysis and optimization performed at the loop or array-element levels.
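The scalarization and loop-fusion optimizations mentioned in the abstract can be sketched conceptually. The Python model below is hypothetical and not taken from the report: it shows how two Fortran90-style array assignments, each naively scalarized into its own loop, can instead be fused into a single loop that traverses the data once, improving locality and allowing the intermediate array to be contracted to a scalar.

```python
# Conceptual sketch (hypothetical, not the report's compiler) of
# scalarization and loop fusion for two array assignments:
#   tmp = a + b
#   out = 2 * tmp

def unfused(a, b):
    # Naive scalarization: one loop per array statement, with a full
    # temporary array materialized between them.
    n = len(a)
    tmp = [0.0] * n
    out = [0.0] * n
    for i in range(n):          # loop for: tmp = a + b
        tmp[i] = a[i] + b[i]
    for i in range(n):          # loop for: out = 2 * tmp
        out[i] = 2.0 * tmp[i]
    return out

def fused(a, b):
    # Loop fusion: one traversal of the data; the intermediate value
    # lives in a scalar, so the temporary array disappears entirely.
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]         # intermediate contracted to a scalar
        out[i] = 2.0 * t
    return out
```

Both forms compute the same result; the fused form touches each element of `a` and `b` once and eliminates the intermediate array, which is the kind of data-movement and locality saving the abstract refers to.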
dc.format.extent 12 pp
dc.language.iso eng
dc.rights You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
dc.title An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System
dc.type Technical report
dc.date.note November 17, 1997
dc.identifier.digital TR97-297
dc.type.dcmi Text
dc.identifier.citation Cox, Alan, Dwarkadas, Sandhya and Zwaenepoel, Willy. "An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System." (1997) https://hdl.handle.net/1911/96479.
