dc.contributor.author Yi, Qing
dc.date.accessioned 2017-08-02T22:02:57Z
dc.date.available 2017-08-02T22:02:57Z
dc.date.issued 2002-04-01
dc.identifier.uri https://hdl.handle.net/1911/96302
dc.description This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/18153
dc.description.abstract Over the past 20 years, increases in processor speed have dramatically outstripped performance increases for standard memory chips. To bridge this gap, compilers must optimize applications so that data fetched into caches are reused before being displaced. Existing compiler techniques can efficiently optimize simple loop structures such as sequences of perfectly nested loops. However, on more complicated structures, existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler. This thesis develops the following novel techniques to optimize complex loop structures both effectively and inexpensively for better locality.
Extended dependence representation: a matrix representation that incorporates dependence relations between iterations of arbitrarily nested loops.
Transitive dependence analysis algorithm: a new algorithm that improves the time complexity of existing transitive dependence analysis algorithms.
Dependence hoisting: a new loop transformation technique that permits the direct fusion and interchange of arbitrarily nested loops. The transformation is inexpensive and can be incorporated into most commercial compilers.
Computation slicing: a framework that systematically applies dependence hoisting to optimize arbitrary loop structures for better locality.
Recursion transformation: the first compiler work that automatically transforms loop structures into recursive form to exploit locality simultaneously at multiple levels of the memory hierarchy.
Both the computation slicing framework and the recursion transformation have been implemented and applied to successfully optimize a collection of benchmarks. In particular, the slicing framework has successfully blocked four linear algebra kernels: Cholesky, QR, LU factorization without pivoting, and LU factorization with partial pivoting. The automatically blocked versions achieve performance improvements similar to those attained by the manually blocked programs in LAPACK [7]. The automatic blocking of QR and pivoting LU is a notable achievement because these kernels contain loop nests that are considered difficult; to our knowledge, few previous compiler implementations have completely automated the blocking of these loop nests. These results indicate that, although its cost is much lower than that of existing, more general transformation frameworks [34, 42, 2, 36, 49], the computation slicing framework can in practice match or exceed the effectiveness of those frameworks.
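For readers unfamiliar with blocking, the following minimal C sketch illustrates the kind of loop tiling for cache locality that the abstract refers to. It is a generic, hypothetical illustration using matrix multiplication rather than the Cholesky, QR, or LU kernels treated in the thesis, and the problem size N, block size B, and function names are assumptions made for this example, not anything taken from the report.

/* Illustrative sketch of loop blocking (tiling) for cache locality.
 * Not the thesis's algorithm; a generic example with assumed parameters. */
#include <stddef.h>

#define N 1024   /* problem size (assumed; divisible by B) */
#define B 64     /* tile size chosen so a B-by-B tile fits in cache (assumed) */

/* Untransformed loop nest: for large N, the elements of b touched in one
 * iteration of the outer i loop are evicted before they are reused.
 * Both routines assume c has been zero-initialized by the caller. */
static void matmul_plain(const double a[N][N], const double b[N][N],
                         double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked loop nest: the same computation reordered so that B-by-B tiles
 * of a, b, and c are reused while they still reside in cache. */
static void matmul_blocked(const double a[N][N], const double b[N][N],
                           double c[N][N])
{
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t jj = 0; jj < N; jj += B)
            for (size_t kk = 0; kk < N; kk += B)
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t j = jj; j < jj + B; j++)
                        for (size_t k = kk; k < kk + B; k++)
                            c[i][j] += a[i][k] * b[k][j];
}

The work described in the abstract automates this kind of reordering for the harder case of imperfectly nested loops, where legality must be established from the dependence relations between iterations rather than assumed.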
dc.format.extent 167 pp
dc.language.iso eng
dc.rights You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
dc.title Transforming Complex Loop Nests for Locality
dc.type Technical report
dc.date.note April 1, 2002
dc.identifier.digital TR02-397
dc.type.dcmi Text
dc.identifier.citation Yi, Qing. "Transforming Complex Loop Nests for Locality." (2002) https://hdl.handle.net/1911/96302.

