Code Transformations to Improve Memory Parallelism

Files in this item

Files Size Format
Pai2000May1CodeTransf.PDF 167.5Kb application/pdf

Item Metadata

Title: Code Transformations to Improve Memory Parallelism
Author: Pai, Vijay S.; Adve, Sarita V.
Type: Journal article
Keywords: compiler transformations; out-of-order issue; memory parallelism; latency tolerance; unroll-and-jam
Citation: V. S. Pai and S. V. Adve, "Code Transformations to Improve Memory Parallelism," Journal of Instruction-Level Parallelism, vol. 2, 2000.
Abstract: Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than in previous-generation systems. These deficiencies arise largely because applications present limited opportunities for an out-of-order issue processor to overlap multiple read misses, the dominant source of memory stalls. This work proposes code transformations to increase parallelism in the memory system by overlapping multiple read misses within the same instruction window, while preserving cache locality. We present an analysis and transformation framework suitable for compiler implementation. Our simulation experiments show execution time reductions averaging 20% in a multiprocessor and 30% in a uniprocessor. A substantial part of these reductions comes from increases in memory parallelism. We see similar benefits on a Convex Exemplar.
Date Published: 2000-05-20
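
Illustration: the keywords name unroll-and-jam, and the abstract describes overlapping multiple read misses within one instruction window while preserving cache locality. The following is a minimal sketch of that idea on a row-major matrix sum, not code from the paper. The function names, the unroll factor of 2, and the choice of a matrix sum are all assumptions made for illustration.

```c
#include <stddef.h>

/* Baseline (hypothetical example): sum a rows x cols matrix stored in
 * row-major order. The inner loop walks a single row, so the misses it
 * incurs are spread out, roughly one per cache line of that row. */
double sum_baseline(const double *a, size_t rows, size_t cols) {
    double total = 0.0;
    for (size_t j = 0; j < rows; j++)
        for (size_t i = 0; i < cols; i++)
            total += a[j * cols + i];
    return total;
}

/* Unroll-and-jam by 2 (hypothetical example): unroll the outer loop and
 * fuse the two resulting inner loops. Each inner iteration now touches two
 * different rows, so two independent misses can sit close together in the
 * instruction stream, while each row is still traversed sequentially.
 * Note: two accumulators reorder the floating-point additions, which can
 * change rounding for general inputs. */
double sum_unroll_and_jam(const double *a, size_t rows, size_t cols) {
    double t0 = 0.0, t1 = 0.0;
    size_t j = 0;
    for (; j + 1 < rows; j += 2) {
        for (size_t i = 0; i < cols; i++) {
            t0 += a[j * cols + i];        /* row j     */
            t1 += a[(j + 1) * cols + i];  /* row j + 1 */
        }
    }
    /* Remainder loop for an odd number of rows. */
    for (; j < rows; j++)
        for (size_t i = 0; i < cols; i++)
            t0 += a[j * cols + i];
    return t0 + t1;
}
```

Whether such a rewrite helps on a given machine depends on the cache line size, the instruction window size, and the number of outstanding misses the memory system supports; the sketch only shows the shape of the transformation.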

This item appears in the following Collection(s)

  • ECE Publications [1048 items]
    Publications by Rice University Electrical and Computer Engineering faculty and graduate students