Portable Techniques to Find Effective Memory Hierarchy Parameters
Cooper, Keith D.
Application performance on modern microprocessors depends heavily on performance related characteristics of the underlying architecture. To achieve the best performance, an application must be tuned to both the target-processor family and, in many cases, to the specific model, as memory-hierarchy parameters vary in important ways between models. Manual tuning is too inefficient to be practical; we need compilers that perform model-specific tuning automatically. To make such tuning practical, we need techniques that can automatically discern the critical performance parameters of a new computer system. While some of these parameters can be found in manuals, many of them cannot. To further complicate matters, compiler-based optimization should target the system’s behavior rather than its hardware limits. Effective cache capacities, in particular, can be smaller than the hardware limits for a number of reasons, such as sharing between cores or between instruction and data caches. Physical address mapping can also reduce the effective cache capacity. To address these challenges, we have developed a suite of portable tools that derive many of the effective parameters of the memory hierarchy. Our work builds on a long line of prior art that uses micro-benchmarks to analyze the memory system. We separate the design of a reference string that elicits a specific behavior from the analysis that interprets that behavior. We present a novel set of reference strings and a new robust approach to analyzing the results. We present experimental validation on a collection of 20 processors.