CACHE MANAGEMENT BY THE COMPILER
THABIT, KHALID OMAR
Doctor of Philosophy thesis
An ideal high performance computer includes a fast processor and a multi-million byte memory of comparable speed. Since it is currently economically infeasible to have large memories with speeds matching the processor, hardware designers have included the cache. Because of its small size, and its effectiveness in eliminating the speed mismatch, the cache has become a common feature of high performance computers. Enhancing cache performance proved to be instrumental in the speed up of cache-based computers. In most cases enhancement methods could be classified as either software based, or hardware controlled. In most cases, software based improvement methods that proved to be very effective in main memory were considered to be inapplicable to the cache. A main reason has been the cache's transparency to programs, and the fast response time of main memory. This resulted in only hardware enhancement features being considered, and implemented for the cache. Developments in program optimization by the compiler were successful in improving the program's performance, and the understanding of program behavior. Coupling the information about a program's behavior with knowledge of the hardware structure became a good approach to optimization. With this premise we developed two cache management models: the prompting model, and the explicit management model. Both models rely on the underlying concepts of: prefetching, clustering (packing), and loop transformations. All three are software based enhancement methods that proved to be successful in boosting main memory performance. In analyzing these methods for possible implementation in the cache we found that optimal data packing is a hard problem. Nevertheless, we suggested various heuristic methods for effective packing. We then set forth a number of conditions for loop transformations. The aim of these transformations is to facilitate prefetching (preloading) of cache blocks during loop execution. In both models the compiler places preload requests within the program's code. These requests are serviced in parallel with program execution. Replacement decisions are determined at compile time in the explicit model, but are fully controlled by the hardware in the prompting model. In this model special tag bits are introduced to each cache block in order to facilitate replacement decisions. The handling of aggregate data elements (arrays) are also discussed in the thesis. In the explicit model a special indexing scheme is introduced for controlling array access in the cache. In addition, main memory addresses are only generated for block load requests, all other addresses are for the cache.