Synchronization, coherence, and consistency for high performance shared memory multiprocessing
Jump, J. Robert; Sinclair, James B.
Doctor of Philosophy
Although improved device technology has increased the performance of computer systems, fundamental hardware limitations and the need to build faster systems using existing technology have led many computer system designers to consider parallel designs with multiple computing elements. Unfortunately, the design of efficient and scalable multiprocessors has proven to be an elusive goal. This dissertation describes a hierarchical bus-based multiprocessor architecture, an adaptive cache coherence protocol, and efficient and simple synchronization support that together meet this challenge. We have also developed an execution-driven tool for the simulation of shared-memory multiprocessors, which we use to evaluate the proposed architectural enhancements. Our simulator offers substantial advantages in terms of reduced time and space overheads when compared to instruction-driven or trace-driven simulation techniques, without significant loss of accuracy. The simulator generates correctly interleaved parallel traces at run time, allowing the accurate simulation of a variety of architectural alternatives for a number of programs. Our results provide a quantitative analysis of the viability of large-scale bus-based memory hierarchies. We evaluate the effect on performance of several architectural enhancements, and discuss the tradeoffs between reducing contention and increasing latency as the number of levels in the memory hierarchy are increased. Toward this end, we have developed a cache coherence protocol for a hierarchical bus-based architecture that minimizes total communication overhead by utilizing all available (bus-provided) information. Based on our evaluation, we propose an integrated set of architectural design decisions. These include synchronization using a conditional test&set operation that eliminates excess bus traffic and contention, conditional access scheduling, where bus traffic is reduced by keeping track of pending bus accesses for every cache line, adaptive caching, where each cache line is assigned a coherence protocol based upon the expected or observed access behavior for that line, and the use of relaxed memory consistency models, where writes are aggressively buffered. We also present a new classification of memory consistency models that, in addition to unifying all existing models into a common framework, provides insight into the implications of these models with respect to access ordering.
Computer science; Electronics; Electrical engineering