Evaluation of Compiler-Controlled Updating to Reduce Coherence-Miss Penalties in Shared-Memory Multiprocessors

J. Skeppstedt, F. Dahlgren, P. Stenström

Journal of Parallel and Distributed Computing, vol. 56, no. 1, February 1999, pp. 122-143 (ID jpdc.1998.1510). Copyright © 1999 Academic Press

Abstract

In this paper we evaluate the effectiveness of a new approach, called compiler-controlled updating, for reducing coherence-miss penalties in shared-memory multiprocessors. A key part of the method is a compiler algorithm that uses classic dataflow analysis techniques to identify the last store instruction to a memory block in a flow graph. Such stores are marked and replaced by update instructions that, at run time, make the memory copy clean. Although this static method shortens the read-miss latency for actively shared blocks, it can cause useless traffic for shared blocks that are effectively private. We therefore complement the static analysis with a simple dynamic heuristic in the cache coherence protocol that classifies blocks as private or shared at run time. We evaluate the performance effects of compiler-controlled updating using six scientific parallel applications, compiled by an optimizing compiler that incorporates our static analysis and run on a detailed CC-NUMA architectural simulation model. We find that the compiler algorithm can convert between 83 and 100% of the dirty misses into clean misses. With the private/shared heuristic added, the update traffic for private memory blocks is practically eliminated. Overall, the static analysis combined with the dynamic heuristic reduces the execution time by as much as 32%.
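The idea of finding the last store to a memory block in a flow graph can be illustrated with a small backward dataflow pass. The sketch below is not the paper's algorithm, which the abstract does not spell out. It assumes a hypothetical flow-graph encoding (a dict of basic blocks holding `(op, mem_block)` pairs plus a successor map), a hypothetical function name `find_last_stores`, and a deliberately conservative rule: a store is marked only if no later store to the same memory block can follow it on any path.

```python
def find_last_stores(cfg, succs):
    """Return {(node, index)} of stores that no later store to the same
    memory block can follow on any path (illustrative assumption only).

    cfg:   dict mapping node name -> list of (op, mem_block) tuples
    succs: dict mapping node name -> list of successor node names
    """
    # later_in[n]: memory blocks that may be stored at or after entry to n.
    later_in = {n: set() for n in cfg}

    # Backward "may" analysis, iterated to a fixed point.
    changed = True
    while changed:
        changed = False
        for n in cfg:
            out = set()
            for s in succs.get(n, []):
                out |= later_in[s]
            new_in = out | {blk for op, blk in cfg[n] if op == "store"}
            if new_in != later_in[n]:
                later_in[n] = new_in
                changed = True

    # Scan each basic block backward; a store is "last" if its memory
    # block is not stored again later in the block or in any successor.
    last = set()
    for n, instrs in cfg.items():
        later = set()
        for s in succs.get(n, []):
            later |= later_in[s]
        for i in range(len(instrs) - 1, -1, -1):
            op, blk = instrs[i]
            if op == "store":
                if blk not in later:
                    last.add((n, i))
                later.add(blk)
    return last


# Hypothetical diamond-shaped flow graph: entry -> {then, else} -> exit.
cfg = {
    "entry": [("store", "A"), ("store", "B")],
    "then": [("store", "A")],
    "else": [("load", "A")],
    "exit": [("load", "B")],
}
succs = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

marked = find_last_stores(cfg, succs)
print(sorted(marked))
```

In this example the store to `A` in `entry` is not marked, because the `then` branch may store to `A` again, while the store to `B` in `entry` and the store to `A` in `then` are marked. Under this conservative rule a store inside a loop is never marked, since the back edge makes it a possible successor of itself; a practical analysis would presumably bound the search at points such as synchronization operations, which this sketch does not model.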