Evaluation of Compiler-Controlled Updating to Reduce Coherence-Miss Penalties
in Shared-Memory Multiprocessors
J. Skeppstedt, F. Dahlgren, P. Stenström
Journal of Parallel and Distributed Computing, vol. 56, no. 1, February 1999,
pp. 122-143 (ID jpdc.1998.1510)
Copyright © 1999 Academic Press
Abstract
This paper evaluates the effectiveness of compiler-controlled updating, a new
approach to reducing coherence-miss penalties in shared-memory multiprocessors.
A key part of the method is a compiler algorithm that identifies the last
store instruction to a memory block in a flow graph using classic dataflow
analysis techniques. Such stores are marked and replaced by update instructions
that at run time make the memory copy clean. Whereas this static method
shortens the read-miss latency for actively shared blocks, it can cause
useless traffic for shared blocks that are effectively private. We therefore
complement the static analysis with a simple dynamic heuristic in the cache
coherence protocol that aims to classify blocks as private or shared at
run time. We evaluate the performance effects of compiler-controlled updating
using six scientific parallel applications, compiled by an optimizing compiler
that incorporates our static analysis and run on a detailed CC-NUMA
architectural simulation model. We have found that the compiler
algorithm can convert between 83% and 100% of the dirty misses into clean
misses. By adding the private/shared heuristic, the update traffic of private
memory blocks can be practically eliminated. Overall, the static analysis
in combination with the dynamic heuristic is shown to reduce the execution
time by as much as 32%.
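
The last-store identification described in the abstract can be illustrated with a small backward dataflow pass. The following is only a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the function name find_last_stores, the flow-graph encoding (a successor map plus a map from node to the memory block it stores to), and the conservative rule that a store is marked only when no path from it reaches another store to the same block are all hypothetical choices made for illustration.

```python
def find_last_stores(succ, store):
    """Return the nodes whose store is the last one to its memory block.

    succ  : dict mapping node -> list of successor nodes (the flow graph)
    store : dict mapping node -> name of the memory block that node stores to
            (nodes that perform no store are simply absent)

    Assumption for this sketch: a store is "last" only if no later store to
    the same block is reachable along any path (a conservative may-analysis).
    """
    nodes = set(succ) | {s for targets in succ.values() for s in targets}

    # later[n] = blocks that may be stored to on some path strictly after n.
    later = {n: set() for n in nodes}

    # Iterate to a fixed point; the sets only grow, so this terminates.
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = set()
            for s in succ.get(n, []):
                new |= later[s]          # stores reachable beyond successor s
                if s in store:
                    new.add(store[s])    # the store performed by s itself
            if new != later[n]:
                later[n] = new
                changed = True

    # A store is marked when its block is never stored to again afterwards.
    return {n for n in nodes if n in store and store[n] not in later[n]}


# Straight-line example: s1 stores X, s2 stores Y, s3 stores X again.
straight = find_last_stores(
    {"s1": ["s2"], "s2": ["s3"], "s3": []},
    {"s1": "X", "s2": "Y", "s3": "X"},
)

# Loop example: the store in the loop body can be followed by itself.
looped = find_last_stores(
    {"head": ["body", "exit"], "body": ["head"], "exit": []},
    {"body": "X"},
)
```

In the straight-line example only s2 and s3 would be candidates for replacement by update instructions, since s1 is overwritten by s3. In the loop example no store is marked, which shows how conservative this simplified rule is; a practical analysis would need a finer treatment of loops and of which addresses map to the same memory block.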