A simple mechanism for detecting ineffectual instructions in slipstream processors
Abstract (Summary)
KOPPANALIL, JINSON JOSEPH
A Simple Mechanism for Detecting Ineffectual Instructions in
Slipstream Processors
(Under the direction of Dr. Eric Rotenberg)
The slipstream paradigm harnesses multiple processing elements in a chip
multiprocessor (CMP) to speed up a single, sequential program. It does this by running
two redundant copies of the program, one slightly ahead of the other. The leading
program is the Advanced Stream (A-stream) and the trailing program is the Redundant
Stream (R-stream). Predicted non-essential computation is speculatively removed from
the A-stream. The A-stream is sped up because it fetches and executes fewer instructions
than the original program. The trailing R-stream checks the control flow and data flow
outcomes of the A-stream, and redirects it when it fails to make correct forward progress.
The R-stream also exploits the A-stream outcomes as accurate branch and value
predictions. Therefore, although the R-stream retires the same number of instructions as
the original program, it fetches and executes much more efficiently. As a result, both
program copies finish sooner than the original program.
A slipstream component called the instruction-removal detector (IR-detector)
detects past-ineffectual instructions in the R-stream and selects them for possible removal
from the A-stream in the future. The IR-detector uses a two-step selection process. First,
it selects key trigger instructions -- unreferenced writes, non-modifying writes, and
correctly-predicted branches. A table similar to a conventional register rename table can
easily detect unreferenced and non-modifying writes. The second step, called backpropagation,
selects computation chains feeding the trigger instructions. In an explicit
implementation of back-propagation, retired R-stream instructions are buffered and
consumer instructions are connected to their producer instructions using a configurable
interconnection network. Consumers that are selected because they are ineffectual use
these connections to propagate their ineffectual status to their producers, so that they get
selected, too.
Explicit back-propagation is complex because it requires a configurable
interconnection network. This thesis proposes a simpler implementation of backpropagation,
called implicit back-propagation. The key idea is to logically monitor the A-
stream instead of the R-stream. Now, the IR-detector only performs the first step, i.e., it
selects unreferenced writes, non-modifying writes, and correctly-predicted branches.
After building up confidence, these trigger instructions are removed from the A-stream.
Once removed, their producers become unreferenced writes in the A-stream (because
they no longer have consumers). After building up confidence, the freshly exposed
unreferenced writes are also removed, exposing additional unreferenced writes. This
process continues iteratively, until eventually entire non-essential dependence chains are
removed.
By logically monitoring the A-stream, back-propagation is reduced to detecting
unreferenced writes. Implicit back-propagation eliminates complex hardware and
performs within 0.5% of explicit back-propagation.
Bibliographical Information:
Advisor:
School:North Carolina State University
School Location:USA - North Carolina
Source Type:Master's Thesis
Keywords:north carolina state university
ISBN:
Date of Publication: