A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors
The slipstream paradigm harnesses multiple processing elements in a chip multiprocessor (CMP) to speed up a single, sequential program. It does this by running two redundant copies of the program, one slightly ahead of the other. The leading program is the Advanced Stream (A-stream) and the trailing program is the Redundant Stream (R-stream). Predicted non-essential computation is speculatively removed from the A-stream. The A-stream is sped up because it fetches and executes fewer instructions than the original program. The trailing R-stream checks the control flow and data flow outcomes of the A-stream, and redirects it when it fails to make correct forward progress. The R-stream also exploits the A-stream outcomes as accurate branch and value predictions. Therefore, although the R-stream retires the same number of instructions as the original program, it fetches and executes much more efficiently. As a result, both program copies finish sooner than the original program.
A slipstream component called the instruction-removal detector (IR-detector) detects past-ineffectual instructions in the R-stream and selects them for possible removal from the A-stream in the future. The IR-detector uses a two-step selection process. First, it selects key trigger instructions -- unreferenced writes, non-modifying writes, and correctly-predicted branches. A table similar to a conventional register rename table can easily detect unreferenced and non-modifying writes. The second step, called back-propagation, selects computation chains feeding the trigger instructions. In an explicit implementation of back-propagation, retired R-stream instructions are buffered and consumer instructions are connected to their producer instructions using a configurable interconnection network. Consumers that are selected because they are ineffectual use these connections to propagate their ineffectual status to their producers, so that they get selected, too.
Explicit back-propagation is complex because it requires a configurable interconnection network. This thesis proposes a simpler implementation of back-propagation, called implicit back-propagation. The key idea is to logically monitor the A-stream instead of the R-stream. Now, the IR-detector only performs the first step, i.e., it selects unreferenced writes, non-modifying writes, and correctly-predicted branches. After building up confidence, these trigger instructions are removed from the A-stream. Once removed, their producers become unreferenced writes in the A-stream (because they no longer have consumers). After building up confidence, the freshly exposed unreferenced writes are also removed, exposing additional unreferenced writes. This process continues iteratively, until eventually entire non-essential dependence chains are removed.
By logically monitoring the A-stream, back-propagation is reduced to detecting unreferenced writes. Implicit back-propagation eliminates complex hardware and performs within 0.5% of explicit back-propagation.
Advisor:Dr. Gregory T. Byrd; Dr. Eric Rotenberg; Dr. Thomas M. Conte
School Location:USA - North Carolina
Source Type:Master's Thesis
Date of Publication:05/15/2002