The dynamic speculation and performance prediction of parallel loops

by Zier, David A.

Abstract (Summary)
General purpose computer systems have seen increased performance potential through the parallel processing capabilities of multicore processors. Yet this potential performance can only be attained through parallel applications, thus forcing software developers to rethink how everyday applications are designed. The most readily form of Thread Level Parallelism (TLP) within any program are from loops. Unfortunately, the majority of loops cannot be easily multithreaded due to inter-iteration dependencies, conditional statements, nested functions, and dynamic memory allocation. This dissertation seeks to understand the fundamental characteristics and relationships of loops in order to assist programmers and compilers in exploiting TLP.

First, this dissertation explores a hardware solution that exploits (TLP) through Dynamic Speculative Multithreading (D-SpMT), which can extract multiple threads from a sequential program without compiler support or instruction set extensions. This dissertation presents Cascadia, a D-SpMT multicore architecture that provides multi-grain thread-level support. Cascadia applies a unique sustainable IPC (sIPC) metric on a comprehensive loop tree to select the best performing nested loop level to multithread. Results showed that Cascadia can extract large amounts of TLP, but ultimately, only yielded moderate performance gains. The lack of overall performance gains exhibited by Cascadia were due to the sequential nature of applications, rather than Cascadia's ability to perform D-SpMT.

In order to fully exploit TLP through loops, some loop level analysis and transformation must first be performed. Therefore, second contribution of this dissertation is the development of several theoretical methodologies to aid programmers and auto-tuners in parallelizing loops. This work found that the inter-iteration dependencies have a two-fold effect on the loop's parallel performance. First, the performance is primarily affected by a single, dominant dependency, and it is the execution of the dominant dependency path that directly determines the parallel performance of the loop. Any additional dependencies cause a secondary effect that may increase the execution time due to relative dependency path differences. Furthermore, this study analyzes the effects of non-ideal conditions, such as a limited number of processors, multithreading overhead, and irregular loop structures.

Bibliographical Information:

Advisor:Lee, Ben; Bose, Bella; Nguyen, Thinh; Metoyer, Ronald A.; Levien, Keith L.

School:Oregon State University

School Location:USA - Oregon

Source Type:Master's Thesis

Keywords:computer architecture parallel programming speculative multithreading loop level parallelism thread multi grain threading multicore processors dependency analysis simultaneous science


Date of Publication:05/01/2009

© 2009 All Rights Reserved.