Energy-efficient high performance cache architectures
Abstract (Summary)
The demand for high-performance architectures and powerful battery-operated mobile devices has accentuated the need for low-power systems. In many media and embedded applications, the memory system can consume more than 50% of the overall system energy, making it a prime candidate for optimization. Caches also play an important role in performance by filtering out a large percentage of main memory accesses that would otherwise incur long latencies. To address this increasingly important problem, this thesis studies energy-efficient high-performance cache architectures that can have a significant impact on overall system energy consumption and performance. This thesis makes four contributions to this end.

The first contribution focuses on partitioning the cache resources architecturally for energy and performance optimizations. Specifically, this thesis investigates splitting the cache into several smaller units, each of which is a cache by itself (called a subcache). The proposed subcache architecture employs page-based placement, dynamic page remapping, and subcache prediction policies in order to improve memory system energy and performance, especially for instruction accesses.

As technology scales down into the deep-submicron regime, leakage energy is becoming a dominant source of energy consumption. Leakage energy is generally proportional to the number of transistors in a circuit, and caches constitute a large portion of the die transistor count. Most techniques have targeted cell leakage energy minimization, but bitline leakage is critical as well. To this end, this thesis proposes a predictive precharging scheme for minimizing bitline leakage as its second contribution.

Many of the recently proposed techniques to reduce power consumption in caches introduce an additional level of non-determinism in cache access latency. Because of this additional latency, instructions that depend on a load with non-deterministic latency and that were issued speculatively must be re-executed, as they will not have the correct data in time. This penalty can potentially offset the claimed power benefits of using such low-power caches. To address this problem, this thesis proposes an early cache set resolution scheme as its third contribution.

Increasing clock frequencies and issue rates aggravate the memory latency problem and impose higher memory bandwidth requirements. While caches can be multiported to provide high memory bandwidth, the increase in their access latency with more ports limits their potential. As its fourth contribution, this thesis proposes a novel temporal cache architecture that improves performance and reduces energy consumption by satisfying a large percentage of loads from a small, power-efficient temporal cache early in the pipeline.
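The page-based placement and dynamic page remapping named in the first contribution can be illustrated with a minimal sketch. The abstract does not describe the actual policies, so everything below is an assumption made for illustration: the page size, the number of subcaches, the first-touch round-robin placement rule, and the names `SubcacheMapper`, `lookup`, and `remap` are all hypothetical and are not taken from the thesis.

```python
PAGE_SIZE = 4096      # assumed page size in bytes (not stated in the abstract)
NUM_SUBCACHES = 4     # assumed number of subcaches (not stated in the abstract)


def page_number(address: int) -> int:
    """Return the page that contains the given byte address."""
    return address // PAGE_SIZE


class SubcacheMapper:
    """Hypothetical page-to-subcache table.

    Every address in a page is served by the same subcache, so only that
    one small unit needs to be activated on an access. Placement here is
    first-touch round-robin, which is an illustrative choice only.
    """

    def __init__(self, num_subcaches: int = NUM_SUBCACHES) -> None:
        self.num_subcaches = num_subcaches
        self.page_to_subcache: dict[int, int] = {}
        self.next_subcache = 0

    def lookup(self, address: int) -> int:
        """Return the subcache index for an address, placing new pages."""
        page = page_number(address)
        if page not in self.page_to_subcache:
            # First touch: assign the page to the next subcache in turn.
            self.page_to_subcache[page] = self.next_subcache
            self.next_subcache = (self.next_subcache + 1) % self.num_subcaches
        return self.page_to_subcache[page]

    def remap(self, page: int, subcache: int) -> None:
        """Move a page to a different subcache (dynamic remapping)."""
        if not 0 <= subcache < self.num_subcaches:
            raise ValueError("subcache index out of range")
        self.page_to_subcache[page] = subcache


mapper = SubcacheMapper()
first = mapper.lookup(0x0000)      # page 0, placed on first touch
same_page = mapper.lookup(0x0064)  # still page 0, same subcache
second = mapper.lookup(0x1000)     # page 1, placed in the next subcache
mapper.remap(1, 3)                 # move page 1 to subcache 3
moved = mapper.lookup(0x1008)      # page 1 now resolves to subcache 3
```

In this sketch, a real design would also need a policy that decides when remapping pays off and a predictor that guesses the subcache before the mapping is known; both are left out because the abstract gives no details about them.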
School Location:USA - Pennsylvania
Source Type:Master's Thesis
Date of Publication: