Energy-efficient high performance cache architectures
Abstract (Summary)
The demand for high-performance architectures and powerful battery-operated
mobile devices has accentuated the need for low-power systems. In many media and
embedded applications, the memory system can consume more than 50% of the overall
system energy, making it a prime candidate for optimization. Caches also play an important
role in performance by filtering out a large percentage of main memory accesses that
would otherwise incur long latencies. To address this increasingly important problem, this thesis
studies energy-efficient high performance cache architectures that can have a significant
impact on the overall system energy consumption and performance. This thesis makes
four contributions to this end.
The first contribution focuses on partitioning the cache resources architecturally
for energy and performance optimizations. Specifically, this thesis investigates splitting
the cache into several smaller units, each of which is a cache by itself (called a subcache).
The proposed subcache architecture employs page-based placement, dynamic page remapping,
and subcache prediction policies in order to improve the memory system energy
and performance, especially for instruction accesses.
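
The page-based placement idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the thesis's actual design: the class name SubcachedCache, the 4 KB page size, the direct-mapped subcaches, and the modulo default placement are all hypothetical choices made for illustration. It shows how mapping each page to a single subcache lets an access activate only one small unit, and how a page can later be remapped.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, not taken from the thesis


class SubcachedCache:
    """Toy model: a cache split into subcaches, each page mapped to one subcache."""

    def __init__(self, num_subcaches=4, lines_per_subcache=64, line_size=32):
        self.num_subcaches = num_subcaches
        self.lines_per_subcache = lines_per_subcache
        self.line_size = line_size
        # Each subcache is modeled as a direct-mapped array of line addresses.
        self.subcaches = [[None] * lines_per_subcache for _ in range(num_subcaches)]
        self.page_map = {}  # page number -> subcache index (remappable)
        self.activations = [0] * num_subcaches  # rough proxy for dynamic energy

    def subcache_for(self, addr):
        page = addr // PAGE_SIZE
        # Default placement (an assumption): page number modulo subcache count.
        return self.page_map.setdefault(page, page % self.num_subcaches)

    def remap_page(self, page, new_subcache):
        """Move a page to another subcache, invalidating its lines in the old one."""
        old = self.page_map.get(page, page % self.num_subcaches)
        for i, line_addr in enumerate(self.subcaches[old]):
            if line_addr is not None and (line_addr * self.line_size) // PAGE_SIZE == page:
                self.subcaches[old][i] = None
        self.page_map[page] = new_subcache

    def access(self, addr):
        """Return True on a hit. Only one subcache is activated per access."""
        sc = self.subcache_for(addr)
        self.activations[sc] += 1
        line_addr = addr // self.line_size
        index = line_addr % self.lines_per_subcache
        hit = self.subcaches[sc][index] == line_addr
        self.subcaches[sc][index] = line_addr  # fill on miss
        return hit
```

In this model the per-subcache activation counts stand in for energy: every access touches one small array instead of the whole cache. A subcache predictor, as mentioned above, would guess the subcache index before the page mapping is known; that part is not modeled here.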
As technology scales into the deep-submicron regime, leakage energy is becoming a
dominant source of energy consumption. Leakage energy is generally proportional to
the number of transistors in a circuit and caches constitute a large portion of the die
transistor count. Most techniques have targeted cell leakage energy minimization; bitline
leakage is critical as well. To this end, this thesis proposes a predictive precharging
scheme for minimizing bitline leakage as its second contribution.
Many of the recently proposed techniques to reduce power consumption in caches
introduce additional non-determinism in cache access latency. When a load takes longer
than expected, instructions that depend on it and were speculatively issued must be
re-executed, because they will not have the correct data in time. This penalty
can potentially offset the claimed power benefits of using such low-power caches. To
address this problem, this thesis proposes an early cache set resolution scheme as its
third contribution.
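
The trade-off described above can be made concrete with a back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a result from the thesis: the function names and the linear energy model are hypothetical, and all energies are in arbitrary units. It computes the net saving of a low-power cache once replay energy is charged, and the replay rate at which the saving disappears.

```python
def net_energy_saving(accesses, saving_per_access, replay_rate, replay_energy):
    """Energy saved by the low-power cache minus energy spent re-executing.

    replay_rate is the fraction of accesses that trigger a replay of
    dependent instructions; replay_energy is the cost of one replay.
    """
    saved = accesses * saving_per_access
    lost = accesses * replay_rate * replay_energy
    return saved - lost


def break_even_replay_rate(saving_per_access, replay_energy):
    """Replay rate above which the low-power cache costs more than it saves."""
    return saving_per_access / replay_energy
```

Under this model, a scheme that resolves the cache set early helps by lowering the replay rate, since fewer dependents are issued against a load whose latency turns out to be longer than assumed.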
Increasing clock frequencies and issue rates aggravate the memory latency problem,
imposing higher memory bandwidth requirements. While caches can be multiported
to provide high memory bandwidth, the increase in their access latency with more
ports limits their potential. This thesis proposes a novel temporal cache architecture, as
the fourth contribution, for improving performance and reducing energy consumption by
satisfying a large percentage of loads from a small power-efficient temporal cache early
in the pipeline.
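
The abstract does not describe the temporal cache's organization, so the following is only a generic sketch of the underlying idea: a small structure probed before the main cache that satisfies loads with temporal locality. The class name SmallFrontCache, the fully associative LRU organization, and the entry count are assumptions for illustration, not the thesis's design.

```python
from collections import OrderedDict


class SmallFrontCache:
    """Tiny fully associative LRU buffer probed before the main cache."""

    def __init__(self, entries=8, line_size=32):
        self.entries = entries
        self.line_size = line_size
        self.lines = OrderedDict()  # line address -> True, kept in LRU order
        self.early_hits = 0         # loads satisfied by the small cache
        self.main_accesses = 0      # loads that fall through to the main cache

    def load(self, addr):
        line = addr // self.line_size
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.early_hits += 1
            return "early"
        # Miss: go to the main cache and keep the line for future reuse.
        self.main_accesses += 1
        self.lines[line] = True
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict the least recently used line
        return "main"
```

Every load counted in early_hits avoids an access to the larger, multiported cache, which is where both the latency and the energy benefit would come from in a design of this kind.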
Bibliographical Information:
School: Pennsylvania State University
School Location: USA - Pennsylvania
Source Type: Master's Thesis