Realizing Ultra Energy-efficient Hardware Systems through Inexact Computing
Palem, Krishna V.
Doctor of Philosophy
In this dissertation, novel methodologies are proposed for designing energy-efficient hardware systems that deliver just "good-enough" results by leveraging the principles of inexact computing, wherein perceptually or statistically acceptable accuracy degradation is permitted in exchange for substantial hardware savings. Such inexact computing systems are particularly relevant today for two reasons: the widely acknowledged limit to the exponential resource savings sustained by Moore's-law-driven technology scaling, and the emergence of large classes of workloads (in particular, embedded, multimedia, and Recognition, Mining and Synthesis (RMS) applications) that can still process information usefully with unreliable or error-prone elements. This thesis proposes several inexact design methodologies that realize energy-efficient hardware systems by intentionally rendering reliable components unreliable. These inexact systems are shown to produce "good-enough" results, judged by domain-specific quality evaluation metrics, in a wide variety of error-resilient applications while consuming significantly fewer hardware resources, as quantified by energy consumption, critical-path delay and/or area. The proposed inexact design techniques span several layers of design abstraction: voltage overscaling (overclocking) and gate sizing at the physical layer; inexact logic minimization at the logic layer; probabilistic pruning and compensation buddies at the architectural layer; and waveform shaping at the algorithm layer. Furthermore, a cross-layer co-design framework is presented that creates a symbiotic interaction among techniques from different layers of abstraction, maximizing the energy gains for a targeted accuracy loss while overcoming the drawbacks of the individual techniques; this framework uses machine-learning approaches to further improve the cost-accuracy tradeoff in DSP hardware systems. 
The effectiveness of the proposed techniques has been validated through extensive simulations and two ASIC chip fabrications: 64-bit inexact arithmetic adders in a 180 nm (LP) process and 256-point quality-tunable Fast Fourier Transform (FFT) accelerators in a 65 nm process. The utility of the proposed techniques is also demonstrated in applications from other domains, including image/multimedia codecs and neural network accelerators, all of which tolerate inaccuracies to varying extents and can synthesize sufficient information even from inaccurate computations.
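To make the accuracy-for-cost trade behind inexact adders concrete, the following Python sketch models a hypothetical approximate adder. It is NOT the adder design proposed in this dissertation: it swaps in a simple lower-part-OR style adder, in which the k least-significant bits are approximated with bitwise OR (removing their carry chain) and the upper bits are added exactly. The function names `lower_or_add` and `mean_relative_error`, and the use of "full-adder cells remaining" as a cost proxy, are illustrative assumptions only.

```python
import random


def lower_or_add(a: int, b: int, k: int) -> int:
    """Hypothetical inexact adder (lower-part-OR style, for illustration only).

    The k low-order bits use bitwise OR instead of full adders, so no carry
    chain exists there and no carry enters the upper part. The remaining
    upper bits are added exactly.
    """
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask            # OR gates replace low-order full adders
    high = ((a >> k) + (b >> k)) << k   # exact addition of the upper bits
    return high | low                   # high has zeros in its low k bits


def mean_relative_error(width: int, k: int, trials: int = 10_000, seed: int = 0) -> float:
    """Average (exact - approx) / exact over uniformly random width-bit operands."""
    rng = random.Random(seed)
    total = 0.0
    n = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        exact = a + b
        if exact == 0:
            continue  # relative error is undefined for a zero sum
        total += (exact - lower_or_add(a, b, k)) / exact
        n += 1
    return total / n if n else 0.0


if __name__ == "__main__":
    width = 16
    for k in (0, 2, 4, 6, 8):
        # Assumed cost proxy: full-adder cells still needed for the exact upper part.
        err = mean_relative_error(width, k)
        print(f"k={k}: full adders={width - k}, mean relative error={err:.2e}")
```

Because a + b equals (a | b) + (a & b), the error of this particular adder is exactly the AND of the two operands' low k bits. It is therefore always non-negative and below 2^k, which is why increasing k removes more carry logic while the error stays bounded to the low-order bits.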