Low-Power Hardware Designs for Next-Generation Signal Processing and Machine Learning Applications

The need to support various signal and media processing and recognition applications on energy-constrained mobile computing devices has steadily grown. Interest in hardware neural networks has grown in recent years, as they offer many advantages over conventional software models, mainly in applications where speed, cost, reliability, or energy efficiency are of great importance. Deep Neural Networks (DNNs) are presently the most popular application models. Such multilayered networks, characterized by many hidden layers and trained on vast amounts of data, demand specialized, high-performance, low-power hardware architectures. DNN training and inference are computation-intensive processes: training requires high throughput, whereas inference needs low latency.

In the last few years, FPGA and GPU vendors have engaged in a race to offer the best hardware platform for running computationally intensive algorithms quickly and efficiently. Standard hardware implementations of these algorithms require many resource-, power-, and time-consuming arithmetic operations (mainly multiplication). Hence, the goal is to reduce the size and power consumption of internal arithmetic units. In particular, for large DNNs to run in real time on resource-constrained systems, it is crucial to simplify or approximate the tensor cores, since they are usually responsible for significant area, power, and latency costs. One option to achieve this goal is to replace complex exact arithmetic circuits with simpler, approximate ones. Approximate computing is a design alternative that exploits the intrinsic error resilience of various applications and produces energy-efficient circuits with a small accuracy loss.
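To make the trade-off concrete, here is a minimal software sketch (not from the course materials) of one common approximation technique: operand truncation, where the low-order bits of each input are dropped before multiplying. In hardware this removes the corresponding partial-product rows, shrinking the multiplier's area and power at the cost of a bounded relative error. The function name and the choice of truncating 4 bits are illustrative assumptions.

```python
def approx_mul(a: int, b: int, k: int = 4) -> int:
    """Approximate unsigned multiply: drop the k least-significant
    bits of each operand, multiply the truncated values, then shift
    the result back. k = 0 recovers the exact product."""
    return ((a >> k) * (b >> k)) << (2 * k)

# Compare against the exact product for one operand pair.
a, b = 200, 100
exact = a * b                  # 20000
approx = approx_mul(a, b, 4)   # 18432 -> ~7.8% relative error
print(exact, approx, (exact - approx) / exact)
```

The approximation always underestimates the true product (the discarded bits are non-negative), and the relative error shrinks as the operands grow, which is one reason truncation-style multipliers suit error-resilient workloads such as DNN inference.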

In this course, we will study the importance of low-power hardware design, evaluate the accuracy of media processing algorithms and DNNs built on approximate computing, assess the power reduction achieved by approximate circuits, and investigate training-time methodologies that compensate for the loss in accuracy. During the course, students will implement various circuits on FPGAs and evaluate them in terms of speed, area, and power consumption.