Hokkaido University Collection of Scholarly and Academic Papers
Efficiency-Centric Hardware Accelerator for Deep Neural Network Inference
|Title: ||Efficiency-Centric Hardware Accelerator for Deep Neural Network Inference|
|Other Titles: ||深層ニューラルネットワーク向け高効率HWアクセラレータに関する研究|
|Authors: ||植吉, 晃大|
|Issue Date: ||25-Mar-2020|
|Abstract: ||This study discusses an efficiency-centric hardware architecture for deep neural network (DNN) inference. A DNN is a mathematical model inspired by the functionality of the cortex of the brain. Recently, DNNs have attracted growing attention in many fields of artificial intelligence technology, such as image and sound recognition and natural language processing, because they achieve high performance and accuracy in these fields. Advances in processor technology have made it possible to train DNNs on large amounts of data. GPUs are the most widely used devices for DNN training, and recent GPUs achieve highly parallel processing at low cost. It has therefore become practical to train on large amounts of data with real-world devices. On the other hand, trained DNNs have to run on resource-restricted devices in the real world. Designing highly energy-efficient hardware is therefore required for embedded devices.
In this study, we explore an optimal approach, in terms of both algorithm and architecture, for highly efficient DNN hardware. We analyze three points: compressed DNNs, architecture exploration, and an optimal NN model. First, we analyze log-quantization and its benefits. Logarithmic quantization (log-quantization) is a multi-bit quantization method that uses a power-of-2 logarithmic format. Its most important feature is that multiplier hardware is no longer required, because every multiplication in the linear domain becomes a simple addition in the logarithmic domain. Therefore, LOGNET, a DNN that uses log-quantization, can potentially achieve a high level of energy efficiency. Another advantage of LOGNET is that its memory footprint and bandwidth requirements are much lower than with linear quantization at the same accuracy, because log-quantization can represent the same numeric range using fewer bits. A key insight here is that most weight distributions are roughly Gaussian, so smaller values appear more frequently than larger values. Log-quantization can represent these non-uniform distributions with less numerical error than linear quantization at the same bit width.
Second, we propose a novel DNN architecture called QUEST. QUEST is a programmable MIMD parallel accelerator for general-purpose, state-of-the-art DNNs. It features die-to-die stacking of eight SRAMs, providing 96 MB of capacity and 28.8 GB/s of bandwidth with three-cycle latency, using an inductive coupling technology called the ThruChip Interface (TCI). Stacking SRAMs instead of DRAMs is expected to give lower memory access latency and simpler hardware. This helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized programmable bit-precision processing to achieve faster DNN computation and to fit larger DNNs in a 3D module. Compared with linear quantization, it sustains higher recognition accuracy at low bit widths. The prototype QUEST chip is implemented in 40-nm CMOS technology and achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
Lastly, we propose a prediction-based DNN model called Dead Neuron Prediction (DNP). In most DNN models, a large fraction of neurons end up with a zero output (dead neurons) because of the activation functions. Computations for such dead neurons waste a large amount of energy on unnecessary multiply-and-accumulate (MAC) operations. To skip these computations, we propose DNP, which predicts the liveness of neurons in advance by employing a supportive lightweight neural network. By efficiently pipelining the computations of the main DNN and its prediction, computations for likely dead neurons are dynamically skipped. Experimental results indicate that a DNN accelerator with DNP achieves better energy efficiency than prior approaches at the same accuracy.|
|Conferring University: ||Hokkaido University|
|Degree Report Number: ||甲第14129号|
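|Degree Level: ||Doctoral|
|Degree Discipline: ||Engineering|
|Examination Committee Members: ||(Chief examiner) Professor 浅井 哲也, Professor 富田 章久, Professor 葛西 誠也 (Research Center for Integrated Quantum Electronics), Professor 池辺 将之 (Research Center for Integrated Quantum Electronics)|
|Degree Affiliation: ||Graduate School of Information Science and Technology (Division of Electronics for Informatics)|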
|Type: ||theses (doctoral)|
|Appears in Collections:||Doctorate by way of Advanced Course > Graduate School of Information Science and Technology|
Theses > Doctoral (Engineering)
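
The abstract's claim that log-quantization replaces multipliers with adders can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the 4-bit default, the exponent window ending at `max_exp`, and the use of `None` as the zero code are all assumptions made for illustration. Each nonzero value is stored as a sign and an integer power-of-2 exponent, so a product only needs the exponents added.

```python
import math

def log_quantize(x, bits=4, max_exp=0):
    """Approximate x as sign * 2**exp with an integer exponent.

    Hypothetical encoding: 2**bits - 1 exponent codes ending at max_exp,
    plus one code (None here) reserved for zero.
    """
    min_exp = max_exp - (2**bits - 2)
    if abs(x) < 2.0 ** (min_exp - 1):
        return None  # magnitude too small to represent: flush to zero
    sign = -1 if x < 0 else 1
    exp = round(math.log2(abs(x)))           # nearest power of 2 in the log domain
    exp = max(min_exp, min(max_exp, exp))    # clip to the representable window
    return sign, exp

def dequantize(q):
    """Map a (sign, exp) code, or None for zero, back to a float."""
    if q is None:
        return 0.0
    sign, exp = q
    return sign * 2.0 ** exp

def log_multiply(a, b):
    """Multiply two log-quantized operands without a multiplier.

    The exponents are added; the sign product of +/-1 values stands in
    for what would be a single XOR of sign bits in hardware.
    """
    if a is None or b is None:
        return None
    return a[0] * b[0], a[1] + b[1]

# Usage: 0.3 is rounded to the nearest power of 2, and the product of
# 0.5 and -0.25 is obtained by adding the exponents -1 and -2.
w = log_quantize(0.3)
p = log_multiply(log_quantize(0.5), log_quantize(-0.25))
print(dequantize(w), dequantize(p))
```

In this sketch the exponent of a product can fall outside the input window, so an accumulator built on it would need a wider exponent range than the stored operands.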