Date of Completion

11-4-2020

Embargo Period

11-4-2020

Keywords

Neural Network, Parallelization, Network Pruning, Model Compression

Major Advisor

Sanguthevar Rajasekaran

Associate Advisor

Jinbo Bi

Associate Advisor

Qian Yang

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

Deep neural networks (DNNs) have achieved significant success in many applications, such as computer vision, natural language processing, robotics, and self-driving cars. With the growing demand for more complex real-world applications, increasingly complicated neural networks have been proposed. However, high-capacity models result in two major problems: long training times and high inference delays, making these networks hard to train and infeasible to deploy for time-sensitive applications or resource-limited devices. In this work, we propose multiple techniques to accelerate training and inference as well as to improve model performance.

The first technique we study is model parallelization on generative adversarial networks (GANs). Multiple orthogonal generators with shared memory are employed to capture the whole data distribution space. This method not only improves model performance but also alleviates the mode collapse problem that is common in GANs. The second technique we investigate is automatic network pruning. To reduce the floating-point operations (FLOPs) to a proper level without compromising accuracy, we propose a more generalizable and easy-to-use pruning method, which prunes the network by optimizing a set of trainable auxiliary parameters instead of the original weights. Weakly coupled gradient update rules are proposed to keep the optimization consistent with the pruning task. The third technique is to remove the redundancy of a complicated model based on the needs of the application. We treat chemical reaction prediction as a translation problem and apply a low-capacity neural translation model to it. The fourth technique is to combine distillation with Differentiable Architecture Search to stabilize and improve the search procedure. Intermediate results as well as the output logits are transferred from the teacher network to the student network. As an application of these speedup techniques, we introduce neural network pruning into Materials Genomics. We propose attention-based AutoPrune for kernel pruning of a continuous-filter neural network for molecular property prediction, achieving better performance with a more compact model.
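
To illustrate the auxiliary-parameter pruning idea mentioned above, the following is a minimal PyTorch-style sketch written for this summary, not the dissertation's implementation: the GatedLinear module, the sigmoid gate parameterization, the L1 penalty weight, and the 0.5 keep-threshold are illustrative assumptions, and the dissertation's weakly coupled gradient update rules are not reproduced here. The point is only that the pruning objective acts on trainable auxiliary gates rather than on the original weights.

    # Minimal sketch (illustrative assumptions, not the dissertation's exact method):
    # prune output channels by optimizing auxiliary gate parameters, not the weights.
    import torch
    import torch.nn as nn

    class GatedLinear(nn.Module):
        """Linear layer whose output channels are scaled by trainable auxiliary gates."""

        def __init__(self, in_features, out_features):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            # One auxiliary parameter per output channel; the sparsity objective
            # is applied to these gates instead of the layer weights.
            self.gate_logits = nn.Parameter(torch.zeros(out_features))

        def gates(self):
            return torch.sigmoid(self.gate_logits)

        def forward(self, x):
            return self.linear(x) * self.gates()

        def sparsity_penalty(self):
            # L1 penalty pushes gates toward zero, i.e. channels toward removal.
            return self.gates().abs().sum()

    # Usage sketch: task loss plus a sparsity regularizer on the gates only.
    layer = GatedLinear(128, 64)
    x = torch.randn(32, 128)
    target = torch.randn(32, 64)
    loss = nn.functional.mse_loss(layer(x), target) + 1e-3 * layer.sparsity_penalty()
    loss.backward()

    # After training, channels whose gates fall below a threshold are pruned,
    # reducing FLOPs without retraining the surviving weights from scratch.
    keep = layer.gates() > 0.5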