Date of Completion

3-8-2020

Embargo Period

2-17-2020

Keywords

Closest Pair; Approximate Algorithm; Deterministic Algorithm; Dynamic Time Warping; Neural Network

Major Advisor

Sanguthevar Rajasekaran

Associate Advisor

Yufeng Wu

Associate Advisor

Song Han

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

The Closest Pair problem is to identify the closest pair of points in a metric space under some distance measure (e.g., Euclidean distance or Dynamic Time Warping distance). It is a fundamental problem with a wide range of applications in data mining, since most data can be represented as vectors in a high-dimensional space, and we would like to identify the relationships among those data points. Typical applications include, but are not limited to, social data analysis, user pattern identification, motif mining in biological data, and data clustering. It is a classical problem that has been studied extensively over the past decades.
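To make the problem statement concrete, the following is a minimal brute-force sketch in Python under Euclidean distance. It is only a quadratic-time baseline for illustration and is not one of the algorithms proposed in the thesis; the function name `closest_pair_brute_force` and the sample points are assumptions introduced here.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def closest_pair_brute_force(points):
    """Return (distance, p, q) for the closest pair of points.

    Illustrative O(n^2) baseline: every pair is compared once.
    `points` is a sequence of equal-length coordinate tuples.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = None
    for p, q in combinations(points, 2):
        d = dist(p, q)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best


# Hypothetical sample data, chosen only for illustration.
sample = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)]
distance, p, q = closest_pair_brute_force(sample)
```

Because the number of pairs grows quadratically with the number of points, a baseline like this becomes impractical on large inputs, which is the usual motivation for faster exact or approximate methods.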

In this thesis, we study the Closest Pair problem and its variants, and we also bring a machine learning perspective to some closely related problems. In particular, we propose two approximate algorithms that efficiently address the Closest Pair of Points (CPP) problem, and one deterministic approach to the Closest Pair of Subsequences (CPS) problem, both under the Euclidean distance measure. In addition, to identify the closest subsequences in time series data, we propose a learnable feature extractor, embedded in an artificial neural network, that learns patterns under the Dynamic Time Warping metric. Finally, to speed up inference for the proposed algorithm, we propose a neural network pruning technique that yields a smaller network with similar capacity.
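For readers unfamiliar with Dynamic Time Warping, the sketch below shows the textbook dynamic program for the DTW distance between two one-dimensional sequences, using the absolute difference as the local cost. It is a generic illustration only and not the learnable feature extractor described above; the function name `dtw_distance` and the choice of local cost are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two non-empty numeric sequences.

    Uses |x - y| as the local cost and no warping-window constraint.
    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

Unlike the Euclidean distance, this alignment lets one sequence stretch or compress in time relative to the other, so sequences of different lengths can still be compared.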

All of the proposed methods are shown to achieve state-of-the-art performance on various standard benchmark datasets.
