#### Date of Completion

6-15-2017

#### Embargo Period

6-13-2017

#### Keywords

Longitudinal modeling; tensor modeling; regularization methods; sparse predictive modeling; regression; dual coordinate descent; distributed computing; optimization

#### Major Advisor

Jinbo Bi

#### Associate Advisor

Sanguthevar Rajasekaran

#### Associate Advisor

Jun Yan

#### Associate Advisor

Jason K. Johannesen

#### Field of Study

Computer Science and Engineering

#### Degree

Doctor of Philosophy

#### Open Access

Open Access

#### Abstract

Temporal data, such as time series and longitudinal data, are pervasive across almost all human endeavors, including medicine, finance, climate, and genetics. It is therefore hardly surprising that temporal data mining has attracted significant attention and research effort. Only very recently, however, has feature selection drawn adequate attention in the context of longitudinal modeling. Standard statistical techniques, such as generalized estimating equations (GEE), have been modified to identify important features by imposing sparsity-inducing regularizers. However, they do not explicitly model how a dependent variable relies on features measured at proximal time points. Recent machine learning models can select features at lagged time points but ignore the temporal correlations within an individual's repeated measurements. With advances in data acquisition technologies and the availability of big data, many subjects are now recorded over continuous time periods in ultra-high dimensions with complex structure, which poses another challenge for temporal data analysis. To effectively model the complex data structure, huge data size, and lagged effects over time in temporal data, we propose several novel machine learning methods in this thesis.

First, we propose an approach called Longitudinal LASSO (Least Absolute Shrinkage and Selection Operator) to automatically and simultaneously determine both the relevant features and the time points that impact the current observation of a dependent variable. The approach also accounts for the fact that the data are not independently and identically distributed (*i.i.d.*) because of the temporal correlations within an individual. When building a linear model over *τ* repeated measurements of a set of features, it decomposes the model parameters into a sum of two components and imposes a separate block-wise LASSO penalty on each. One component selects features, whereas the other selects the temporally contingent time points.
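The two-component penalty described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the coefficient matrix is laid out as features by time lags, the blocks are rows (features) and columns (time lags), each block is penalized by its L2 norm, and the names `group_soft_threshold` and `decomposed_penalty` are hypothetical. The correlation modeling for non-*i.i.d.* data is not reproduced.

```python
import numpy as np

def group_soft_threshold(M, lam, axis):
    """Shrink each group of M toward zero by lam in L2 norm.

    axis=1 treats every row as one group; axis=0 treats every column as one.
    This is the proximal operator of lam * (sum of group L2 norms).
    """
    norms = np.linalg.norm(M, axis=axis, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return M * scale

def decomposed_penalty(A, B, lam_feat, lam_time):
    """Block-wise penalty on W = A + B (assumed layout: features x time lags)."""
    feature_term = np.linalg.norm(A, axis=1).sum()  # one block per feature (row)
    time_term = np.linalg.norm(B, axis=0).sum()     # one block per time lag (column)
    return lam_feat * feature_term + lam_time * time_term

# Toy example: 3 features, tau = 2 lagged measurements.
A = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
B = np.array([[0.1, 2.0], [0.1, 2.0], [0.1, 1.0]])
A_sparse = group_soft_threshold(A, lam=1.0, axis=1)  # zeroes weak feature rows
B_sparse = group_soft_threshold(B, lam=1.0, axis=0)  # zeroes weak time-lag columns
W = A_sparse + B_sparse                              # combined coefficient matrix
```

In this toy case the row-wise shrinkage keeps only the first feature in `A`, and the column-wise shrinkage keeps only the second time lag in `B`, which is the sense in which one component selects features and the other selects time points.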

Second, we extend the first method to a new tensor-based quadratic inference function (Tensor-QIF), which aims to select structured features along each dimension of the tensor data. If each data example is a *k*-way tensor and we build a linear model with respect to that tensor, the model parameters naturally form another *k*-way tensor. Mathematically, we decompose the *k*-way parameter tensor into a sum of *k* sparse *k*-way tensors, each of which is sparse along one direction of the parameter tensor. To correct for the non-*i.i.d.* nature of the data, we employ QIF to estimate within-individual correlations, which brings advantages over classic GEE methods because presumed covariance structures in GEE always mis-specify complex correlation structures.
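The tensor decomposition above can be sketched as follows. This is only an illustration under stated assumptions: sparsity "along one direction" is read as whole slices along one mode being zero, each slice is penalized by its Frobenius norm, and the helper names `slice_norms`, `structured_penalty`, and `predict` are hypothetical. The QIF estimation of within-individual correlations is not shown.

```python
import numpy as np

def slice_norms(T, mode):
    """Frobenius norm of every slice of T taken along `mode`."""
    other_axes = tuple(ax for ax in range(T.ndim) if ax != mode)
    return np.sqrt((T ** 2).sum(axis=other_axes))

def structured_penalty(components, lams):
    """Sum over modes m of lams[m] times the slice norms of components[m] along m."""
    return sum(lam * slice_norms(W_m, m).sum()
               for m, (W_m, lam) in enumerate(zip(components, lams)))

def predict(X, components):
    """Linear prediction <X, W>, where W is the sum of the k component tensors."""
    W = np.sum(components, axis=0)
    return float(np.sum(X * W))

# Toy 3-way example: one component tensor per mode.
shape = (2, 3, 4)
components = [np.zeros(shape) for _ in shape]
components[0][1, :, :] = 1.0  # only slice 1 along mode 0 is active
components[1][:, 2, :] = 0.5  # only slice 2 along mode 1 is active
# components[2] stays all zero: no slice along mode 2 is selected
X = np.ones(shape)
penalty = structured_penalty(components, lams=[1.0, 1.0, 1.0])
y_hat = predict(X, components)
```

Each component contributes to the penalty only through the slices it keeps, so a slice that is zero in every component removes the corresponding index of that mode from the model.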

Because of the immense growth of data, it is necessary to take advantage of modern high performance computing (HPC) systems. In particular, parallelized optimization solvers help fit the above two models to large-scale temporal datasets with huge data sizes and long recording periods. Hence, third, we propose a hybrid stochastic dual coordinate ascent (hybrid-SDCA) solver for a multi-core cluster, the most common high performance computing environment, which consists of multiple computing nodes, each with multiple cores and its own shared memory. We distribute data across nodes; each node solves a local problem in an asynchronous parallel fashion on its cores, and the local updates are then aggregated via an asynchronous across-node update scheme. The proposed double asynchronous method converges to a global solution for *L*-Lipschitz continuous loss functions, and at a linear convergence rate if a smooth convex loss function is used.
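For orientation, the building block that the hybrid solver parallelizes can be sketched as plain, single-threaded SDCA. This is not the hybrid-SDCA solver itself: the within-node and across-node asynchronous updates are not reproduced. The sketch assumes a ridge regression objective with squared loss (a smooth convex loss) and visits examples in a fresh random order each epoch; the function name `sdca_ridge` and the toy data are hypothetical.

```python
import numpy as np

def sdca_ridge(X, y, lam, epochs=100, seed=0):
    """Serial SDCA for min_w (1/n) * sum_i (x_i.w - y_i)^2 + (lam/2) * ||w||^2."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)  # one dual variable per training example
    w = np.zeros(d)      # primal iterate, kept equal to X.T @ alpha / (lam * n)
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Closed-form maximizer of the dual along coordinate i (squared loss).
            delta = (y[i] - X[i] @ w - 0.5 * alpha[i]) / (0.5 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

# Toy data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(50)
lam = 1.0
w_sdca = sdca_ridge(X, y, lam)

# Reference: the same objective solved in closed form via its normal equations.
n = X.shape[0]
w_exact = np.linalg.solve(2.0 / n * X.T @ X + lam * np.eye(5), 2.0 / n * X.T @ y)
```

Each coordinate step touches one example and one dual variable, which is what makes the method a natural candidate for splitting work across cores and nodes as the paragraph above describes.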

#### Recommended Citation

Xu, Tingyang, "Jointly Learning Features and Temporal Contingency for Prediction in Large Scale Datasets" (2017). *Doctoral Dissertations*. 1508.

https://opencommons.uconn.edu/dissertations/1508