Date of Completion
Big Data; Experimental Design; Sampling Strategy; Statistical Inference; Sample Size; Asymptotic Distribution; Nonparametric; Sequential Estimation; Linear Regression
Dr. Nitis Mukhopadhyay
Dr. Dipak K. Dey
Dr. Lynn Kuo
Field of Study
Doctor of Philosophy
This dissertation provides new perspectives on sequential experimental designs for statistical inference in the context of big data. In sequential analysis, an experimenter gathers information about an unknown parameter by observing random samples in successive steps. The total number of observations collected at termination is a random variable, often referred to as the stopping variable, the stopping time, or the final sample size. We begin by presenting a broad framework for obtaining the asymptotic distributions of stopping times in a wide range of sequential estimation problems, both for populations with known distributions and for distribution-free (nonparametric) populations. Next, we introduce a finely tuned parallel piecewise sequential procedure for estimating the mean of a normal population with unknown variance. The procedure achieves asymptotic unbiasedness of the stopping variable and gains operational convenience and efficiency from parallel processing, or distributed computing in modern terms. We then extend this idea to sequential estimation of the regression coefficients in a linear model. In each problem, theory and methodology go hand in hand, followed by illustrations from large-scale data analyses based on simulated data and/or real data from various application areas. Finally, we conclude by discussing some directions for future research.
Zhang, Chen, "Sequential Experimental Designs for Statistical Inference on Big Data Problems" (2018). Doctoral Dissertations. 1877.