Date of Completion

7-20-2018

Embargo Period

7-20-2018

Keywords

Big Data; Experimental Design; Sampling Strategy; Statistical Inference; Sample Size; Asymptotic Distribution; Nonparametric; Sequential Estimation; Linear Regression

Major Advisor

Dr. Nitis Mukhopadhyay

Associate Advisor

Dr. Dipak K. Dey

Associate Advisor

Dr. Lynn Kuo

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

This dissertation offers new perspectives on sequential experimental designs for statistical inference in the context of big data. In sequential analysis, an experimenter gathers information about an unknown parameter by observing random samples in successive steps. The total number of observations collected at termination is a random variable, often referred to as the stopping variable, the stopping time, or the final sample size. We begin by presenting a broad framework for obtaining the asymptotic distributions of stopping times in a wide range of sequential estimation problems, both for populations with known distributions and for distribution-free (nonparametric) populations. Next, we introduce our proposed finely tuned parallel piecewise sequential procedure for estimating the mean of a normal population with unknown variance. This procedure achieves asymptotic unbiasedness of the stopping variable and adds operational convenience and efficiency through parallel processing, or distributed computing in modern terms. We then extend this idea to sequential estimation problems for the regression coefficients in a linear model. In each problem, theory and methodology go hand in hand, followed by illustrations from large-scale data analyses based on simulated data and/or real data from various areas of application. Finally, we conclude by discussing some interesting directions for future research.
