Date of Completion

8-12-2015

Embargo Period

8-12-2015

Keywords

Time Series, Multivariate Counts, Compositional Data, HDLM, Level Correlated Model

Major Advisor

Nalini Ravishanker

Associate Advisor

Xiaojing Wang

Associate Advisor

John N. Ivan

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

Modeling multivariate time series of counts requires an adequate statistical methodology. Specifying the underlying distribution can be challenging, because it must account for overdispersion, an excess of zero values, and positive or negative association between counts, among other features.

This dissertation focuses on modeling multivariate time series of counts as a function of location-specific and time-dependent covariates. A Bayesian framework for estimation and prediction is discussed. We focus on Markov chain Monte Carlo (MCMC) methods for fully Bayesian inference and on the Integrated Nested Laplace Approximation (INLA) for fast approximate Bayesian inference, which is especially useful for large data sets.

The dissertation makes three main contributions. First, we propose a dynamic model that combines compositional time series modeling with dynamic modeling for counts. This approach is applied to a problem in transportation engineering: we investigate the temporal behavior of injury severity levels as proportions of all pedestrian crashes in each month, accounting for time trend, seasonal variation, and vehicle miles traveled (VMT).
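A common way to put monthly proportions on an unconstrained scale, where dynamic linear models can be applied, is the additive log-ratio (alr) transform. The sketch below assumes the alr transform with the last severity level as the reference category; the abstract does not state which transform the dissertation uses, and the counts are hypothetical, not data from the study.

```python
import math

def alr(proportions):
    """Additive log-ratio transform: log(p_j / p_J) for j = 1..J-1, last component as reference."""
    ref = proportions[-1]
    return [math.log(p / ref) for p in proportions[:-1]]

def alr_inverse(z):
    """Map alr coordinates back to proportions that sum to one."""
    expz = [math.exp(v) for v in z] + [1.0]  # reference category has coordinate 0
    total = sum(expz)
    return [v / total for v in expz]

# Hypothetical monthly pedestrian crash counts by injury severity level
counts = [12, 30, 58]
total = sum(counts)
props = [c / total for c in counts]

z = alr(props)          # unconstrained coordinates, suitable for a dynamic linear model
back = alr_inverse(z)   # recovers the original proportions
print(z, back)
```

A model fitted on the `z` scale can be mapped back through `alr_inverse`, which guarantees that predicted proportions are positive and sum to one.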

Second, this dissertation discusses a hierarchical multivariate dynamic modeling framework based on a multivariate Poisson (MVP) sampling distribution. We show that using such a distribution allows us to model the association between components of the multivariate response vector over time. This approach is illustrated using ecological data on gastropod abundance in Puerto Rico.
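One standard construction of a bivariate Poisson distribution is trivariate reduction: Y1 = X1 + X0 and Y2 = X2 + X0 with independent Poisson variables X0, X1, X2, so that each Yi is Poisson and Cov(Y1, Y2) = λ0. The sketch below simulates this construction; it is an assumed illustration, since the abstract does not specify the exact MVP form used, and the rates λ1 = 2, λ2 = 3, λ0 = 1.5 are arbitrary. Because the shared term X0 only adds, this construction can induce only nonnegative covariance.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def bivariate_poisson(rng, lam1, lam2, lam0):
    """Trivariate reduction: a shared Poisson(lam0) term induces covariance lam0."""
    x0 = poisson(rng, lam0)
    return poisson(rng, lam1) + x0, poisson(rng, lam2) + x0

def sample_cov(pairs):
    """Unbiased sample covariance of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    return sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)

rng = random.Random(2015)
draws = [bivariate_poisson(rng, 2.0, 3.0, 1.5) for _ in range(20000)]
print(round(sample_cov(draws), 2))  # should be close to lam0 = 1.5
```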

Finally, we propose a level correlated model (LCM) to account for the association among the components of the response vector. This multivariate model accounts for overdispersion as well as for positive and negative association between counts. The flexible LCM framework allows us to combine different marginal count distributions and to build a dynamic model for the vector time series of counts. We discuss in detail the lower and upper limits for the association between the components of the response vector of counts. We apply the proposed models to examples from ecology and marketing and discuss the results.
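For any joint model with fixed count marginals, the correlation is bounded by the Fréchet–Hoeffding couplings: the comonotone pair (F1^{-1}(U), F2^{-1}(U)) attains the upper limit and the countermonotone pair (F1^{-1}(U), F2^{-1}(1 - U)) attains the lower limit, with U uniform on (0, 1). The sketch below computes these limits for Poisson marginals, exactly up to a small tail truncation. It is a generic illustration under that assumption, not the LCM-specific limits derived in the dissertation, and the function names are hypothetical.

```python
import bisect
import math

def poisson_cdf(lam, tol=1e-10):
    """Cumulative probabilities F(0), F(1), ..., truncated once the tail mass drops below tol."""
    pmf, total, cdf, k = math.exp(-lam), 0.0, [], 0
    while True:
        total += pmf
        cdf.append(total)
        if 1.0 - total < tol:
            return cdf
        k += 1
        pmf *= lam / k

def quantile(cdf, u):
    """Generalized inverse F^{-1}(u): smallest k with F(k) >= u, clipped to the truncated support."""
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1)

def poisson_corr_bounds(lam1, lam2):
    """Return (lower, upper) attainable correlation for Poisson(lam1) and Poisson(lam2) marginals."""
    cdf1, cdf2 = poisson_cdf(lam1), poisson_cdf(lam2)
    sd = math.sqrt(lam1 * lam2)  # product of the two Poisson standard deviations
    bounds = []
    for antithetic in (True, False):  # countermonotone first (lower), then comonotone (upper)
        second = [1.0 - c for c in cdf2] if antithetic else cdf2
        # Both quantile functions are constant between consecutive breakpoints,
        # so E[Y1 * Y2] is an exact finite sum over these intervals.
        breaks = sorted({0.0, 1.0} | {x for x in cdf1 + second if 0.0 < x < 1.0})
        e_prod = 0.0
        for a, b in zip(breaks, breaks[1:]):
            m = 0.5 * (a + b)
            v = 1.0 - m if antithetic else m
            e_prod += (b - a) * quantile(cdf1, m) * quantile(cdf2, v)
        bounds.append((e_prod - lam1 * lam2) / sd)
    return tuple(bounds)

print(poisson_corr_bounds(2.0, 2.0))
print(poisson_corr_bounds(1.0, 5.0))
```

For equal means the upper limit is 1, while the lower limit stays strictly above -1. This is one reason count models cannot attain arbitrary negative correlation and why the attainable range of association deserves explicit study.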
