Date of Completion

5-2-2013

Embargo Period

5-2-2013

Keywords

case-cohort, gehan, least squares, logrank, rank based, sampling, weight

Major Advisor

Yan, Jun

Co-Major Advisor

Kang, Sangwook

Associate Advisor

Dey, Dipak

Associate Advisor

Committee Fulfilled

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

In survival analysis, semiparametric accelerated failure time (AFT) models directly relate the predicted failure times to covariates and are a useful alternative to relative risk models. Recent developments in rank-based estimation and least squares estimation provide promising tools to make the AFT models more attractive in practice. In this dissertation, we propose fast and accurate inferences for AFT models with applications under various sampling schemes.

The challenge in computing the rank-based estimator comes from solving nonsmooth estimating equations. This difficulty can be overcome with an induced smoothing approach. We generalize the induced smoothing approach to incorporate weights that account for missing data arising from case-cohort studies and stratified sampling designs. Parameters are estimated with smoothed estimating equations. Variance estimators are obtained through efficient resampling methods that avoid the full-blown bootstrap. The estimator from the smooth weighted estimating equations is shown to be consistent and to have the same asymptotic distribution as that from the nonsmooth version. Univariate failure time data from a tumor study and clustered failure time data from a dental study are analyzed.
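
To make the contrast between the nonsmooth and smoothed estimating equations concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes the unweighted Gehan estimating function as it is commonly written in the rank-based AFT literature, and it assumes an identity working covariance, so that the smoothing scale for a pair of subjects is r_ij^2 = (x_i - x_j)'(x_i - x_j) / n. The function names, the toy data, and the omission of sampling weights are all illustrative choices.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def residuals(log_times, covariates, beta):
    """AFT residuals e_i = log T_i - x_i' beta."""
    return [y - sum(b * x for b, x in zip(beta, xs))
            for y, xs in zip(log_times, covariates)]


def gehan_nonsmooth(log_times, events, covariates, beta):
    """Gehan function: (1/n) sum_i sum_j d_i (x_i - x_j) 1{e_j >= e_i}."""
    n, p = len(log_times), len(beta)
    e = residuals(log_times, covariates, beta)
    total = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored subjects enter only through the index j
        for j in range(n):
            if e[j] >= e[i]:
                for k in range(p):
                    total[k] += covariates[i][k] - covariates[j][k]
    return [t / n for t in total]


def gehan_smoothed(log_times, events, covariates, beta):
    """Same sum with the indicator replaced by Phi((e_j - e_i) / r_ij)."""
    n, p = len(log_times), len(beta)
    e = residuals(log_times, covariates, beta)
    total = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            diff = [covariates[i][k] - covariates[j][k] for k in range(p)]
            # Assumed scale: r_ij^2 = (x_i - x_j)'(x_i - x_j) / n.
            r = math.sqrt(sum(d * d for d in diff) / n)
            if r == 0.0:
                continue  # identical covariates: the pair contributes zero
            w = std_normal_cdf((e[j] - e[i]) / r)
            for k in range(p):
                total[k] += diff[k] * w
    return [t / n for t in total]


# Toy data (hypothetical): log failure times, event indicators, one covariate.
y = [0.1, 0.5, 0.9, 1.3, 1.8]
d = [1, 0, 1, 1, 0]
x = [[0.0], [1.0], [0.5], [2.0], [1.5]]
for b in (-1.0, 0.0, 1.0):
    print(b, gehan_nonsmooth(y, d, x, [b]), gehan_smoothed(y, d, x, [b]))
```

The nonsmooth function is a step function of beta, so a root finder has to cope with jumps. The smoothed version is continuous and, with this sign convention, monotone in a scalar beta, so a standard root finder can be applied to it.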

The induced smoothing approach for rank-based AFT models is natural with Gehan's weight. Using the estimator from induced smoothing with Gehan's weight as an initial value, we propose an iterative procedure that works for any weight of general form. The resulting estimator has the same asymptotic properties as the nonsmooth rank-based estimator with the same weight. Real data from an adolescent stress duration study and a case-cohort study for Wilms' tumor illustrate the methods.
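
The role of Gehan's weight can be illustrated with one standard formulation from the rank-based AFT literature. This is a sketch under stated assumptions and not necessarily the exact form used in the dissertation. The notation is assumed here: e_i(β) = log T_i − X_i^⊤β is the residual, Δ_i the event indicator, φ_i a general weight, and ψ_i the ratio of the general weight to the Gehan weight, evaluated at the current iterate.

```latex
% General weighted rank estimating function
U_{\phi}(\beta) = \sum_{i=1}^{n} \Delta_i \, \phi_i(\beta)
  \left[ X_i - \frac{\sum_{j=1}^{n} X_j \, I\{e_j(\beta) \ge e_i(\beta)\}}
                    {\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}} \right]

% Gehan weight: \phi_i^{G}(\beta) = n^{-1} \sum_{j} I\{e_j(\beta) \ge e_i(\beta)\}, which gives
U_{G}(\beta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
  \Delta_i \, (X_i - X_j) \, I\{e_j(\beta) \ge e_i(\beta)\}

% One iterative scheme: freeze the weight ratio at the current iterate \beta^{(m)}
% and solve a Gehan-type equation for \beta^{(m+1)}
\psi_i\bigl(\beta^{(m)}\bigr) = \frac{\phi_i\bigl(\beta^{(m)}\bigr)}{\phi_i^{G}\bigl(\beta^{(m)}\bigr)},
\qquad
\sum_{i=1}^{n} \sum_{j=1}^{n} \psi_i\bigl(\beta^{(m)}\bigr) \, \Delta_i \, (X_i - X_j) \,
  I\{e_j(\beta) \ge e_i(\beta)\} = 0
```

Setting φ_i = 1 gives the logrank weight. Because each iteration has the pairwise Gehan form, the indicator in the last display can be smoothed in the same way as for Gehan's weight.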

For least squares estimation, we propose a generalized estimating equations (GEE) approach. The consistency of the regression coefficient estimator is robust to misspecification of the working covariance, and the efficiency is higher when the working covariance structure is closer to the truth. The marginal error distributions and regression coefficients are allowed to be unique for each margin or partially shared across margins as needed. The resulting estimator is consistent and asymptotically normal, with variance estimated through a multiplier resampling method. Bivariate failure time data from a diabetic retinopathy study are analyzed.

All the aforementioned methods for AFT models are implemented in an R package aftgee (http://cran.r-project.org/web/packages/aftgee/index.html).
