Date of Completion

5-11-2013

Embargo Period

5-11-2013

Major Advisor

Professor Yufeng Wu

Associate Advisor

Professor Ion Mandoiu

Associate Advisor

Professor Jinbo Bi

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

Identifying genetic variants that are associated with complex traits is important both for developing methodologies and for understanding complex diseases. Although enormous effort has been expended on association studies of complex traits, common genetic variants show only moderate influence in many reported associations and consequently may have limited clinical value. This lack of success in finding genetic variants with significant effects is known as the missing heritability problem, and several explanations have been suggested for it. One is that interactions among variants, rather than any individual variant, account for the increased risk of the traits. Another is that the minor allele frequencies of risk variants span a spectrum from common to rare. Existing approaches are weakened by computational issues, e.g., combinatorial explosion. In this dissertation, we focus on the computational issues of disease association and present three computational approaches for association studies of common and rare genetic variation.

First, we focus on identifying a set of interacting common genetic variants that may be associated with complex traits. In this field, logic regression (LR) is a class of existing approaches for multi-variant, high-order interaction association studies. Some LR-based approaches have successfully analyzed different datasets; however, these approaches can still be improved. We propose a new logic-regression-based approach, fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. The swarm framework enhances both accuracy and efficiency by speeding up convergence and helping the search avoid becoming trapped in local optima.
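As a rough illustration of the kind of object a logic regression search evaluates, the sketch below scores one hand-written Boolean combination of variants against a binary phenotype. It is not FSLR: the swarm-based search is not shown, and the variant encoding, the example expression, the toy data, and the function names `logic_term` and `misclassification_rate` are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Each genotype is a tuple of binary indicators for three hypothetical
# variants (1 = carries the risk allele, 0 = does not).
Genotype = Sequence[int]


def logic_term(g: Genotype) -> bool:
    """Hypothetical logic expression: (V0 AND NOT V1) OR V2."""
    return bool((g[0] and not g[1]) or g[2])


def misclassification_rate(
    rule: Callable[[Genotype], bool],
    genotypes: Sequence[Genotype],
    phenotypes: Sequence[int],
) -> float:
    """Fraction of individuals whose phenotype the rule predicts wrongly."""
    errors = sum(int(rule(g)) != y for g, y in zip(genotypes, phenotypes))
    return errors / len(phenotypes)


# Toy data, invented for this example only.
genotypes = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 1),
    (0, 0, 0),
    (1, 0, 1),
    (0, 1, 0),
]
phenotypes = [1, 0, 1, 0, 1, 1]  # 1 = case, 0 = control

rate = misclassification_rate(logic_term, genotypes, phenotypes)
print(f"misclassification rate: {rate:.3f}")
```

A search procedure would propose many such expressions and keep those with the best score; the number of candidate expressions grows combinatorially with the number of variants, which is why the choice of search strategy matters.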

We then shift to the association study between rare genetic variants and complex traits. We propose a hidden Markov random field (HMRF) model, RareProb, to select a bundle of rare variants that may affect a binary trait. The selected variants should yield a higher likelihood given the observed genotypes and phenotype. We then apply a statistical test to this bundle. Unlike most existing approaches, this association analysis requires no pre-selection of variants. Next, for the scenario where multiple rare variants collectively influence a trait, we develop a collapse-based approach, GraphSyn. This approach collapses a subset of the given rare variants according to a series of synchronization criteria. Synchronization implies that the variants in a synchronization structure show genetic similarities and have a higher probability of jointly contributing to the trait. Because of the computational complexity, heuristic algorithms are designed to infer the synchronization structures.

In conclusion, three computational approaches are designed for genetic variation association studies under different scenarios. All of these approaches are applied to real screening datasets and compared with existing approaches in simulation experiments. In these comparisons, our approaches achieve higher statistical power and lower type I or type II error rates, identify more of the preset causal sites, and run faster.
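The general idea behind collapse-based rare variant tests mentioned above can be sketched as follows. This is a generic collapsing test, not GraphSyn: the subset of variants is fixed by hand rather than inferred from synchronization criteria, and a one-sided Fisher exact test is swapped in as the statistical test. The toy genotypes and the function names `collapse` and `fisher_one_sided` are assumptions made for illustration.

```python
from math import comb
from typing import List, Sequence


def collapse(genotypes: Sequence[Sequence[int]], subset: Sequence[int]) -> List[int]:
    """Return 1 for each individual carrying a minor allele at any site in subset."""
    return [int(any(row[j] > 0 for j in subset)) for row in genotypes]


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    Returns the probability of seeing at least `a` case carriers
    when the margins are fixed (hypergeometric upper tail).
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    total = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Toy data, invented for this example: 4 cases then 4 controls, 3 rare sites.
genotypes = [
    (1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 0, 1),  # cases
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 0),  # controls
]
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]
subset = [0, 1]  # hypothetical subset of sites chosen for collapsing

collapsed = collapse(genotypes, subset)
a = sum(1 for x, y in zip(collapsed, phenotypes) if x == 1 and y == 1)
b = sum(1 for x, y in zip(collapsed, phenotypes) if x == 0 and y == 1)
c = sum(1 for x, y in zip(collapsed, phenotypes) if x == 1 and y == 0)
d = sum(1 for x, y in zip(collapsed, phenotypes) if x == 0 and y == 0)

p_value = fisher_one_sided(a, b, c, d)
print(f"carriers among cases: {a}, among controls: {c}, p = {p_value:.4f}")
```

Which variants enter the collapsed indicator determines the outcome of the test, so the selection of the subset is where most of the methodological and computational difficulty lies.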

For future work, we are extending RareProb into a unified approach and applying synchronization analysis to identify fusion driver genes in cancers.
