Date of Completion

5-7-2015

Embargo Period

5-6-2015

Keywords

computational biology, bioinformatics, scaffolding, genome assembly, biomarker selection, deconvolution

Major Advisor

Ion Mandoiu

Associate Advisor

Craig Nelson

Associate Advisor

Yufeng Wu

Associate Advisor

Sanguthevar Rajasekaran

Associate Advisor

Alexander Zelikovsky

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

The problem of interpreting biological data is often cast into a mathematical optimization framework, where a large body of existing computational theory and practical techniques can be leveraged. While this strategy has been particularly successful in the bioinformatics domain, the massive datasets generated by high-throughput genomic technologies are challenging the scalability of even the most advanced mathematical optimization algorithms. Indeed, as the cost per base of DNA sequencing has dropped precipitously, even outpacing Moore's law, many bioinformatics problems have grown beyond the limits of existing methods, necessitating new algorithms. This effect is felt even more acutely in the burgeoning field of single-cell biology, where advances in microfluidics have rapidly increased the ability of bench biologists to capture and sequence the genomes and transcriptomes of hundreds of cells per experiment.

This dissertation presents novel computational methods for answering three distinct biological questions: genome scaffolding, biomarker selection, and computational deconvolution of gene expression data from heterogeneous samples assisted by single-cell expression data. Each method strives to balance computational efficiency with the biological relevance of the computed solutions.
