Date of Completion


Embargo Period



Phylogeny, Protein Domain, Reconciliation, Integer Linear Programming, Dynamic Algorithm

Major Advisor

Mukul S. Bansal

Associate Advisor

Ion Mandoiu

Associate Advisor

Yufeng Wu

Field of Study

Computer Science and Engineering


Doctor of Philosophy

Open Access

Open Access


Genes, as functional fragments of DNA sequences, evolve inside genomes through evolutionary events such as gene duplication, gene loss, and horizontal gene transfer. These events are often assumed to affect entire genes rather than parts of genes. However, it is well understood that a majority of genes in eukaryotes consist of multiple protein domains that can be independently lost or gained during evolution. Although protein domains have been researched extensively, existing work generally focuses on detecting domains and on studying the domain content of genes. As a result, the study of domain evolution itself is still in its infancy, and the relationship between domain-level, gene-level, and species-level evolution has not been sufficiently explored. Phylogenetic reconciliation is a powerful technique for inferring the evolution of gene families inside species trees. In this dissertation, we develop an expanded reconciliation model that also accounts for domain evolution and explicitly captures the interdependence of domain-, gene-, and species-level evolution. We show that the problem of finding an optimal reconciliation under this new Domain-Gene-Species (DGS) reconciliation model is NP-hard, and we devise an effective heuristic algorithm as well as an exact algorithm based on integer linear programming (ILP) for the problem. Both algorithms are tested on a genome-wide data set containing thousands of domain families and gene families from 12 fly species. We also present an extended version of the DGS model that reconciles multiple domain trees, multiple gene trees, and a species tree simultaneously.
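
For readers unfamiliar with phylogenetic reconciliation, the following is a minimal sketch of the classic gene-tree/species-tree reconciliation by lowest-common-ancestor (LCA) mapping, which labels gene tree nodes as duplications or speciations. It is background illustration only: it is not the DGS model or either algorithm developed in the dissertation, it ignores losses, transfers, and domains, and the nested-tuple tree encoding and all function names are assumptions made for this example.

```python
# Illustrative only: classic duplication-counting LCA reconciliation.
# Trees are nested tuples; leaves are strings. Names here are hypothetical.

def clades(tree):
    """Return the leaf set of every node in post-order (root clade last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # last entry of sub is the child's own clade
    result.append(leaves)
    return result


def lca(species_clades, species_set):
    """Smallest species-tree clade that contains every species in species_set."""
    return min((c for c in species_clades if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree, leaf_species):
    """Count gene tree nodes that the LCA mapping labels as duplications.

    leaf_species maps each gene tree leaf name to its species name.
    An internal gene node is a duplication when it maps to the same
    species-tree node as at least one of its children.
    """
    species_clades = clades(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(species_clades, frozenset([leaf_species[node]]))
        child_maps = [visit(child) for child in node]
        mapped = lca(species_clades, frozenset().union(*child_maps))
        if mapped in child_maps:
            duplications += 1
        return mapped

    visit(gene_tree)
    return duplications


# Toy example: three species, one gene family with a duplication
# in the ancestor of species A and B.
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
leaf_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
n_dups = count_duplications(gene_tree, species_tree, leaf_species)
```

In the toy example, the gene tree node joining the two (A, B) copies maps to the same species-tree node as its children, so it is labeled a duplication, while the root is a speciation. The DGS model described in the abstract extends this kind of mapping with an additional domain level.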