Date of Completion
Phylogeny, Protein Domain, Reconciliation, Integer Linear Programming, Dynamic Algorithm
Mukul S. Bansal
Field of Study
Computer Science and Engineering
Doctor of Philosophy
Genes, as functional fragments of DNA sequences, evolve inside genomes through evolutionary events such as gene duplication, gene loss, and horizontal gene transfer. These events are often assumed to affect entire genes rather than parts of genes. However, it is well understood that a majority of genes in eukaryotes consist of multiple protein domains that can be independently lost or gained during evolution. Although protein domains have been studied extensively, existing research generally focuses on detecting domains and on studying the domain content of genes. As a result, the study of domain evolution itself is still in its infancy, and the relationship between domain-level, gene-level, and species-level evolution has not been sufficiently explored. Phylogenetic reconciliation is a powerful technique for inferring how gene families evolve inside species trees. In this dissertation, we develop an expanded reconciliation model that also accounts for domain evolution and explicitly captures the interdependence of domain-level, gene-level, and species-level evolution. We show that the problem of finding an optimal reconciliation under this new Domain-Gene-Species (DGS) reconciliation model is NP-hard, and we devise an effective heuristic algorithm as well as an exact algorithm based on integer linear programming (ILP) for the problem. Both algorithms are tested on a genome-wide data set containing thousands of domain families and gene families from 12 fly species. We also present an extended version of the DGS model that reconciles multiple domain trees, multiple gene trees, and a species tree simultaneously.
Li, Lei, "An Integrated Framework for Domain, Gene and Species Reconciliation" (2019). Doctoral Dissertations. 2093.