Date of Completion

3-30-2018

Embargo Period

3-30-2018

Keywords

machine learning, pathway, enhancer, broad domains, integration, chromatin interactions

Major Advisor

Dong-Guk Shin

Associate Advisor

Duygu Ucar

Associate Advisor

Mukul Bansal

Associate Advisor

Ion Mandoiu

Associate Advisor

Yufeng Wu

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

With the increase in diverse genome profiling technologies and publicly available ontology databases ranging from open chromatin profiles to the 3D structure of the genome, it is imperative to build novel computational methods that take full advantage of these diverse datasets to uncover the regulatory mechanisms behind cellular functions. Integrating these datasets offers the opportunity to identify regulatory elements (e.g., promoters and enhancers) and interactions critical for cell-type-specific functions. Here, the goal is twofold: 1) inference of regulatory interactions and networks from 3D chromatin interaction datasets and 2) inference of cell-specific and non-specific regulatory elements such as enhancers (regulatory elements that target gene promoters and regulate their expression).

To address the first goal, two software tools were developed: (1) a web-accessible application, Querying and visualizing chromatin Interaction Network (QuIN), and (2) a pathway analysis prioritization tool, Triangulation of Perturbation Origins and Identification of Non-Coding Targets (TriPOINT). QuIN enables users to mine chromatin interaction datasets and integrate them with other sources such as SNPs and epigenetic marks, build networks from them, query and visualize those networks in downstream analyses, and prioritize genomic loci (e.g., disease-causing variants). Similarly, TriPOINT uses pathways in conjunction with chromatin interaction networks to identify perturbed genes in treatment vs. control cases. It implements pathway-topology-based approaches for identifying inconsistencies in pathways and incorporates the capabilities of QuIN to integrate non-coding regulators that target genes in these pathways through chromatin interaction data.

The second goal was achieved using two approaches. First, support vector machines were trained on features obtained from network mining to assess their predictive power in identifying cell-type-specific promoters (broad domains) and enhancers (super enhancers) from chromatin interaction networks. Network signatures were mined in three cell lines (MCF-7, K562, and GM12878) using QuIN across multiple chromatin interaction assays (ChIA-PET, Hi-C, and HiChIP), and network-related features were found to effectively discriminate typical promoters and enhancers from cell-type-specific ones. Second, features from the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) were used in neural network models to identify enhancers from accessible chromatin. These models were highly predictive of enhancers, making them useful in individual-specific and clinical sample settings.
