Date of Completion

8-24-2019

Embargo Period

8-9-2029

Keywords

Spatial clustering; Clustering; Single-cell; Hi-C

Major Advisor

Yuping Zhang

Associate Advisor

Ming-Hui Chen

Associate Advisor

Zhiyi Chi

Associate Advisor

Joseph Glaz

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

Recent advances in biotechnologies such as single-cell RNA sequencing and Hi-C assays produce huge amounts of unlabelled data and open the door to many lines of biomedical research, such as transcriptional characterization of individual cells and comprehensive investigation of chromosomal conformation. In this thesis, we study the use of unsupervised methods, such as clustering and scan-based spatial clustering, to extract patterns and learn representations from single-cell RNA-seq and Hi-C data.

To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this dissertation research, we propose a graph-based clustering method, Linf-SClust, and a distribution-based approach, RDMM, to extract cluster configurations from two different perspectives. Linf-SClust is a novel tuning-free graph-based model that constructs the graph using the l-infinity measure and the entropy equalizer similarity, and partitions the graph via spectral clustering. Parameter selection and determination of the number of clusters are guided by the Gap statistic, which makes Linf-SClust a fully automatic approach. Our other method, RDMM, is a regularized Dirichlet-Multinomial finite-mixture model that addresses the gene expression clustering problem in a compositional fashion. The advantages of Linf-SClust and RDMM are shown through simulations and real applications.
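As a rough illustration of the distribution-based idea in the paragraph above, the following minimal sketch evaluates the standard Dirichlet-Multinomial log-probability of one cell's gene-count vector and the posterior cluster probabilities under a finite mixture. It is not the RDMM implementation: the regularization and the fitting procedure are not shown, and the function names (`dm_logpmf`, `responsibilities`) and all parameter values are hypothetical.

```python
import math


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-Multinomial.

    counts: non-negative integer counts per gene (one cell).
    alpha:  positive concentration parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient and normalising terms.
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-gene terms.
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out


def responsibilities(counts, weights, alphas):
    """Posterior probability of each mixture component for one cell.

    weights: mixing proportions (assumed positive, summing to 1).
    alphas:  one concentration vector per component (assumed known here).
    """
    logs = [math.log(w) + dm_logpmf(counts, al)
            for w, al in zip(weights, alphas)]
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For instance, with two genes and `alpha = [1.0, 1.0]`, `math.exp(dm_logpmf([2, 1], [1.0, 1.0]))` is approximately 0.25, matching the uniform Beta-Binomial over four possible outcomes. In a full mixture model the concentration vectors and weights would be estimated from data rather than supplied by hand.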

The Hi-C experiment enables assessment of chromosomal structure, including the detection of structural variations, especially translocations. In this dissertation research, we formulate inter-chromosomal translocation detection as a scan clustering problem in a spatial point process. We then develop TranScan, a new translocation detection method based on scan statistics with control of false discoveries. Applied to Hi-C data from breast cancer research, TranScan successfully identifies previously discovered translocation events and also suggests a new putative segment translocated between nonhomologous chromosomes.
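To make the scan-statistic idea in the paragraph above concrete, here is a minimal sketch that slides a fixed square window over a small grid of inter-chromosomal contact counts, records the largest window sum, and estimates its significance by randomly permuting the cells. It is a generic illustration, not TranScan: the window shape, the permutation null, and the names `window_sum_scan` and `monte_carlo_pvalue` are assumptions, and false discovery control is not implemented.

```python
import random


def window_sum_scan(counts, w):
    """Return (max_sum, (row, col)) over all w-by-w windows of a 2-D grid."""
    n_rows, n_cols = len(counts), len(counts[0])
    best, best_pos = -1, None
    for i in range(n_rows - w + 1):
        for j in range(n_cols - w + 1):
            s = sum(counts[r][c]
                    for r in range(i, i + w)
                    for c in range(j, j + w))
            if s > best:
                best, best_pos = s, (i, j)
    return best, best_pos


def monte_carlo_pvalue(counts, w, n_sim=199, seed=0):
    """Permutation p-value for the observed maximum window sum.

    The null used here (cell counts exchangeable across the grid) is an
    illustrative assumption, not the null model of the thesis.
    """
    rng = random.Random(seed)
    observed, _pos = window_sum_scan(counts, w)
    n_rows, n_cols = len(counts), len(counts[0])
    flat = [v for row in counts for v in row]
    exceed = 0
    for _sim in range(n_sim):
        rng.shuffle(flat)
        grid = [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        sim_max, _ = window_sum_scan(grid, w)
        if sim_max >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

A grid with a small block of unusually high counts yields a large scan statistic located at that block and a small permutation p-value, which is the kind of localized enrichment a translocation would be expected to produce in inter-chromosomal contacts.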

Available for download on Thursday, August 09, 2029
