Testing for associations in a heterogeneous population
Detecting association between two variables plays an important role in many fields. Classical measures such as Pearson’s correlation coefficient and Kendall’s tau summarize only the overall correlation between two variables. When the correlation structure of the sample is heterogeneous, they cannot describe the true correlation structure. Ranking observations according to Kendall’s tau yields a descending tau-path that facilitates the identification of correlated subsets, if such subsets exist. However, the current tau-path methodology can only detect correlated subsets among uncorrelated samples (H0: the population is homogeneously uncorrelated); the more general scenario, in which the samples are heterogeneously correlated with different non-zero correlations (H0: the population is homogeneously correlated), has not been addressed. In this dissertation, we propose two methods, an empirical copula-based tau-path method and a moving average maximum likelihood method based on the truncated geometric distribution, to test whether the sample is drawn from a population with homogeneous correlation, without assuming that this correlation is zero. The methods can also identify the differentially correlated subsets. Extensive simulations show that, in general, the proposed methods control the path-wise type I error well and have reasonable power. We apply the methods to gene expression data and to global health and well-being data to illustrate their utility for uncovering heterogeneous co-expression that was missed by standard methods.
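The motivating problem, an overall Kendall’s tau that masks a correlated subset, can be illustrated with a minimal Python sketch. It is not an implementation of the tau-path or of either proposed method. The subset sizes, the within-subset correlation of 0.8, the random seed, and all variable names are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)  # fixed seed (illustrative assumption)

# Hypothetical heterogeneous sample: a small correlated subset
# mixed into a larger uncorrelated one.
n_corr, n_uncorr = 60, 240
cov = [[1.0, 0.8], [0.8, 1.0]]  # assumed within-subset correlation of 0.8
correlated = rng.multivariate_normal([0.0, 0.0], cov, size=n_corr)
uncorrelated = rng.standard_normal((n_uncorr, 2))
sample = np.vstack([correlated, uncorrelated])

# Kendall's tau on the pooled sample versus on the correlated subset alone.
tau_all, p_all = kendalltau(sample[:, 0], sample[:, 1])
tau_sub, p_sub = kendalltau(correlated[:, 0], correlated[:, 1])

print(f"overall tau = {tau_all:.3f} (p = {p_all:.3g})")
print(f"subset tau  = {tau_sub:.3f} (p = {p_sub:.3g})")
```

In this simulated setting the pooled coefficient is much weaker than the coefficient within the correlated subset, which is the kind of dilution that motivates looking for subsets rather than relying on a single overall measure.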
Embargo status: Restricted until September 2022.