Simultaneous inference based on rank regression in biomedical data analysis

Date

2016-08-29

Abstract

Modern biomedical studies often produce high-dimensional data. A gene expression study typically measures the transcript levels of thousands of cDNAs across different samples simultaneously. Two major problems in gene expression data analysis are violations of normality and the influence of outliers. Classical parametric statistical methods are known to be sensitive to violations of the normality assumption. Rank-based nonparametric methods are distribution-free and therefore promising for gene expression analysis. Because a very large number of hypothesis tests are usually carried out simultaneously, controlling the family-wise error rate (FWER) is a serious concern. The classical least squares method often commits an exceedingly large number of Type I errors in gene expression analysis. We developed a simultaneous Aligned Rank Transformation testing procedure to detect differentially expressed genes; this method controls the family-wise error rate and the false discovery rate at a desired level and is more powerful than popular statistical methods in identifying differentially expressed genes. To estimate the magnitude of differential gene expression, we proposed simultaneous confidence intervals based on rank estimates. We investigated the distribution of the pivotal quantity based on rank estimates and found that it has thicker tails than the multivariate normal and even the multivariate t distributions; we therefore developed a bootstrap resampling technique that controls the coverage probability at the nominal level. A real microarray example illustrates the application of the proposed methods, and a Monte Carlo simulation study evaluates the performance of the simultaneous rank regression methods.
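As a rough illustration of the general idea of combining a rank-based test per gene with family-wise error control, the sketch below may help. It is not the Aligned Rank Transformation procedure described in the abstract: it substitutes a plain two-sample Wilcoxon rank-sum test (normal approximation, no tie or continuity correction) and a Holm step-down adjustment. The function names (`midranks`, `rank_sum_pvalue`, `holm_adjust`), the simulated Cauchy noise, and the group sizes are all assumptions made for illustration only.

```python
import math
import random


def midranks(values):
    """Return 1-based ranks, averaging the ranks within tied groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Illustrative stand-in for a rank-based test; no tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2.0          # null mean of the rank sum
    var = n1 * n2 * (n1 + n2 + 1) / 12.0     # null variance (assuming no ties)
    z = (w - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, which control the FWER."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        candidate = min(1.0, (m - step) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical simulated data: 20 "genes", 12 samples per group,
    # heavy-tailed (standard Cauchy) noise, first 3 genes shifted by 5.
    rng = random.Random(0)

    def cauchy():
        return math.tan(math.pi * (rng.random() - 0.5))

    pvals = []
    for gene in range(20):
        shift = 5.0 if gene < 3 else 0.0
        control = [cauchy() for _ in range(12)]
        treated = [cauchy() + shift for _ in range(12)]
        pvals.append(rank_sum_pvalue(control, treated))

    adjusted = holm_adjust(pvals)
    flagged = [g for g, p in enumerate(adjusted) if p < 0.05]
    print("Genes flagged at FWER 0.05:", flagged)
```

Because the test statistic depends only on ranks, a single extreme observation moves the rank sum by a bounded amount, which is the robustness property that motivates rank-based methods for heavy-tailed expression data.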

Keywords

Aligned rank transformation, Multiple comparisons, Bootstrap, Gene expression, Rank estimates
