The effects of closure-based multiple testing on the power of P-value combination tests






A multiple testing situation arises whenever several statistical inferences (tests or intervals, although we focus on the former) are considered simultaneously and the goal is to draw valid conclusions about each inference in the presence of the others. Many diverse approaches to the multiple testing problem have been proposed from both the frequentist and Bayesian perspectives; in this dissertation, we restrict our attention to frequentist hypothesis testing. One increasingly popular procedure for multiple testing is known as the "closure method," or simply "closure." The method allows simultaneous conclusions to be made about individual hypotheses while guaranteeing that the probability of rejecting any true null hypothesis (the familywise error rate) is no greater than α. The method works by testing every intersection hypothesis formed from a non-empty subset of the individual hypotheses. Rejecting an individual hypothesis H_i requires rejecting every intersection hypothesis that involves H_i. Notably, any test that controls the Type I error rate at level α can be used for these intersection hypotheses, which makes the method quite general. In this dissertation, we consider the power properties of a class of tests known as p-value combination tests (PVCTs) when these tests are used in the closure setting. We consider three types of PVCT that use p-value information differently: additive combination (AC) methods, minimum-p-value (MINP) methods, and one "hybrid" approach, the Truncated Product Method (TPM). We find through simulation studies that the power properties of PVCTs as tests of intersection hypotheses do not carry over when these tests are used in the closure setting.
Specifically, the AC and TPM tests generally have higher power than MINP methods as global tests, but much less power than MINP tests in the closure setting. We show, however, that the TPM can be modified to perform similarly to the MINP tests by decreasing the truncation level τ as the number of tests increases. The poor performance of AC methods in closure stems from the hierarchical nature of closed testing: to be rejected, an individual hypothesis must clear a "hurdle" at every intersection hypothesis that involves it. We give details on how these hurdles cause the dramatic power losses we have observed.
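The closure procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's simulation code: the function names are hypothetical, Fisher's method stands in for the AC class, a Bonferroni-adjusted minimum p-value stands in for the MINP class, the p-values are assumed independent and strictly positive, and the TPM is omitted. The example p-values are chosen by hand to show one hypothesis with a strong signal failing a hurdle under the additive test.

```python
from itertools import combinations
from math import exp, factorial, log


def fisher_pvalue(pvals):
    """AC stand-in: Fisher's combination, assuming independent p-values > 0.

    The statistic -2 * sum(log p) is chi-square with 2k degrees of freedom
    under the intersection null; for even degrees of freedom the survival
    function has the closed form used below.
    """
    k = len(pvals)
    half_stat = -sum(log(p) for p in pvals)  # statistic divided by 2
    tail = exp(-half_stat) * sum(half_stat**j / factorial(j) for j in range(k))
    return min(1.0, tail)


def minp_pvalue(pvals):
    """MINP stand-in: Bonferroni-adjusted minimum p-value."""
    return min(1.0, len(pvals) * min(pvals))


def closed_test(pvals, combine, alpha=0.05):
    """Reject H_i only if every intersection hypothesis involving H_i
    is rejected at level alpha by the `combine` test."""
    m = len(pvals)
    reject = [True] * m
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            if combine([pvals[i] for i in subset]) > alpha:
                # One failed intersection blocks every hypothesis in it.
                for i in subset:
                    reject[i] = False
    return reject


# Hand-picked illustrative p-values (not data from the dissertation).
pvals = [0.001, 0.9, 0.9, 0.9, 0.9]
print(closed_test(pvals, minp_pvalue))    # [True, False, False, False, False]
print(closed_test(pvals, fisher_pvalue))  # [False, False, False, False, False]
```

In this example the Fisher test fails to reject the intersection of all five hypotheses (combined p-value of roughly 0.145), so no individual hypothesis can be rejected, while the minimum-p-value test clears every intersection that involves the first hypothesis. The loop visits all 2^m − 1 non-empty subsets, so the sketch is practical only for small m.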



Multiple testing, Closed testing, Meta-analysis, P-value combination tests, Power