A neighborhood hypothesis test for high dimensional object data analysis
Abstract
Object Data Analysis (ODA) is the statistical analysis of datasets of complex objects, which are regarded as the atoms of statistics. Among the various data types on which ODA is performed, this dissertation focuses primarily on statistical inference for shapes and functional data. The major challenge in these studies is that the data typically lie in infinite- or high-dimensional spaces such as Hilbert spaces and Hilbert manifolds.
Well-established classical statistical methods are no longer valid in these cases because they run into the rank-deficiency problem of the variance-covariance operator. Motivated by this, Munk et al. (2008) proposed replacing the classical hypothesis test with a neighborhood hypothesis test for data on Hilbert spaces. The idea of the neighborhood hypothesis test is to determine whether a group of means are within a predetermined distance of each other. Ellingson et al. (2013) adapted this methodology to data on Hilbert manifolds using the extrinsic approach, in which the data are first embedded in a Hilbert space.
Because this predetermined distance is difficult to specify and interpret, we present a modified test of whether the squared distance between the population mean and a hypothesized mean is less than a proportion of the total population variance. We derive the corresponding test statistic and study its properties, both theoretically and through simulations, for both cases mentioned above. In addition to the extrinsic approach, we derive both the original and the modified neighborhood hypothesis tests for data on Hilbert manifolds using another method, Procrustes analysis.
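For the one-sample case, the two neighborhood conditions described above might be written as follows. This is only a notational sketch: the symbols $\mu$ (population mean), $\mu_0$ (hypothesized mean), $\delta$ (predetermined distance), $\varepsilon$ (proportion), and $\Sigma$ (covariance operator of the random element $X$) are assumed here for illustration and are not taken from the dissertation, and the sketch does not say which condition plays the role of the null or the alternative.

```latex
% Original neighborhood condition: the mean lies within a fixed distance of mu_0
\| \mu - \mu_0 \|^2 < \delta^2 , \qquad \delta > 0 .

% Modified neighborhood condition: the bound is a proportion of the total variance
\| \mu - \mu_0 \|^2 < \varepsilon \, \mathbb{E}\,\| X - \mu \|^2
  = \varepsilon \, \operatorname{tr}(\Sigma) , \qquad 0 < \varepsilon < 1 .
```

Under this reading, the threshold scales with the variability of the data, so $\varepsilon$ can be chosen as a unit-free proportion rather than as a distance in the units of the underlying space.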
We apply the framework designed for functional data to an important problem in precision medicine. In this study, we investigate the level of agreement between two publicly available pharmacogenomic databases: Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE). Instead of focusing on a few scalar summaries, we pose this as a functional hypothesis testing problem by viewing the entire dose-response curves as functional data.
Direct similarity shapes of planar contours, which can be viewed as outlines of 2D objects in an image, are a key example of data lying on a Hilbert manifold. As an application, we test both the extrinsic mean and the Procrustes mean computed from such data using the modified one-sample neighborhood hypothesis tests.