# Multivariate extensions of saddlepoint-based bootstrap and an empirical saddlepoint approximation method for smoothing survival functions under right-censoring

## Abstract

Approximations are constantly used in statistics. The saddlepoint approximation is an asymptotic technique that was introduced to the field of statistics by Daniels (1954). Estimating the density or distribution of underlying probability models is an important part of statistical modeling and analysis. Because of the increasing power of computing resources, computationally intensive statistical methods are widely used in numerous applications. The saddlepoint method is a fast and accurate way to approximate the density and distribution of a statistic when exact theory does not exist. Thus, the basic result of the saddlepoint approximation for the distribution of the sample mean has been generalized to many applications in statistics. Considering the two different ways of extending the saddlepoint approximation, this dissertation is divided into two parts. The first part describes the bivariate saddlepoint-based bootstrap (SPBB) method with an application to making inferences for two-parameter models, and the second part describes the empirical saddlepoint approximation for incomplete data with an application to survival analysis.

The bootstrap is commonly used in many applications when the classical methods are intractable. However, the re-sampling step in the bootstrap method is time-consuming. The SPBB method has been used to make inferences for a univariate model parameter whose estimator is a unique root of a quadratic estimating equation (QEE). The SPBB method is identical to the parametric bootstrap, but with the slow re-sampling step replaced by a fast saddlepoint distributional approximation. The univariate SPBB method has been used to make inferences for parameters in time series, non-linear regression, penalized smoothing, and spatial regression models. Because of the complexity of the theory of saddlepoint approximation, many applications are limited to univariate inferences.
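
As a small illustration of the basic sample-mean result mentioned above, the following sketch applies the classical saddlepoint density formula, `sqrt(n / (2*pi*K''(s))) * exp(n*(K(s) - s*x))` with `s` solving `K'(s) = x`, to the mean of `n` independent Exp(1) variables. The exponential example and the function names are assumptions chosen for illustration and are not taken from the dissertation; the case is convenient because the exact density (a gamma density) is known.

```python
import math

def saddlepoint_density_exp_mean(x, n):
    """Saddlepoint approximation to the density of the mean of n iid Exp(1) draws.

    The cumulant generating function is K(s) = -log(1 - s) for s < 1.
    """
    s_hat = 1.0 - 1.0 / x            # solves the saddlepoint equation K'(s) = 1/(1-s) = x
    K = -math.log(1.0 - s_hat)       # K evaluated at the saddlepoint
    K2 = 1.0 / (1.0 - s_hat) ** 2    # K''(s) at the saddlepoint
    return math.sqrt(n / (2.0 * math.pi * K2)) * math.exp(n * (K - s_hat * x))

def exact_density_exp_mean(x, n):
    """Exact density: the mean of n iid Exp(1) draws is Gamma(shape=n, rate=n)."""
    log_f = n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n)
    return math.exp(log_f)

if __name__ == "__main__":
    n = 5
    for x in (0.5, 1.0, 2.0):
        approx = saddlepoint_density_exp_mean(x, n)
        exact = exact_density_exp_mean(x, n)
        print(f"x={x:.1f}  saddlepoint={approx:.5f}  exact={exact:.5f}")
```

In this particular case the ratio of the approximation to the exact density does not depend on `x`, so renormalizing the saddlepoint density would reproduce the gamma density exactly; that is a special property of the gamma family and should not be expected in general.
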
The first part of the dissertation is devoted to proposing the bivariate SPBB extension and its applications. In chapter one of part one, we present a bivariate SPBB method to make inferences for two-parameter models. We establish that the bivariate SPBB works with the roots of QEEs for which the Jacobian of the mapping satisfies a mild assumption. Under the normality assumption, these QEEs have a closed-form expression for the joint moment generating function (MGF), which is then inverted via the SPBB method to produce an approximation to the bivariate cumulative distribution function (CDF) of the estimators. This approximate bivariate distribution is then pivoted to form a confidence region for the parameters. The bivariate SPBB method is illustrated through two applications: inference on linear model parameters and a semi-parametric regression model for multidimensional genetic pathways. In the same chapter, careful consideration is given to the theoretical study of the saddlepoint approximation for bivariate distributions, and necessary corrections and clarifications for the formula established by S. Wang (1990) are provided.

In chapter two of part one, we devise a method for higher-order asymptotic inference based on the joint density of estimators in a two-dimensional parameter setting. Specifically, we consider the multivariate saddlepoint density approximation of estimators that are unique roots of QEEs in lagged dependent normal variables. The underlying QEEs have a closed-form expression for the joint MGF. Therefore, the joint MGF of the QEEs is inverted via the multivariate saddlepoint method to produce an accurate approximation for the joint density of the estimators. As one of the applications of this approximation, we identified the Yule-Walker (YW) estimators, which are unique solutions of QEEs in normal random variables, in the autoregressive process of order two, AR(2).
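
To make the AR(2) application concrete, here is a minimal sketch of the Yule-Walker estimators computed from sample autocorrelations. It solves the standard Yule-Walker equations `r1 = phi1 + phi2*r1` and `r2 = phi1*r1 + phi2`; written in terms of sample autocovariances, each equation is a quadratic form in the observations. The simulation settings (coefficients 0.5 and -0.3, series length, seed, mean-centering) and all function names are illustrative assumptions, not the dissertation's setup, and the sketch does not implement the saddlepoint density itself.

```python
import random

def sample_autocov(x, lag):
    """Biased (divide-by-n) sample autocovariance at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

def yule_walker_ar2(x):
    """Yule-Walker estimates (phi1, phi2) for an AR(2) model, plus r1 and r2."""
    c0, c1, c2 = (sample_autocov(x, k) for k in range(3))
    r1, r2 = c1 / c0, c2 / c0
    denom = 1.0 - r1 ** 2
    phi1 = r1 * (1.0 - r2) / denom
    phi2 = (r2 - r1 ** 2) / denom
    return phi1, phi2, r1, r2

def simulate_ar2(phi1, phi2, n, seed=0, burn_in=500):
    """Simulate a Gaussian AR(2) series, discarding an initial burn-in."""
    rng = random.Random(seed)
    prev2, prev1 = 0.0, 0.0
    out = []
    for t in range(n + burn_in):
        x_t = phi1 * prev1 + phi2 * prev2 + rng.gauss(0.0, 1.0)
        prev2, prev1 = prev1, x_t
        if t >= burn_in:
            out.append(x_t)
    return out

# Hypothetical example: true coefficients 0.5 and -0.3 (a stationary AR(2)).
series = simulate_ar2(0.5, -0.3, n=5000)
phi1_hat, phi2_hat, r1, r2 = yule_walker_ar2(series)
print(phi1_hat, phi2_hat)
```

Repeating the simulation many times at a small sample size would give a Monte Carlo picture of the joint distribution of the two estimates, which is the kind of reference distribution an approximate joint density can be compared against.
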
The simulation study shows that the saddlepoint density approximation for the YW estimators is much closer to the exact distribution than the bivariate normal density approximation.

Apart from the classical saddlepoint approximation, the empirical saddlepoint approximation is useful when the exact expression for the MGF is intractable. According to the statistical literature, the saddlepoint approximation has so far only been used in the presence of incomplete data in the parametric context. The Kaplan-Meier (KM) estimator is a commonly used non-parametric procedure for estimating survival functions. However, KM defines approximate probabilities only at observed failure times and may not deliver a proper density function if the largest observation is right-censored. Thus, it is difficult to define a closed-form expression for the MGF based on the KM estimator. In addition, existing smoothing methods based on KM assume that the largest observation is not censored. To alleviate these issues, we devise a method for smoothing KM survival functions based on an empirical saddlepoint approximation in part two. The method inverts the MGF defined through a Riemann-Stieltjes integral with respect to the KM approximation to the failure time CDF and exponential right-tail completion. Using tools from the theory of empirical processes, uniform consistency and weak and strong convergence results are established for this modified version of the empirical MGF based on KM weights. Also, the large-sample properties of the empirical saddlepoint density approximation based on the incomplete data are established using M-estimation and multivariate delta methods. The performance of the methodology is evaluated in simulation studies, which demonstrate that the proposed empirical saddlepoint approximation method is faster and more accurate than existing methods for smoothing survival functions.
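
The difficulty with a censored largest observation can be seen in a small sketch. The code below computes the KM jump masses at observed failure times and a weighted-sum MGF built from them; when the last observation is censored, the masses sum to less than one, so the resulting "MGF" is not that of a proper distribution. The toy data and function names are hypothetical. The exponential right-tail completion is deliberately not implemented here, because its exact form is not specified in this abstract.

```python
import math

def km_jump_masses(times, events):
    """Return (failure_time, mass) pairs: the jumps of the KM estimate of the CDF.

    events[i] is 1 for an observed failure and 0 for a right-censored time.
    """
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0       # current KM survival estimate
    at_risk = n      # number of subjects still at risk
    masses = []
    i = 0
    while i < n:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < n and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            new_surv = surv * (1.0 - deaths / at_risk)
            masses.append((t, surv - new_surv))
            surv = new_surv
        at_risk -= removed
    return masses

def km_mgf(masses, s):
    """Weighted-sum MGF over KM jumps, with no tail completion (an assumption)."""
    return sum(w * math.exp(s * t) for t, w in masses)

# Toy data (hypothetical): the largest observation, 9, is right-censored.
times = [2, 3, 5, 7, 9]
events = [1, 0, 1, 1, 0]
masses = km_jump_masses(times, events)
total_mass = sum(w for _, w in masses)
print(masses)
print("total mass:", total_mass, " MGF at 0:", km_mgf(masses, 0.0))
```

The missing mass equals the KM survival estimate beyond the last failure time; a tail completion has to place that mass somewhere to the right before an MGF can be inverted.
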