Novel Bayesian approaches for simultaneous parameter estimation and variable selection in quantile regression models

Date

2021-05

Abstract

The quantile regression (QR) model has recently gained much attention in both theoretical and applied statistical research because it gives a more comprehensive picture of the relationship between a response variable and its independent variables. Much effort has been spent on studying the quantile regression model from both the frequentist and Bayesian perspectives, with a primary focus on increasing the accuracy of statistical inference and computational efficiency. The literature has shown that assigning priors over the model space and assigning priors to all components of the regression coefficient vector are challenging problems that hinder the ability to fully incorporate quantile-specific information when investigating quantile regression. The work of Alhamzawi and Yu (2012) provided insights for overcoming this obstacle, improving both the parameter estimation and variable selection procedures by taking into account the quantile levels of interest.
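As a minimal illustration of what a quantile level means in this setting, the sketch below uses the standard check (pinball) loss that underlies quantile regression. It is not taken from the thesis: the function names and the simulated data are assumptions made for illustration, and the model is reduced to an intercept only, so the fitted value at level tau is simply the sample quantile.

```python
import random

def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(q, ys, tau):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)

def best_constant(ys, tau):
    """Intercept-only quantile fit (hypothetical helper).

    The objective is convex and piecewise linear in q, so a minimizer
    can always be found among the observed values themselves.
    """
    return min(ys, key=lambda q: total_loss(q, ys, tau))

# Simulated data, purely illustrative.
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(201)]

q10 = best_constant(ys, 0.10)  # lower-tail fit
q50 = best_constant(ys, 0.50)  # median fit
q90 = best_constant(ys, 0.90)  # upper-tail fit
print(q10, q50, q90)
```

The three fits differ because the loss weights positive and negative residuals asymmetrically according to tau, which is the sense in which each quantile level carries its own information.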

Moreover, Bayesian variable selection in quantile regression models is of great importance because selecting an appropriate subset of variables improves both model precision and prediction accuracy. Unfortunately, existing Bayesian variable selection methods commonly face complicated computational challenges, making them difficult to implement in practice. Furthermore, the performance of such approaches often suffers from the difficulty of assigning priors that depend on the quantile levels, as mentioned above. A large body of research has been proposed within this framework, much of which follows one of two common practices: (1) incorporating a penalized likelihood function to be minimized, or (2) employing a stochastic search variable selection (SSVS) procedure. However, it is worth noting that neither of these approaches takes the quantile levels into account, even though ignoring the quantile level when specifying a prior can be problematic. Since regression at an extreme quantile should inherently have different regression coefficients from regression at the median, it is necessary to integrate the quantile levels into the prior elicitation for more accurate posterior inference.

This thesis addresses the two main issues described above. First, the goal is to assign suitable quantile-dependent priors over the model space. Second, it aims to devise a computationally efficient posterior sampling scheme. These two objectives are pursued in two settings: linear quantile regression with a continuous response variable in Chapter 2, and binary quantile regression with a non-continuous response variable in Chapter 3. In Chapter 2, the asymmetric Laplace distribution (ALD) is used as the likelihood function to set up the hierarchical Bayesian quantile regression model. The model uses the location-scale mixture representation of the asymmetric Laplace distribution to facilitate Bayesian posterior inference and model selection. An extension of Zellner's g-prior is adapted to give a conditionally conjugate prior that has the attractive property of being quantile dependent. A three-stage Gibbs-Importance computational scheme is then developed to draw independent samples from the intractable posterior distribution: it starts with an expectation-maximization (EM) algorithm, continues with a Gibbs sampler, and ends with an importance re-weighting step. The independent and identically distributed (iid) posterior samples are used for both parameter estimation and variable selection, with the latter evaluated through the calculated posterior model probabilities of inclusion. In Chapter 3, the proposed adapted g-prior is modified to include a ridge parameter to overcome numerical issues that regularly arise with a binary response variable. To extend the three-stage algorithm of Chapter 2, an additional step that updates the latent, unobserved response variable is built into the Gibbs sampler. In the last step, the posterior samples are calibrated through sampling without replacement based on the importance weights.
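The location-scale mixture representation mentioned above can be sketched as follows. This is the form commonly used in the Bayesian quantile regression literature, written here under stated assumptions rather than as the thesis's own parameterization: the scale is fixed at 1, and the names `ald_mixture_draw`, `theta`, and `psi` are hypothetical. An ALD error at level tau is written as theta * z + psi * sqrt(z) * u, with z standard exponential and u standard normal, so that the error is conditionally Gaussian given z. That conditional normality is what makes Gibbs updates convenient.

```python
import math
import random

def ald_mixture_draw(tau, rng):
    """Draw one ALD(0, 1, tau) error via the normal-exponential mixture.

    Assumed constants (standard form, unit scale):
      theta = (1 - 2*tau) / (tau * (1 - tau))
      psi^2 = 2 / (tau * (1 - tau))
    """
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    psi = math.sqrt(2.0 / (tau * (1.0 - tau)))
    z = rng.expovariate(1.0)   # latent exponential mixing variable
    u = rng.gauss(0.0, 1.0)    # standard normal component
    return theta * z + psi * math.sqrt(z) * u

tau = 0.25
rng = random.Random(1)
draws = [ald_mixture_draw(tau, rng) for _ in range(20000)]

# The tau-th quantile of the ALD error is zero, so roughly a fraction
# tau of the draws should fall at or below zero.
frac_below_zero = sum(d <= 0.0 for d in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
print(frac_below_zero, sample_mean)
```

The check at the end reflects why the ALD works as a working likelihood for quantile regression: placing the regression function at the location parameter forces it to be the tau-th conditional quantile.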

The performance of the proposed Gibbs-Importance algorithm is shown to be satisfactory in both simulation studies and real-data applications. The results open further research directions, including the generalization and extension of the algorithm to the partially functional linear quantile regression model, to be reported elsewhere.


Embargo status: Restricted until 06/2026.

Keywords

Linear Quantile Regression, Binary Quantile Regression, Variable Selection, Gibbs Sampler, Importance Sampling
