A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction

Date

2020

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Background The wide adoption of electronic health records (EHR) system has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR system poses a great challenge on data analysis to support clinical decision-making. The objective of this study is to develop a new methodological framework that can address the missing data challenge and provide a reliable tool to predict the hospital readmission among Heart Failure patients. Methods We used Gaussian Process Latent Variable Model (GPLVM) to impute the missing values. Specifically, a lower dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based missing data imputation can provide both the mean estimate and the uncertainty associated with the mean estimate. To incorporate the uncertainty in prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of input uncertainty and trained a support vector machine for each dataset. Then an optimal classifier was identified by selecting the support vectors that maximize the separation margin of a newly sampled dataset and minimize the similarity with the pre-trained support vectors. Results The proposed model was derived and validated using Physionet MIMIC-III clinical database. The GPLVM imputation provided normalized mean absolute errors of 0.11 and 0.12 respectively when 20% and 30% of instances contained missing values, and the confidence bounds of the estimations captures 97% of the true values. The cSVM model provided an average Area Under Curve of 0.68, which improves the prediction accuracy by 7% as compared to some existing classifiers. Conclusions The proposed method provides accurate imputation of missing values and has a better prediction performance as compared to existing models that can only deal with deterministic inputs.

Description

Copyright: © 2020 Hu, Du. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. cc-by

Rights

Availability

Keywords

Citation

Hu, Z., & Du, D.. 2020. A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction. PLoS ONE, 15(9 September). https://doi.org/10.1371/journal.pone.0237724

Collections