An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset

dc.creator: Chowdhury, Mohammad Mihrab (TTU)
dc.creator: Ayon, Ragib Shahariar
dc.creator: Hossain, Md Sakhawat (TTU)
dc.date.accessioned: 2024-02-21T21:21:52Z
dc.date.available: 2024-02-21T21:21:52Z
dc.date.issued: 2024
dc.description: © 2023 The Author(s) cc-by-nc-nd
dc.description.abstract: Diabetes is a prevalent chronic condition, and diagnosing it early and identifying at-risk individuals remain significant challenges. Machine learning plays a crucial role in diabetes detection because it can process large volumes of data and identify complex patterns. However, imbalanced data, where the number of diabetic cases is substantially smaller than the number of non-diabetic cases, complicates the identification of individuals with diabetes using machine learning algorithms. This study focuses on predicting whether a person is at risk of diabetes, considering the individual's health and socio-economic conditions while mitigating the challenges posed by imbalanced data. To minimize the impact of imbalanced data, we apply several data augmentation techniques to the training data before applying machine learning algorithms: oversampling (Synthetic Minority Over-sampling Technique for Nominal data, i.e., SMOTE-N), undersampling (Edited Nearest Neighbor, i.e., ENN), and hybrid sampling (SMOTE-Tomek and SMOTE-ENN). Our study sheds light on the importance of applying data augmentation techniques carefully, without any data leakage, to enhance the effectiveness of machine learning algorithms. Moreover, it offers a complete machine learning workflow for healthcare practitioners, from data acquisition to prediction, enabling them to make informed decisions.
dc.identifier.citation: Chowdhury, M.M., Ayon, R.S., & Hossain, M.S. 2024. An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset. Healthcare Analytics, 5. https://doi.org/10.1016/j.health.2023.100297
dc.identifier.uri: https://doi.org/10.1016/j.health.2023.100297
dc.identifier.uri: https://hdl.handle.net/2346/97594
dc.language.iso: eng
dc.subject: Behavioral Risk Factor Surveillance System (BRFSS)
dc.subject: Classification
dc.subject: Diabetes
dc.subject: Imbalanced data
dc.subject: Machine learning
dc.subject: Nominal data
dc.subject: Sampling
dc.title: An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset
dc.type: Article
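
The abstract stresses that resampling is applied to the training data only, so that no information leaks into evaluation. The sketch below illustrates that ordering with the Python standard library and is not the authors' code. It assumes a toy data set with hypothetical nominal features (`age_band_*`, `bmi_band_*`), and it uses plain random oversampling as a simple stand-in for SMOTE-N, which would synthesize new nominal rows rather than duplicate existing ones. The function names `stratified_split` and `random_oversample` are illustrative.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately
    so both partitions keep roughly the original class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    matches the majority count. Stand-in for SMOTE-N (an assumption)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for y, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == y]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(y)
    return out_rows, out_labels


# Toy imbalanced data: 10 diabetic (1) and 90 non-diabetic (0) records.
rows = [("age_band_%d" % (i % 5), "bmi_band_%d" % (i % 3)) for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]

# Step 1: split first, so the test partition never sees resampled rows.
train_idx, test_idx = stratified_split(labels)
X_train = [rows[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]
X_test = [rows[i] for i in test_idx]
y_test = [labels[i] for i in test_idx]

# Step 2: balance the training partition only.
X_bal, y_bal = random_oversample(X_train, y_train)

print("train before:", Counter(y_train))
print("train after: ", Counter(y_bal))
print("test (untouched):", Counter(y_test))
```

The test partition keeps its original class imbalance, which is what a deployed model would face. Resampling before the split would instead let copies or neighbors of test records influence training.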

Files

Original bundle

Name: chowdhury_article.pdf
Size: 1.63 MB
Format: Adobe Portable Document Format
Description: Main article with TTU Libraries cover page

Collections