2023-03-30
2020
Lin, C., Wu, D., Liu, H., Xia, X., & Bhattarai, N. 2020. Factor identification and prediction for teen driver crash severity using machine learning: A case study. Applied Sciences (Switzerland), 10(5). https://doi.org/10.3390/app10051675
https://hdl.handle.net/2346/92114
© 2020 by the authors. cc-by
Crashes among young and inexperienced drivers are a major safety problem in the United States, especially in areas with large rural road networks, such as West Texas. Rural roads present many unique safety concerns that are not fully explored. This study presents a complete machine learning pipeline to find the patterns of crashes involving teen drivers no older than 20 on rural roads in West Texas, identify factors that affect injury levels, and build four machine learning predictive models of crash severity. The analysis indicates that the major causes of teen driver crashes in West Texas are failing to control speed or traveling at an unsafe speed when merging from rural roads onto highways or approaching intersections. Teen drivers also failed to yield on undivided roads with four or more lanes, leading to serious injuries. Road class, speed limit, and the first harmful event are the top three factors affecting crash severity. The predictive machine learning model based on Label Encoder and XGBoost seems the best option when considering both accuracy and computational cost. The results of this work should be useful for improving rural teen driver traffic safety in West Texas and other rural areas with similar issues.
eng
Crash severity; Machine learning; Rural roads; Teen driver
Factor identification and prediction for teen driver crash severity using machine learning: A case study
Article