Preparation and characterization of a multivariate dataset for statistical analysis of fixed object and non-domestic animal collisions in Washington state

Date

2021-05

Abstract

Researchers have attempted to identify unobserved causes of fixed object and non-domestic animal (NDA) collisions that lead to injuries and fatalities. Improper identification of the variables associated with these collisions often leads to biased conclusions about safety interventions, misleading policies, and failed road mitigation projects. In many instances, the inaccurate conclusions stem from a failure to account for unobserved heterogeneity. Unobserved heterogeneity can occur due to incomplete data on vehicle involvement in crashes, human factors, environmental factors, and infrastructure condition factors. This thesis uses crash data from the Washington State Department of Transportation (WSDOT) to analyze fixed object and NDA collisions. I describe the process of fusing WSDOT crash data with other sources of data on vehicle style and size to produce a usable dataset for statistical analysis of crash severity. In particular, I discuss data attrition and its impact on the resulting sample sizes and on data usability, from the standpoint of retaining meaningful variables for the analysis of crash severity. Data attrition can occur when the analysis of unobserved heterogeneity is undertaken for severity prediction. To understand unobserved heterogeneity and its effects on crash severity, the usable crash dataset is likely to become high-dimensional, i.e., wide, with numerous columns of variables. This is because unobserved heterogeneity can arise from multiple sources, such as human factors, environmental factors, roadway factors, driving environment factors, and vehicle factors. Not all pertinent factors can be captured for a comprehensive analysis of crash severity. However, it is advisable to identify as many factors as possible so that omitted-variable bias and the resulting heterogeneity are minimized. This can help isolate true unobserved heterogeneity in crash severity analysis.
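As an illustration of the fusion step and the attrition it can cause, the following minimal Python sketch joins a few made-up crash records to a made-up death rate lookup keyed by vehicle style and size. The field names (`vehicle_style`, `vehicle_size`, `death_rate`), the records, and the rates are hypothetical placeholders, not actual WSDOT or IIHS fields or values. The sketch only shows how records with missing or unmatched vehicle information drop out and shrink the usable sample.

```python
# Hypothetical crash records; field names and values are illustrative only.
crashes = [
    {"crash_id": 1, "vehicle_style": "SUV", "vehicle_size": "midsize", "severity": "possible injury"},
    {"crash_id": 2, "vehicle_style": "car", "vehicle_size": "small", "severity": "no apparent injury"},
    {"crash_id": 3, "vehicle_style": None, "vehicle_size": None, "severity": "evident injury"},
    {"crash_id": 4, "vehicle_style": "pickup", "vehicle_size": "large", "severity": "fatality"},
]

# Hypothetical lookup: (style, size) -> driver death rate (made-up numbers).
death_rates = {
    ("SUV", "midsize"): 20.0,
    ("car", "small"): 60.0,
}


def fuse(crash_records, rate_lookup):
    """Inner-join crash records to death rates; return fused rows and attrition counts."""
    fused = []
    attrition = {"missing_vehicle_info": 0, "no_matching_rate": 0}
    for record in crash_records:
        key = (record["vehicle_style"], record["vehicle_size"])
        if None in key:
            # Vehicle style or size was never recorded for this crash.
            attrition["missing_vehicle_info"] += 1
        elif key not in rate_lookup:
            # Vehicle is described, but the lookup has no rate for it.
            attrition["no_matching_rate"] += 1
        else:
            fused.append({**record, "death_rate": rate_lookup[key]})
    return fused, attrition


fused_rows, attrition_counts = fuse(crashes, death_rates)
retained_share = len(fused_rows) / len(crashes)
print(len(crashes), len(fused_rows), attrition_counts, retained_share)
```

Tracking the reason for each dropped record, rather than only the final count, makes it easier to report where the sample shrinks at each stage of the fusion.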
Unobserved heterogeneity results in parameter effects varying across the crash-involved driving population, which is of interest to severity modelers, design policy makers, and decision makers who prioritize safety investments. To begin a proper analysis of unobserved heterogeneity effects, a descriptive analysis of the usable dataset is necessary. A descriptive analysis considers the sample size, the skewness of the distribution, the range of values of each variable, and the category of the variable (human factors versus other factors, for example). Past research has indicated that indicator variables (variables with a value of 1 or 0) tend to be influential in crash severity analysis. This is because in crash severity analysis, the outcome being estimated is the probability of a severity type (no apparent injury, possible injury, evident injury, severe injury, or fatality), given that a crash has occurred. Measurements are therefore specific to the crash. When crash-specific variables are used in statistical analysis, their presence or absence influences the outcome, so the majority of crash-specific variables are binary. A minority of variables are non-binary, such as the number of occupants involved in the crash, the number of vehicles involved in the crash, vehicle speed (as represented by the posted speed limit in the neighborhood of the crash), driver age, etc. Collision severity by vehicle style and size is characterized using data gathered from the Insurance Institute for Highway Safety (IIHS) website. This information was used to calculate the average overall driver death rate for the vehicle categories, along with the mean, maximum, minimum, and standard deviation of the death rate. Similarly, the death rate mean, maximum, minimum, and standard deviation were calculated for vehicle style.
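The summary statistics described above (mean, maximum, minimum, and standard deviation of the death rate within each vehicle grouping) can be sketched as follows. This is a minimal example using only the Python standard library; the category labels and death rate values are invented for illustration and are not IIHS figures, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (vehicle category, driver death rate) pairs; numbers are made up.
records = [
    ("small", 60.0), ("small", 80.0), ("small", 70.0),
    ("large", 20.0), ("large", 30.0), ("large", 40.0),
]


def summarize(pairs):
    """Group death rates by category and return n, mean, max, min, and sample std dev."""
    groups = defaultdict(list)
    for category, rate in pairs:
        groups[category].append(rate)
    summary = {}
    for category, rates in groups.items():
        summary[category] = {
            "n": len(rates),
            "mean": mean(rates),
            "max": max(rates),
            "min": min(rates),
            # Sample standard deviation needs at least two observations.
            "std": stdev(rates) if len(rates) > 1 else float("nan"),
        }
    return summary


summary = summarize(records)
for category, stats in summary.items():
    print(category, stats)
```

Reporting the group size `n` alongside each statistic keeps the effective sample size visible, which matters when some groupings retain only a few records after attrition.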
The goal of this thesis is to analyze the WSDOT data in relation to the IIHS data to determine the statistically significant variables that could emerge as unobserved heterogeneity variables in crashes and in crash severity, and that can be analyzed further using applicable statistical models. In this thesis I explore the viability of binary and non-binary variables in terms of their effective sample sizes for use in statistical analysis. In the process of making this assessment, attrition issues are discussed. An original contribution of this thesis is therefore the fused dataset combining multiple types of data: IIHS death rate data by vehicle style and size, and crash-specific data obtained from WSDOT crash datasets for fixed object and NDA collisions. To the best of my knowledge, based on my review of the published literature, IIHS data have not previously been fused with crash data for use in crash severity analysis and the assessment of unobserved heterogeneity. The published literature fails to recognize that IIHS vehicle ratings are not comparable across vehicle classes. In contrast, IIHS death rates are comparable because they are observed death rates in real crashes. I demonstrate the utility of IIHS death rates in the analysis of unobserved heterogeneity in severity modeling using a sample latent class model and compare its predictions with those of the conventional multinomial logit model.
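For reference, the two model forms named above are commonly written as shown below. This is standard textbook notation and not necessarily the specification estimated in the thesis; the symbols are assumptions introduced here: $\mathbf{x}_n$ is the vector of explanatory variables for crash $n$, $I$ is the set of severity levels, $\boldsymbol{\beta}$ denotes estimable parameters, $C$ is the number of latent classes, and $\pi_c$ is the probability of membership in class $c$.

```latex
% Multinomial logit: probability that crash n results in severity level i
P_n(i) = \frac{\exp\!\left(\boldsymbol{\beta}_i^{\top} \mathbf{x}_n\right)}
              {\sum_{j \in I} \exp\!\left(\boldsymbol{\beta}_j^{\top} \mathbf{x}_n\right)}

% Latent class logit: class-specific parameters weighted by class membership probabilities
P_n(i) = \sum_{c=1}^{C} \pi_c \,
         \frac{\exp\!\left(\boldsymbol{\beta}_{ic}^{\top} \mathbf{x}_n\right)}
              {\sum_{j \in I} \exp\!\left(\boldsymbol{\beta}_{jc}^{\top} \mathbf{x}_n\right)},
\qquad \sum_{c=1}^{C} \pi_c = 1
```

With a single class ($C = 1$), the latent class form reduces to the multinomial logit, which is why the latter serves as a natural baseline for comparison.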


Embargo status: Restricted to TTU community only.

Keywords

Multivariate Dataset, Statistical Analysis, Fixed Object and Non-Domestic Animal Collisions
