Evaluating statistical classification models with application to corporate bond ratings classifications
This dissertation addresses common problems in classification analysis. A simulation study compares the logit model with the normal linear discriminant analysis model and the rank transformation linear discriminant model under non-normal distributions. The logit model is more efficient than both discriminant models in many situations, particularly when the sample sizes are unequal. The effects of using estimates of the prior probabilities are measured in the linear discriminant analysis model, with particular attention to the pattern of the individual apparent error rate. The prior probabilities have a greater impact when the distance between categories is small. A second simulation study concerns the estimation of the true loss rate in the linear discriminant analysis model under normal distributions and in the logit model under exponential distributions. Because the constant loss rate is known to underestimate the true constant loss rate, it should be used with caution when the sample sizes are not large. The holdout method is not appropriate for measuring predictive power when samples are small. Resampling methods such as cross-validation and the bootstrap are compared with the holdout method in the simulation studies. The bootstrap method produced an almost unbiased estimate with a small variance. Based on the results of the simulation studies, six classification models of bond ratings that use prior probabilities, loss functions, and the bootstrap method are compared. The comparison criteria are the loss rates and the bootstrap estimate of the true loss rate. The logit model clearly dominated all other classification models in both the binary and the multinomial classification cases.
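To make the contrast between an apparent error rate and a bootstrap estimate concrete, the following is a minimal sketch, not the dissertation's own procedure. It assumes a two-class linear discriminant rule with a pooled covariance matrix, a 0-1 loss, and the optimism-corrected variant of the bootstrap. The abstract does not say which bootstrap variant or loss function was used, so those choices, the function names (`fit_lda`, `bootstrap_error`), and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def fit_lda(X, y, priors=None):
    """Two-class linear discriminant with a pooled covariance matrix."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    if priors is None:
        # Assumption: estimate priors from the sample proportions.
        priors = (n0 / (n0 + n1), n1 / (n0 + n1))
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1) + np.log(priors[1] / priors[0])
    return w, b

def predict(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)

def error_rate(model, X, y):
    """Misclassification rate (0-1 loss) of a fitted rule on (X, y)."""
    return float(np.mean(predict(model, X) != y))

def bootstrap_error(X, y, n_boot=200, seed=0):
    """Return (apparent error, optimism-corrected bootstrap estimate)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Apparent error: the rule is evaluated on the data used to fit it.
    apparent = error_rate(fit_lda(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        Xb, yb = X[idx], y[idx]
        # Skip degenerate resamples with fewer than two cases in a class.
        if min(np.sum(yb == 0), np.sum(yb == 1)) < 2:
            continue
        model = fit_lda(Xb, yb)
        # Optimism: error on the original sample minus error on the resample.
        optimism.append(error_rate(model, X, y) - error_rate(model, Xb, yb))
    return apparent, apparent + float(np.mean(optimism))

# Synthetic illustration only: two bivariate normal classes, unequal sizes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (15, 2))])
y = np.array([0] * 30 + [1] * 15)
apparent, corrected = bootstrap_error(X, y)
print(f"apparent error: {apparent:.3f}, bootstrap-corrected: {corrected:.3f}")
```

The apparent error reuses the training data and so tends to be optimistic. The bootstrap term estimates that optimism by refitting the rule on resamples and comparing each refitted rule's error on the original sample with its error on the resample.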