iJournal
Issue 10
Spring 2005

Article Supplement

Classification Differences Between Linear Discriminant Analysis and Logistic Regression

Social researchers who do statistical analysis usually learn in graduate school that they can develop an equation through either linear discriminant analysis (LDA) or logistic regression (LR) to classify people (or objects) based on a set of predictors. Two researchers, Pui-Wa Lei and Laura M. Koehly, published a study that compares the classification accuracy of LDA and LR when the predictors are assumed to be normally distributed. Overall, the study suggested that LDA compares favorably with LR in classification accuracy for the two-group situation (that is, predicting sample units to be pass/fail, healthy/unhealthy, etc.). More specifically, they drew the following conclusions, among others:

  1. "If total misclassification is of interest, the optimal cut-score is .5 [or 50%]. With a cut-score of .5, LR and LDA with proportional or accurate prior specification perform similarly and best among other LDA specifications examined in this study, providing good to excellent classification accuracy for extreme population priors or large D2 [multivariate Mahalanobis distances]...Given that the correct population priors are generally unknown, this suggests that LDA with proportional priors or LR (if the sample is representative of the population in terms of group proportions) is the best approach to take, along with a cut-score of .5 when it is of interest to reduce total error...." [pp. 41-42]

  2. "When costs of misclassification for the groups are evidently different, one is more interested in the separate-group misclassification rate. If large-group classification error is of concern, the optimal cut-score is also .5. The optimal method is LDA with extreme prior specification, providing excellent classification accuracy with cut-score of .5 ...regardless of true population priors, group distance, or whether covariance matrices were equal or unequal. If the small-group classification error is of interest, on the other hand, the optimal cut-score is .1, and the optimal method is LDA with equal prior specification, providing excellent classification accuracy (with cut-score .1) regardless of true population priors, group distance, or whether covariance matrices are equal or not..." [pp. 42-43]
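The roles of the prior specification and the cut-score in the two conclusions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example group means, and the identity covariance matrix are assumptions chosen only for illustration. The sketch computes the two-group LDA posterior probability of small-group membership for a given prior, then classifies a case by comparing that posterior to a cut-score.

```python
import numpy as np

def lda_posterior(X, mean_small, mean_large, pooled_cov, prior_small):
    """Posterior probability of small-group membership under two-group LDA.

    Assumes normal predictors with a common (pooled) covariance matrix.
    """
    # Discriminant weights: inverse pooled covariance times the mean difference.
    w = np.linalg.solve(pooled_cov, mean_small - mean_large)
    midpoint = 0.5 * (mean_small + mean_large)
    # Log posterior odds = discriminant score + log prior odds.
    log_odds = (X - midpoint) @ w + np.log(prior_small / (1.0 - prior_small))
    return 1.0 / (1.0 + np.exp(-log_odds))

def classify(posterior_small, cut_score):
    """Assign a case to the small group when its posterior reaches the cut-score."""
    return posterior_small >= cut_score

# Hypothetical example: one case located exactly at the small-group mean.
case = np.array([[1.0, 0.0]])
mean_small = np.array([1.0, 0.0])
mean_large = np.array([0.0, 0.0])
cov = np.eye(2)

for prior in (0.5, 0.1):
    post = lda_posterior(case, mean_small, mean_large, cov, prior)
    for cut in (0.5, 0.1):
        print(f"prior={prior}, cut-score={cut}: "
              f"posterior={post[0]:.3f}, small group={bool(classify(post, cut)[0])}")
```

In this toy setting, a smaller prior for the small group lowers the posterior, and a lower cut-score assigns more cases to the small group. That trade-off is the mechanism behind the separate-group error rates discussed in the second conclusion.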

Lei and Koehly used Monte Carlo simulations to produce data sets that enabled them to compare the performance of LDA and LR while varying four factors (homogeneity of covariance matrices, group separation, sample size, and prior probability). Their testing involved a training sample and a validation sample for each LDA vs. LR comparison.
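A simulation of this general kind can be sketched in Python as follows. This is a simplified illustration, not a reproduction of the study's design: the sample size, prior, group distance, number of predictors, number of replications, and the identity covariance matrix (the equal-covariance case only) are all assumed values, and the function names are hypothetical. Each replication draws a training sample and a validation sample, fits LDA with proportional priors and LR on the training sample, and records the total error on the validation sample at a cut-score of .5.

```python
import numpy as np

def simulate_groups(rng, n, prior_small, distance, n_predictors=3):
    """Draw n cases from two normal groups sharing an identity covariance matrix."""
    y = (rng.random(n) < prior_small).astype(float)  # 1 = small group
    shift = np.zeros(n_predictors)
    shift[0] = distance  # with identity covariance, this shift is the Mahalanobis D
    X = rng.standard_normal((n, n_predictors)) + np.outer(y, shift)
    return X, y

def fit_lda(X, y):
    """Two-group LDA with proportional priors (the sample group proportions)."""
    X1, X0 = X[y == 1], X[y == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    pooled = ((X1 - m1).T @ (X1 - m1) + (X0 - m0).T @ (X0 - m0)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    b = -0.5 * (m1 + m0) @ w + np.log(len(X1) / len(X0))
    return b, w

def fit_lr(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson iterations."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)
        # A small diagonal term is added only for numerical stability.
        hess = (A * (p * (1.0 - p))[:, None]).T @ A + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1:]

def error_rate(b, w, X, y, cut_score=0.5):
    """Total misclassification rate at the given cut-score."""
    posterior = 1.0 / (1.0 + np.exp(-(b + X @ w)))
    return np.mean((posterior >= cut_score) != (y == 1))

def run_replications(n_reps=200, n=200, prior_small=0.3, distance=1.5, seed=0):
    """Average validation error of LDA and LR over repeated simulated samples."""
    rng = np.random.default_rng(seed)
    errors = {"LDA": [], "LR": []}
    for _ in range(n_reps):
        X_train, y_train = simulate_groups(rng, n, prior_small, distance)
        X_valid, y_valid = simulate_groups(rng, n, prior_small, distance)
        errors["LDA"].append(error_rate(*fit_lda(X_train, y_train), X_valid, y_valid))
        errors["LR"].append(error_rate(*fit_lr(X_train, y_train), X_valid, y_valid))
    return {name: float(np.mean(vals)) for name, vals in errors.items()}

if __name__ == "__main__":
    print(run_replications())
```

Varying `prior_small`, `distance`, and `n`, or giving the two groups different covariance matrices in `simulate_groups`, would correspond loosely to the factors the authors manipulated.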

This study should help researchers who need to select a statistical tool for a two-group classification task. The authors summarize the prior research on the suitability of LDA and LR for various statistical situations, and they include more than thirty references to other works. Lei (Pennsylvania State University) and Koehly (Texas A&M University) document their study, along with its limitations, in the article "Linear Discriminant Analysis Versus Logistic Regression: A Comparison of Classification Errors in the Two-Group Case," published in The Journal of Experimental Education (Vol. 72, No. 1, Fall 2003, pp. 25-49).

[Abstract prepared by Willard Hom, Director, Research & Planning Unit, System Office, California Community Colleges, 2/11/05]