Discriminant Analysis is a sensitive modeling methodology: outliers and group size can both distort its results. Additionally, several assumptions must be checked before it is applied.
These assumptions* are:
Multivariate Normality - Independent variables are normally distributed within each level of the grouping variable.
Homogeneity of Variance - The variances and covariances of the independent variables are assumed to be equal across the levels of the grouping variable.
Multicollinearity - Predictive power can decrease with an increased correlation between predictor variables.
Independence - Participants are assumed to be randomly sampled, and a participant’s score on one variable is assumed to be independent of scores on that variable for all other participants.
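As a rough illustration of how two of these assumptions might be screened before running the analysis, here is a minimal Python sketch. The records are made up for the example (they are not the sample data set used in this article), and the helper names are hypothetical. It compares the within-group variance of one predictor and computes the correlation between two predictors. Treat it as an informal screen only; a formal test of equal covariance matrices, such as Box's M, is the more rigorous route.

```python
from statistics import mean, variance

# Made-up illustrative records: (cancer, age, smoking). Not the article's data.
rows = [
    (0, 34, 0), (0, 45, 0), (0, 52, 1), (0, 29, 0), (0, 61, 0),
    (1, 58, 1), (1, 66, 1), (1, 49, 1), (1, 71, 0), (1, 63, 1),
]

def group_variances(rows, col):
    """Sample variance of one predictor column within each group (column 0)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[0], []).append(r[col])
    return {g: variance(vals) for g, vals in groups.items()}

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Homogeneity of variance: a ratio far above 1 hints at unequal variances.
age_var = group_variances(rows, 1)
ratio = max(age_var.values()) / min(age_var.values())

# Multicollinearity: a correlation near +1 or -1 between predictors is a warning sign.
r_age_smoking = pearson([r[1] for r in rows], [r[2] for r in rows])

print("Age variance by group:", age_var)
print("Largest/smallest variance ratio:", round(ratio, 2))
print("Correlation(Age, Smoking):", round(r_age_smoking, 2))
```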
Example:
We’ll begin with a familiar sample data set:
From the “Analyze” menu, select “Classify”, then select “Discriminant”.
The following menu should appear. Using the topmost middle arrow, select “Cancer” as the “Grouping Variable”. Using the center arrow, select “Age”, “Obese” and “Smoking” as “Independents”.
Click on the “Define Range” button to populate the following sub-menu. Since “Cancer” is a binary variable, we will set “Minimum” to “0”, and “Maximum” to “1”.
(Note: “1” indicates “Cancer”, and “0” indicates “No Cancer Detected”)
After clicking “Statistics”, check the box adjacent to “Unstandardized”.
Clicking on “Save” will populate the menu below. Check the options labeled “Predicted group membership” and “Probabilities of group membership”.
Once this has been completed, click “OK”.
The following output should be generated:
Canonical Discriminant Function Coefficients – The values presented in the above table are the components of the predictive model. If we were to construct the model as an equation, it would resemble:
D = (Age * .026) + (Obese * -.347) + (Smoking * 2.418) - 2.263
Strictly speaking, D is a discriminant score rather than a logit, so passing it to the R function “plogis” yields only a rough approximation of the probability of a positive outcome. The probabilities that SPSS saves to the data sheet are the ones to rely on. For more information pertaining to “plogis”, please consult the article related to Binary Logistical Analysis that was previously featured on this blog.
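To make the arithmetic concrete, here is a small Python sketch that evaluates the equation above for a single case. The coefficients are copied from the equation, while the case itself and the function names are hypothetical. The `plogis` helper reproduces the standard logistic transform that R's “plogis” applies with its default arguments. This sketch does not establish that the transformed value matches the probabilities SPSS reports.

```python
import math

# Coefficients copied from the equation above.
COEFS = {"Age": 0.026, "Obese": -0.347, "Smoking": 2.418}
CONSTANT = -2.263

def linear_score(case):
    """Weighted sum of the predictors plus the constant term."""
    return CONSTANT + sum(COEFS[name] * case[name] for name in COEFS)

def plogis(x):
    """Standard logistic function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical case: a 50-year-old, non-obese smoker.
case = {"Age": 50, "Obese": 0, "Smoking": 1}
score = linear_score(case)

print(round(score, 3))          # 1.455
print(round(plogis(score), 3))  # logistic transform of the score
```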
Finally, we will review the output that was added to the original data sheet.
We are presented with three new variables. “Dis_1” represents the model’s predicted outcome (1 or 0) given the independent variable data. “Dis2_2” represents the probability (.00 – 1.00) of a positive outcome occurring, and “Dis2_1” represents the probability of a negative outcome occurring.
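The relationship between these three saved variables can be sketched in a few lines of Python. The probability pairs below are made up for illustration and the function name is hypothetical. The point is simply that the two probabilities for a case sum to 1, and the predicted group is the one with the larger probability.

```python
# Hypothetical saved rows: (probability of group 0, probability of group 1).
saved = [(0.82, 0.18), (0.35, 0.65), (0.10, 0.90)]

def predicted_group(p_neg, p_pos):
    """Return 1 when the positive outcome is more probable, otherwise 0."""
    return 1 if p_pos > p_neg else 0

for p_neg, p_pos in saved:
    # The two membership probabilities for a case should sum to 1.
    assert abs(p_neg + p_pos - 1.0) < 1e-9
    print(p_neg, p_pos, "->", predicted_group(p_neg, p_pos))

predictions = [predicted_group(p_neg, p_pos) for p_neg, p_pos in saved]
```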
That’s all for now. Stay subscribed for more analytics and articles. Until next time, Data Heads!
*- https://en.wikipedia.org/wiki/Discriminant_function_analysis