Example (SPSS):
In this demonstration, we will assume that you are attempting to predict an individual’s favorite color based on other personal characteristics.
We will begin with our data set. From the menu bar, select “Analyze”, then “Regression”, followed by “Multinomial Logistic”. This sequence of actions should cause the following menu to appear:
Next, click on the button labeled “Reference Category”; this should open the following menu:
Next, click the button labeled “Save”. This should cause the following sub-menu to appear:
Select the box labeled “Predicted category”, then click “Continue”. You will be returned to the initial menu. From this menu, click “OK”.
This produces voluminous output; however, we will concern ourselves only with the following portions:
The output above provides the model’s coefficient estimates. Though this may appear daunting at first, the information illustrated in the chart is read in much the same way as the coefficient output of a typical linear model.
In the case of our model, we will have three logit equations, one for each non-reference category:
Green = (Gender:Female * -35.791) + (Smoker:Yes * 34.774) + (Car:KIA * -17.40) + (Car:Ford * .985) + 17.40
Yellow = (Gender:Female * -36.664) + (Smoker:Yes * 15.892) + (Car:KIA * -35.632) + (Car:Ford * 1.499) + 16.886
Red = (Gender:Female * -19.199) + (Smoker:Yes * 18.880) + (Car:KIA * -37.252) + (Car:Ford * -19.974) + 18.506
As with all logit models, we need to transform the output values of each equation to generate the appropriate probabilities.
So, for our first example observation, our equations would resemble:
Observation 1 | Gender: Female | Smoker: No | Car: Chevy
Green = (1 * -35.791) + (0 * 34.774) + (0 * -17.40) + (0 * .985) + 17.40
Yellow = (1 * -36.664) + (0 * 15.892) + (0 * -35.632) + (0 * 1.499) + 16.886
Red = (1 * -19.199) + (0 * 18.880) + (0 * -37.252) + (0 * -19.974) + 18.506
Which produces the values of:
Green = -18.391
Yellow = -19.778
Red = -0.693
To produce probabilities, we transform these values with the following R code:
Green <- -18.391
Yellow <- -19.778
Red <- -0.693
# Green #
exp(Green) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Red #
exp(Red) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Yellow #
exp(Yellow) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Blue (Reference Category) #
1 / (1 + exp(Green) + exp(Red) + exp(Yellow))
Which produces the following outputs:
[1] 6.867167e-09
[1] 0.333366
[1] 1.715581e-09
[1] 0.666634
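Because “Blue” absorbs the remaining probability mass, the four probabilities should sum to exactly 1. A quick sanity check in R, using the log-odds values computed above:

```r
# Log-odds for observation 1, from the SPSS equations above
Green  <- -18.391
Yellow <- -19.778
Red    <- -0.693

denom <- 1 + exp(Green) + exp(Red) + exp(Yellow)
probs <- c(Green  = exp(Green)  / denom,
           Red    = exp(Red)    / denom,
           Yellow = exp(Yellow) / denom,
           Blue   = 1           / denom)  # reference category
sum(probs)  # equals 1
```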
Interpretation:
Each output value represents the probability of occurrence of one category of the dependent variable, with “Blue” serving as the reference category.
P(Green) = 6.867167e-09
P(Yellow) = 1.715581e-09
P(Red) = 0.333366
P(Blue) = 0.666634
Therefore, in the case of the first observation of our example data set, we can conclude that the reference category, “Blue”, has the highest predicted probability of occurrence.
The predicted values, as a result of the “Save” option, have been output into a column within the original data set.
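The “Predicted category” value that SPSS saves is simply the category carrying the largest of the four probabilities. A minimal sketch of that selection in R, using the probabilities calculated above:

```r
# Probabilities for observation 1, as computed above
probs <- c(Green  = 6.867167e-09,
           Yellow = 1.715581e-09,
           Red    = 0.333366,
           Blue   = 0.666634)

# The predicted category is the one with the highest probability
names(probs)[which.max(probs)]  # "Blue"
```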
Example (R):
If we wanted to repeat our analysis in R, we could do so with the following code:
# (Requires the package "nnet" to be installed and loaded) #
# Multinomial Logistic Regression #
color <- c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow")
gender <- c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male")
smoker <- c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No")
car <- c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford")
color <- as.factor(color)
gender <- as.factor(gender)
smoker <- as.factor(smoker)
car <- as.factor(car)
testset <- data.frame(color, gender, smoker, car)
mlr <- multinom(color ~ gender + smoker + car, data = testset)
summary(mlr)
This produces the following output:
Call:
multinom(formula = color ~ gender + smoker + car, data = testset)
Coefficients:
(Intercept) genderMale smokerYes carFord carKIA
Green -40.2239699 36.73179 47.085203 21.36387 3.492186
Red -0.6931559 -17.00881 -3.891315 -20.23802 -11.832468
Yellow -41.0510233 37.33637 -10.943821 21.58634 -22.161372
Std. Errors:
(Intercept) genderMale smokerYes carFord carKIA
Green 0.4164966 4.164966e-01 7.125616e-14 3.642157e-01 6.388766e-01
Red 1.2247466 2.899257e-13 1.686282e-23 1.492263e-09 2.899257e-13
Yellow 0.3642157 3.642157e-01 6.870313e-26 3.642157e-01 1.119723e-12
Residual Deviance: 9.364263
AIC: 39.36426
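Note that summary() for a multinom fit reports coefficients and standard errors but no significance tests. If Wald z-statistics and p-values are wanted, they can be derived from the two tables. A sketch of this calculation (the model is refit here so the snippet stands alone; with a data set this small and nearly separable, the standard errors, and therefore the p-values, should be treated with caution):

```r
library(nnet)

# Example data from above
color  <- factor(c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow"))
gender <- factor(c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male"))
smoker <- factor(c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No"))
car    <- factor(c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford"))
testset <- data.frame(color, gender, smoker, car)

mlr <- multinom(color ~ gender + smoker + car, data = testset, trace = FALSE)

# Wald z-statistic: coefficient divided by its standard error, per cell
z <- summary(mlr)$coefficients / summary(mlr)$standard.errors
# Two-sided p-values from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
z
p
```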
To test the model results, the code below can be used:
# Test Model #
# Gender : Male #
a <- 0
# Smoker : Yes #
b <- 0
# Car : Ford #
c <- 0
# Car : KIA #
d <- 0
Green <- -40.2239699 + (a * 36.73179) + (b * 47.085203) + (c * 21.36387) + (d * 3.492186)
Red <- -0.6931559 + (a * -17.00881) + (b * -3.891315) + (c * -20.23802) + (d * -11.832468)
Yellow <- -41.0510233 + (a * 37.33637) + (b * -10.943821) + (c * 21.58634) + (d * -22.161372)
# Green #
exp(Green) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Red #
exp(Red) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Yellow #
exp(Yellow) / (1 + exp(Green) + exp(Red) + exp(Yellow))
# Blue (Reference Category) #
1 / (1 + exp(Green) + exp(Red) + exp(Yellow))
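Rather than rebuilding the equations by hand, the fitted nnet model can generate the same probabilities and class predictions directly through predict(). A sketch (the data and model are rebuilt from the example above so the snippet stands alone):

```r
library(nnet)

# Example data from above
color  <- factor(c("Red", "Blue", "Green", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Yellow"))
gender <- factor(c("Female", "Female", "Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male"))
smoker <- factor(c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "No"))
car    <- factor(c("Chevy", "Chevy", "Ford", "Ford", "Chevy", "KIA", "Ford", "KIA", "Ford", "Ford"))
testset <- data.frame(color, gender, smoker, car)
mlr <- multinom(color ~ gender + smoker + car, data = testset, trace = FALSE)

# Per-observation probability of each color (one column per category)
probs <- predict(mlr, newdata = testset, type = "probs")
# Most likely color for each observation
pred <- predict(mlr, newdata = testset, type = "class")
```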
NOTE: The model’s internal coefficients differ depending on the platform used to generate the analysis, as each platform may select different reference categories. Though the model predictions do not differ, I would recommend, if publishing findings, utilizing SPSS in lieu of R. The rationale pertains to the auditing record that SPSS possesses: if data output contains abnormalities, R, being open source, cannot be held to account. Additionally, as the multinomial function within R exists as part of an external package, platform computational errors could have a greater likelihood of occurrence.