We will again refer to the example data set below. I have added an additional category that specifies the “Race” of each individual surveyed.
Running the analysis produces the following as a portion of the output:
“Race” has been split into 4 separate indicator variables, while “Race” as a single variable remains available for evaluation as a whole.
Race(1) refers to the “Race” category: “White”.
Race(2) refers to the “Race” category: “African American”.
Race(3) refers to the “Race” category: “Asian”.
Race(4) refers to the “Race” category: “Indian”.
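The “Race” category “Native American” is still accounted for within the model: it serves as the reference category. When all four “Race” indicators equal 0, the individual is Native American, and that category's effect is absorbed into the constant. Each “Race” coefficient is therefore measured relative to “Native American”.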
In this example case, our equation would resemble:
Logit(p) = -2.13055 + (Age * 0.03335) + (Obese * -0.56859) + (Smoking * 3.02867) + (White * -1.10077) + (African_American * -1.05379) + (Asian * -1.22213) + (Indian * 0.69143)
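To make the reference category concrete, here is a minimal R sketch of the same indicator coding. It assumes “Race” is stored as a factor whose first level is “Native American”; the object names `Race` and `dummies` are illustrative only and do not appear elsewhere in this post.

```r
# One individual per category, with "Native American" listed first
# so that R treats it as the reference level (an assumption of this sketch).
Race <- factor(
  c("White", "African American", "Asian", "Indian", "Native American"),
  levels = c("Native American", "White", "African American", "Asian", "Indian")
)

# model.matrix() builds the intercept plus one 0/1 indicator column
# for every level except the reference level.
dummies <- model.matrix(~ Race)
print(dummies)
```

The Native American row contains a 1 in the intercept column and 0 in every “Race” column, which is why that category has no coefficient of its own in the equation.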
So, if we wanted to test our model probability for an individual who was:
55 Years of Age
Obese
A Smoker
White
The equation would resemble:
Logit(p) = -2.13055 + (55 * 0.03335) + (1 * -0.56859) + (1 * 3.02867) + (1 * -1.10077) + (0 * -1.05379) + (0 * -1.22213) + (0 * 0.69143)
So our logit(p) value would be: 1.06301
Applying the inverse logit, p = 1 / (1 + e^(-logit(p))), gives a predicted probability of a positive outcome (Cancer = 1) of: 0.7432653
Additionally, if our model were tested for an individual who was:
26 Years of Age
Not Obese
A Smoker
Native American
Our equation would resemble:
Logit(p) = -2.13055 + (26 * 0.03335) + (0 * -0.56859) + (1 * 3.02867) + (0 * -1.10077) + (0 * -1.05379) + (0 * -1.22213) + (0 * 0.69143)
Logit(p) would equal: 1.76522
Which corresponds to a predicted probability of a positive outcome (Cancer = 1) of: 0.8538622
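You can test this model in R with the following code:
# Model Test Code #
# Set each value below to describe the individual being tested. #
# For "Race", set at most one indicator to 1; leave all four at 0 for Native American. #
Age <- 0
Obese <- 0
Smoking <- 0
White <- 0
African_American <- 0
Asian <- 0
Indian <- 0
# Linear predictor, i.e. logit(p) #
logit_p <- -2.13055 + (Age * 0.03335) + (Obese * -0.56859) + (Smoking * 3.02867) + (White * -1.10077) + (African_American * -1.05379) + (Asian * -1.22213) + (Indian * 0.69143)
# plogis() applies the inverse logit to return the probability #
plogis(logit_p)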
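The test code above can also be wrapped in a small function so that both worked examples are checked in one pass. This is only a sketch: the names `coefs` and `predict_cancer_prob` are hypothetical helpers introduced here, and the coefficients are copied from the equation in this post.

```r
# Coefficients copied from the fitted equation in this post.
coefs <- c(Intercept = -2.13055, Age = 0.03335, Obese = -0.56859,
           Smoking = 3.02867, White = -1.10077, African_American = -1.05379,
           Asian = -1.22213, Indian = 0.69143)

# Hypothetical helper: returns logit(p) and the matching probability.
# Leaving all four Race indicators at 0 describes a Native American individual.
predict_cancer_prob <- function(Age, Obese, Smoking, White = 0,
                                African_American = 0, Asian = 0, Indian = 0) {
  x <- c(1, Age, Obese, Smoking, White, African_American, Asian, Indian)
  logit_p <- sum(coefs * x)                     # linear predictor
  c(logit = logit_p, probability = plogis(logit_p))  # inverse logit
}

# The two individuals from the worked examples.
example_white  <- predict_cancer_prob(Age = 55, Obese = 1, Smoking = 1, White = 1)
example_native <- predict_cancer_prob(Age = 26, Obese = 0, Smoking = 1)

print(example_white)
print(example_native)
```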
Here is how you would create the same model in R:
# Non-Binary Categorical Variables #
Age <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
Obese <- c(1,0,0,0,1,1,0,1,1,0,1,1,0,1,0)
Smoking <- c(1,0,0,1,1,1,0,0,1,0,0,1,0,1,1)
Cancer <- c(1,0,0,1,0,1,0,0,1,1,0,1,1,1,0)
White <- c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0)
African_American <- c(0,0,0,1,1,1,0,0,0,0,0,0,0,0,0)
Asian <- c(0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
Indian <- c(0,0,0,0,0,0,0,0,0,1,1,1,0,0,0)
Native_American <- c(0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
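CancerModelII <- data.frame(Age, Obese, Smoking, Cancer, White, African_American, Asian, Indian, Native_American)
# All five Race indicators are entered here. Because they always sum to 1, #
# R cannot estimate the last one (Native_American) and reports it as NA, #
# which makes it the reference category. #
CancerModelLogII <- glm(Cancer~ Age + Obese + Smoking + White + African_American + Asian + Indian + Native_American, family=binomial)
summary(CancerModelLogII)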
# Which produces the output #
Call:
glm(formula = Cancer ~ Age + Obese + Smoking + White + African_American +
    Asian + Indian + Native_American, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9613  -0.7252   0.4240   0.8107   1.7092

Coefficients: (1 not defined because of singularities)
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      -2.13055    2.58207  -0.825    0.409
Age               0.03335    0.04641   0.719    0.472
Obese            -0.56859    1.60680  -0.354    0.723
Smoking           3.02867    1.95858   1.546    0.122
White            -1.10077    2.35673  -0.467    0.640
African_American -1.05379    2.18843  -0.482    0.630
Asian            -1.22213    2.40838  -0.507    0.612
Indian            0.69143    2.51153   0.275    0.783
Native_American        NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 20.728  on 14  degrees of freedom
Residual deviance: 15.366  on  7  degrees of freedom
AIC: 31.366

Number of Fisher Scoring iterations: 4
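As an alternative to building the indicator columns by hand, “Race” can be stored as a single factor and R will create the indicators itself. The sketch below assumes the same 15 observations and places “Native American” first in `levels` so that it becomes the reference category; the names `cancer_data`, `fit_factor`, `new_person`, and `prob_new` are illustrative. If the data match, the estimates should agree with the output above, without the NA row.

```r
# Same 15 observations as in this post, with Race held in one factor.
Age     <- c(55, 45, 33, 22, 34, 56, 78, 47, 38, 68, 49, 34, 28, 61, 26)
Obese   <- c(1,0,0,0,1,1,0,1,1,0,1,1,0,1,0)
Smoking <- c(1,0,0,1,1,1,0,0,1,0,0,1,0,1,1)
Cancer  <- c(1,0,0,1,0,1,0,0,1,1,0,1,1,1,0)
Race <- factor(
  rep(c("White", "African American", "Asian", "Indian", "Native American"), each = 3),
  levels = c("Native American", "White", "African American", "Asian", "Indian")
)
cancer_data <- data.frame(Age, Obese, Smoking, Cancer, Race)

# glm() expands the factor into four indicators; the first level is the reference.
fit_factor <- glm(Cancer ~ Age + Obese + Smoking + Race,
                  family = binomial, data = cancer_data)
summary(fit_factor)

# Predicted probability for the second worked example
# (26 years old, not obese, smoker, Native American).
new_person <- data.frame(
  Age = 26, Obese = 0, Smoking = 1,
  Race = factor("Native American", levels = levels(cancer_data$Race))
)
prob_new <- predict(fit_factor, newdata = new_person, type = "response")
print(prob_new)
```

Using `predict()` with `type = "response"` returns the probability directly, so the equation does not need to be typed out by hand.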