A general linear mixed model is a linear model that also accounts for one or more categorical grouping variables. It does this by allowing the intercept to vary across the levels of those variables: instead of a single intercept, the model estimates a separate intercept for each group. One grouping variable or several can be included; when several are included, they are assessed as nested combinations of categories.
Example:
We will be using the following data vectors:
w <- c(366, 383, 332, 331, 311, 352, 356, 351, 399, 357)
v <- c(6, 10, 6, 13, 19, 12, 11, 17, 18, 12)
x <- c(8, 1, 4, 10, 8, 10, 3, 1, 1, 2)
y <- c(97, 56, 97, 68, 94, 66, 81, 76, 86, 69)
z <- c(188, 184, 181, 194, 190, 180, 192, 184, 192, 191)
These vectors can be combined to create a single data frame:
model <- data.frame(v, w, x, y, z)
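(Optional, and not part of the original workflow: before modeling, you can take a quick look at the combined data frame to confirm it was built as expected.)
str(model)   # shows each column's type and first few values
head(model)  # shows the first six rows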
To fit a simple linear model, we can use the following code:
lmmodeltest <- lm(x ~ y + z, data=model)
summary(lmmodeltest)
Which produces the output:
Call:
lm(formula = x ~ y + z, data = model)
Residuals:
Min 1Q Median 3Q Max
-4.027 -2.882 -1.643 2.668 5.623
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.587008 53.572275 0.048 0.963
y 0.034052 0.099425 0.342 0.742
z -0.002543 0.285784 -0.009 0.993
Residual standard error: 4.267 on 7 degrees of freedom
Multiple R-squared: 0.01653, Adjusted R-squared: -0.2645
F-statistic: 0.05883 on 2 and 7 DF, p-value: 0.9433
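(If you want to pull individual pieces of this summary out programmatically, a small optional sketch using base R extractor functions:)
coef(summary(lmmodeltest))      # coefficient table: estimates, std. errors, t values, p values
sigma(lmmodeltest)              # residual standard error (4.267 above)
summary(lmmodeltest)$r.squared  # multiple R-squared (0.01653 above)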
Now, we will produce a general linear mixed model that accounts for the variable "w".
(This variable, which the model accounts for but does not include as an independent variable, is referred to as a "random effect" or grouping variable.)
# You will need to install and load the package "nlme" #
library(nlme)
mmodeltest1 <- lme(x ~ y + z, data=model, random=~1|w)
summary(mmodeltest1)
This produces the output:
Linear mixed-effects model fit by REML
Data: model
AIC BIC logLik
65.41286 65.14241 -27.70643
Random effects:
Formula: ~1 | w
(Intercept) Residual
StdDev: 3.995423 1.498283
Fixed effects: x ~ y + z
Value Std.Error DF t-value p-value
(Intercept) 2.5870077 53.57227 7 0.0482900 0.9628
y 0.0340519 0.09942 7 0.3424887 0.7420
z -0.0025432 0.28578 7 -0.0088991 0.9931
Correlation:
(Intr) y
y -0.066
z -0.989 -0.081
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.3313798 -0.2371629 -0.1352219 0.2195812 0.4627224
Number of Observations: 10
Number of Groups: 10
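(The "Random effects" block above reports the standard deviations of the random intercept and of the residual. A small convenience sketch, assuming the mmodeltest1 fit above, for viewing those variance components on their own:)
VarCorr(mmodeltest1)  # variance and standard deviation of the random intercept and the residual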
At first glance, the two model summaries look very similar; the fixed-effect coefficient estimates are identical. What differs is the intercept: the mixed model produces a separate intercept value for each level of the grouping variable.
For example, the coefficients of the original linear model are:
coef(lmmodeltest)
Output:
(Intercept) y z
2.587007686 0.034051914 -0.002543224
However, the coefficients of the general linear mixed model are:
coef(mmodeltest1)
Output:
(Intercept) y z
311 4.95003237 0.03405191 -0.002543224
331 7.48857278 0.03405191 -0.002543224
332 1.33355478 0.03405191 -0.002543224
351 -0.66296480 0.03405191 -0.002543224
352 7.51706478 0.03405191 -0.002543224
356 0.95902860 0.03405191 -0.002543224
357 0.43833139 0.03405191 -0.002543224
366 4.85601183 0.03405191 -0.002543224
383 -0.06589015 0.03405191 -0.002543224
399 -0.94366472 0.03405191 -0.002543224
As you can see, the intercept now varies by group (each row is labeled with a level of "w"), while the coefficients of the independent variables y and z stay the same across groups. This is what lets the model tailor its predictions to each group.
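(A useful way to see where these group-specific intercepts come from, sketched with nlme's extractor functions and assuming the fit above: each one is the overall fixed intercept plus that group's estimated random deviation.)
fixef(mmodeltest1)  # the single set of fixed-effect estimates (intercept, y, z)
ranef(mmodeltest1)  # per-group deviations of the intercept from the fixed intercept
# each "(Intercept)" value in coef(mmodeltest1) equals
# fixef(mmodeltest1)["(Intercept)"] plus the corresponding row of ranef(mmodeltest1)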
For example, if we use the simple linear model to predict the first observation from the original data set:
yvalue <- 97
zvalue <- 188
interceptvalue <- 2.5870077
xvalue <- (yvalue * 0.034052) + (zvalue * -0.002543) + interceptvalue
xvalue
Which would produce the output:
[1] 5.41192
The actual value of x in the initial set was:
8
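(The same hand calculation can be reproduced with predict(); a sketch in which "newobs" is just an illustrative name, and any small difference from the value above comes from rounding the coefficients by hand:)
newobs <- data.frame(y = 97, z = 188)
predict(lmmodeltest, newdata = newobs)  # roughly 5.41, matching the manual calculation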
If we instead use the general linear mixed model, taking the intercept for the first observation's group (w = 366):
yvalue <- 97
zvalue <- 188
interceptvalue <- 4.85601183
xvalue <- (yvalue * .0340519) + (zvalue * -0.0025432) + interceptvalue
xvalue
Which would produce the output:
[1] 7.680925
Which is closer to the actual value of x:
8
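(This hand calculation can also be reproduced with predict(), which for an lme fit needs the grouping value so it knows which intercept to apply. A sketch assuming the fits above; "newobs" is again just an illustrative name:)
newobs <- data.frame(y = 97, z = 188, w = 366)
predict(mmodeltest1, newdata = newobs, level = 1)  # group-specific prediction, roughly 7.68
predict(mmodeltest1, newdata = newobs, level = 0)  # population-level prediction, roughly 5.41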
To produce a general linear mixed model that accounts for both the variables "w" and "v" (with "v" nested within "w"), the following code can be used:
mmodeltest2 <- lme(x ~ y + z, data=model, random=~1|w/v)
summary(mmodeltest2)
This code produces the output:
Linear mixed-effects model fit by REML
Data: model
AIC BIC logLik
67.41286 67.08832 -27.70643
Random effects:
Formula: ~1 | w
(Intercept)
StdDev: 2.916513
Formula: ~1 | v %in% w
(Intercept) Residual
StdDev: 2.916513 1.093692
Fixed effects: x ~ y + z
Value Std.Error DF t-value p-value
(Intercept) 2.5870077 53.57227 7 0.0482900 0.9628
y 0.0340519 0.09942 7 0.3424887 0.7420
z -0.0025432 0.28578 7 -0.0088991 0.9931
Correlation:
(Intr) y
y -0.066
z -0.989 -0.081
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.24189513 -0.17312025 -0.09870703 0.16028629 0.33777045
Number of Observations: 10
Number of Groups:
w v %in% w
10 10
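(If you want to compare the single-grouping and nested-grouping fits, a rough sketch; this is only a sanity check, since both models here were fit by REML with the same fixed effects:)
AIC(mmodeltest1, mmodeltest2)    # lower AIC suggests the better-fitting random structure
anova(mmodeltest1, mmodeltest2)  # likelihood-ratio comparison of the two random structures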
With the model fit, we can again extract the coefficient values with the following code:
coef(mmodeltest2)
Which produces the output:
(Intercept) y z
311/19 5.1052676 0.03405191 -0.002543224
331/13 7.8105734 0.03405191 -0.002543224
332/6 1.2512112 0.03405191 -0.002543224
351/17 -0.8764666 0.03405191 -0.002543224
352/12 7.8409371 0.03405191 -0.002543224
356/11 0.8520811 0.03405191 -0.002543224
357/12 0.2971775 0.03405191 -0.002543224
366/6 5.0050705 0.03405191 -0.002543224
383/10 -0.2401681 0.03405191 -0.002543224
399/18 -1.1756067 0.03405191 -0.002543224
Each combination of the grouping variables, along with its corresponding intercept value, appears in the output above. If the model were used to score new observations from the same system, the observation's combination of group values would determine which intercept is applied.
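(In practice, that lookup can be done directly from the coefficient table, since its row names are the "w/v" combinations. A sketch assuming the output above; "coefs" is just an illustrative name:)
coefs <- coef(mmodeltest2)
coefs["366/6", "(Intercept)"]  # intercept used for an observation with w = 366 and v = 6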
The kinds of categorical variables that typically serve as the "random" (grouping) part of such a model include gender, race, ethnicity, zip code, and so on.
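(As a purely hypothetical illustration, not part of the example above: the data, variable names, and model name below are made up to show how a demographic grouping variable would be used in the same way.)
# hypothetical data: "zipcode" as the grouping (random) variable
surveydata <- data.frame(
  price   = c(310, 295, 400, 415, 255, 270, 330, 345),
  sqft    = c(12, 11, 16, 17, 9, 10, 13, 14),
  zipcode = factor(c("10001", "10001", "10002", "10002", "10003", "10003", "10004", "10004"))
)
zipmodel <- lme(price ~ sqft, data = surveydata, random = ~ 1 | zipcode)
summary(zipmodel)  # a separate intercept is estimated for each zip code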
That’s all for now, Data Heads! Thanks for visiting! I appreciate your patronage.