The F-Test is a statistical method for comparing two population variances. Its most recognized use is as one of the components of the ANOVA method, which will be discussed in a later article.
Essentially, the F-Test model enables the creation of a test statistic, a critical value, and a distribution model. With these values derived, a hypothesis test can be stated, and from it, a comparison of two variances can be achieved.
Some things to keep in mind before moving forward:
1. The F-Test assumes that the samples provided originate from normally distributed populations.
2. The F-Test assesses whether two samples originate from populations with equal variances.
So for example, if we were comparing the following two samples:
samp1 <- c(-0.73544189, 0.36905647, 0.69982679, -0.91131589, -1.84019291, -1.02226811, -1.85088278, 2.24406451, 0.63377787, -0.80777949, 0.60145711, 0.43853971, -1.76386879, 0.32665597, 0.32333871, 0.90197004, 0.29803556, 0.47333427, 0.23710263, -1.48582332, -0.45548478, 0.36490345, -0.08390139, -0.46540965, -1.66657385)
samp2 <- c(0.67033912, -1.23197505, -0.18679478, 1.06563032, 0.08998155, 0.22634414, 0.06541938, -0.22454059, -1.00731073, -1.43042950, -0.62312404, -0.22700636, -0.71908729, -0.36873910, 0.15653935, -0.19328338, 0.56259671, 0.31443699, 1.02898245, 1.18903593, -0.14576090, 0.68375259, -0.15348007, 1.58654607, 0.01616986)
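The F statistic itself is just the ratio of the two sample variances, each carrying n − 1 degrees of freedom. As a quick cross-check outside R, the same arithmetic can be sketched in Python using only the standard library (an aside for verification, not part of the R workflow):

```python
# Cross-check of the F statistic using only the Python standard library.
# statistics.variance() computes the sample variance (n - 1 denominator),
# matching R's var().
from statistics import variance

samp1 = [-0.73544189, 0.36905647, 0.69982679, -0.91131589, -1.84019291,
         -1.02226811, -1.85088278, 2.24406451, 0.63377787, -0.80777949,
         0.60145711, 0.43853971, -1.76386879, 0.32665597, 0.32333871,
         0.90197004, 0.29803556, 0.47333427, 0.23710263, -1.48582332,
         -0.45548478, 0.36490345, -0.08390139, -0.46540965, -1.66657385]
samp2 = [0.67033912, -1.23197505, -0.18679478, 1.06563032, 0.08998155,
         0.22634414, 0.06541938, -0.22454059, -1.00731073, -1.43042950,
         -0.62312404, -0.22700636, -0.71908729, -0.36873910, 0.15653935,
         -0.19328338, 0.56259671, 0.31443699, 1.02898245, 1.18903593,
         -0.14576090, 0.68375259, -0.15348007, 1.58654607, 0.01616986]

F = variance(samp1) / variance(samp2)  # should match var.test()'s F = 1.9112
df1 = len(samp1) - 1                   # numerator degrees of freedom
df2 = len(samp2) - 1                   # denominator degrees of freedom
print(round(F, 4), df1, df2)
```

Whichever sample variance is placed in the numerator determines whether the ratio is greater or less than 1; here sample 1 sits on top, matching the hypothesis statement below.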
For a right-tailed test, we would state the following hypotheses:
H0: σ₁² = σ₂²
Ha: σ₁² > σ₂²
# Null Hypothesis = Variances are equal. #
# Alternative Hypothesis = The variance of the first population is greater than the variance of the second. #
With both samples imported into R, we can now utilize the following code to perform the F-Test:
(We will assume an alpha of .05):
var.test(samp1, samp2, alternative = "greater", conf.level = .95)
F test to compare two variances
F = 1.9112, num df = 24, denom df = 24, p-value = 0.05975
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
0.9634237 Inf
sample estimates:
ratio of variances
1.911201
Let us review each aspect of this output:
“ F = “ is the F-Test test statistic.
“num df = “ is the value of the degrees of freedom found within the numerator.
“denom df = “ is the value of the degrees of freedom found within the denominator.
“p-value = “ is the probability of observing an F statistic at least as large as the one calculated, assuming that the null hypothesis is true.
“95 percent confidence interval:” is an interval estimate, at the 95% confidence level, for the ratio of the two population variances.
“ratio of variances” is the value of the variance of sample 1 divided by the variance of sample 2.
Looking at the p-value, which is greater than our alpha value (0.05975 > .05), we cannot conclude, at a 95% confidence level, that our samples were taken from populations with differing variances.
Additionally, we can confirm this conclusion by comparing our F-Test statistic of 1.9112 to the F-value which coincides with the appropriate degrees of freedom and alpha value. To find this value, we would typically consult a chart in the back of a statistics textbook. However, R makes the situation simpler by providing us with a method to reference this value.
Utilizing the code:
qf(.95, df1=24, df2=24) #Alpha .05, Numerator Degrees of Freedom = 24, Denominator Degrees of Freedom = 24#
Produces the output:
[1] 1.98376
Again, because 1.9112 < 1.98376, we cannot conclude that our samples were taken from populations with differing variances.
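Both the p-value and the critical value come from the same F distribution, so each can be recovered from the other's tail function. As a sketch, the two figures above can be reproduced in Python with scipy (an assumption for illustration: scipy's `f.sf` and `f.ppf` mirror R's `pf()` and `qf()`; this is a cross-check, not part of the R workflow):

```python
# Recovering the p-value and the critical value from the F distribution
# with scipy, mirroring R's pf() and qf().
from scipy.stats import f

F_stat, df1, df2 = 1.911201, 24, 24

# Right-tailed p-value: P(F >= F_stat) under the null hypothesis,
# the quantity var.test() reports as p-value = 0.05975.
p_value = f.sf(F_stat, df1, df2)

# Critical value at alpha = .05, the quantity returned in R by
# qf(.95, df1=24, df2=24).
critical = f.ppf(0.95, df1, df2)

print(round(p_value, 5), round(critical, 5))
```

The two comparisons are equivalent: the p-value exceeds alpha exactly when the test statistic falls below the critical value.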
If we were to graph this test and distribution, the illustration would resemble:
If you would like to create your own f-distribution graphs, sans the mark-ups, you could use the following code:
curve(df(x, df1=24, df2=24), from=0, to=5) # Modify the degrees of freedom only #
Below is an illustration of a few F-distribution shapes produced by varying the degrees of freedom:
I hope that you found this article useful. In the next post, we will begin to discuss the concept of ANOVA.
* A helpful article pertaining to the F-Test statistic: http://atomic.phys.uni-sofia.bg/local/nist-e-handbook/e-handbook/eda/section3/eda359.htm
** Source for F-Distribution Image: https://en.wikipedia.org/wiki/F-distribution