Spearman’s Rank Correlation Coefficient
Spearman’s Rank Correlation Coefficient, also referred to as Spearman’s rho, is a non-parametric alternative to the Pearson correlation. It is used in circumstances where the relationship between the two data samples is non-linear* (but monotonic), or where the data contained within those samples are ordinal**. The statistic that this method produces is known as “rho”, hence the method’s alternative name, “Spearman’s rho”.
As is the case with many non-parametric alternatives, this procedure operates on the ranks of the observations rather than on their raw values.
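As a minimal sketch of what the rank system means here (reusing the survey vectors from the example), Spearman’s rho can be reproduced by ranking each vector with base R’s rank(), which gives tied values the average of their ranks, and then taking an ordinary Pearson correlation of those ranks. The variable names rho_by_hand and rho_builtin are illustrative only; the point is the comparison against cor(..., method = "spearman").

```r
# Survey responses from the example (scale 1-5)
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Replace each value with its rank; tied values share the average of their ranks
rx <- rank(x)
ry <- rank(y)

# Spearman's rho is the ordinary Pearson correlation computed on the ranks
rho_by_hand <- cor(rx, ry)
rho_builtin <- cor(x, y, method = "spearman")

print(c(by_hand = rho_by_hand, builtin = rho_builtin))
```

Because survey data on a 1-5 scale contain many ties, the averaging of tied ranks matters; both values should agree with the rho reported by cor.test().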
Example:
We are presented with the following data vectors from two survey prompts:
# Create data vector (scale 1-5) #
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
# Create data vector (scale 1-5) #
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)
# Create Spearman’s Rank Correlation #
cor.test(x, y, method = "spearman")
This produces the output:
Spearman's rank correlation rho
data: x and y
S = 1072.1, p-value = 0.4126
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1939455
From this output, we can first determine that the result is not statistically significant: the p-value = 0.4126, a value far above the common alpha level of .05, so we cannot reject the null hypothesis that the true rho is equal to 0. Next, we will assess the rho value output, which is 0.1939455. Like the Pearson correlation, this value is measured on a scale from -1 to 1. Since this value is relatively low, we would describe it as, at most, a weak positive correlation.
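The interpretation above can be reproduced programmatically. The sketch below (variable names are my own) pulls rho and the p-value out of the object that cor.test() returns, applies the .05 decision rule, and recomputes the reported S statistic as (n^3 - n)(1 - rho)/6. It passes exact = FALSE; because these data contain ties, the default call falls back to the same t-approximation p-value anyway, only with a warning.

```r
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# exact = FALSE requests the t-approximation p-value directly
res <- cor.test(x, y, method = "spearman", exact = FALSE)

rho   <- unname(res$estimate)   # sample rho
p_val <- res$p.value            # two-sided p-value
n     <- length(x)

# The reported S statistic equals (n^3 - n) * (1 - rho) / 6
s_check <- (n^3 - n) * (1 - rho) / 6

alpha <- 0.05
cat("rho =", round(rho, 4), " p =", round(p_val, 4), "\n")
cat("S =", round(unname(res$statistic), 1), " recomputed:", round(s_check, 1), "\n")
if (p_val < alpha) {
  cat("Reject H0: rho differs significantly from 0\n")
} else {
  cat("Fail to reject H0: no significant monotonic association at alpha =", alpha, "\n")
}
```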
*- For an example of what non-linear data might look like, please refer to the article “(R) Polynomial Regression”, published April 17, 2018.
**- For example, survey responses in which respondents rate a particular item on a scale of 1-10.
Kendall Rank Correlation Coefficient
The Kendall Rank Correlation Coefficient, also referred to as Kendall’s tau, is also a non-parametric alternative to the Pearson correlation. Like Spearman’s rho, Kendall’s tau is used in circumstances where the relationship between the two data samples is non-linear, or where the data contained within those samples are ordinal. The statistic that this method produces is known as “tau”. As is the case with many non-parametric alternatives, this procedure operates on ranks rather than raw values: specifically, on whether each pair of observations is ordered the same way in both samples (concordant) or in opposite ways (discordant).
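The pair-counting idea can be sketched directly in base R. With tied survey answers, the usual form is the tie-adjusted tau-b: the number of concordant minus discordant pairs, divided by the square root of (pairs not tied in x) times (pairs not tied in y). This is a minimal illustration under that assumption, not a description of R’s internals; the all.equal() check against cor(..., method = "kendall") is what confirms the match on this data.

```r
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Sign of every pairwise difference: +1, -1, or 0 for a tie
sx <- sign(outer(x, x, "-"))
sy <- sign(outer(y, y, "-"))

# Concordant pairs contribute +1, discordant pairs -1, tied pairs 0.
# Each pair appears twice in the full matrices; the factor of 2 cancels below.
numerator <- sum(sx * sy)

# tau-b denominator: pairs not tied in x times pairs not tied in y
tau_b <- numerator / sqrt(sum(sx^2) * sum(sy^2))

print(c(by_hand = tau_b, builtin = cor(x, y, method = "kendall")))
```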
Example:
We are presented with the following data vectors from two survey prompts:
# Create data vector (scale 1-5) #
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
# Create data vector (scale 1-5) #
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)
# Create Kendall Rank Correlation #
cor.test(x, y, method = "kendall")
This produces the output:
Kendall's rank correlation tau
data: x and y
z = 0.84528, p-value = 0.398
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.1617271
From this output, we can first determine that this result is also not statistically significant, as the p-value = 0.398, a value far above the common alpha level of .05. Next, we will assess the tau value output, which is 0.1617271. This value is also measured on a scale from -1 to 1, similar to the Pearson correlation. Since this value is relatively low, we would again describe it as, at most, a weak positive correlation.
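Unlike the Spearman output, the Kendall output reports a z statistic, and the two-sided p-value is simply the probability of a standard normal value at least that far from 0 in either direction. The sketch below (variable names are illustrative) checks this against the reported p-value. It passes exact = FALSE because the exact test is unavailable when the data contain ties; the default call produces the same normal-approximation result, plus a warning.

```r
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

# Normal-approximation Kendall test (no exact test is possible with ties)
res <- cor.test(x, y, method = "kendall", exact = FALSE)

z <- unname(res$statistic)      # the reported z value
p_from_z <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal

cat("z =", round(z, 5), " p (reported) =", round(res$p.value, 4),
    " p (from z) =", round(p_from_z, 4), "\n")
```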
Conclusion:
While both methods provide similar functionality, Spearman’s rank correlation is used far more frequently than Kendall’s rank correlation. I typically run both, compare the results of each, and then report my findings in a subsequent write-up.
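For the run-both-and-compare workflow described above, a small helper can collect each method’s estimate and p-value into one table. This is only a sketch; the data frame layout is my own choice, and the test statistics are left out because Spearman reports S while Kendall reports z, which are not on comparable scales.

```r
x <- c(5, 1, 1, 1, 3, 2, 5, 3, 3, 2, 4, 4, 4, 2, 5, 4, 4, 4, 4, 2)
y <- c(4, 5, 4, 3, 1, 1, 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 3, 3, 5, 4)

methods <- c("spearman", "kendall")

# One row per method; exact = FALSE avoids the ties warning without
# changing the p-values for this tied data
results <- do.call(rbind, lapply(methods, function(m) {
  res <- cor.test(x, y, method = m, exact = FALSE)
  data.frame(method   = m,
             estimate = unname(res$estimate),
             p.value  = res$p.value)
}))

print(results, digits = 4)
```

On this data, both estimates are small and positive, and neither p-value falls below .05, so the two methods lead to the same conclusion.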
I hope that you have found this article to be informative and interesting. Until next time, stay inquisitive, Data Heads!