x <-c(53,46,61,97,44,87,40,15,29,99,85,98,17,3,46,25,15,19,2,32,67,34,39,100,88,40,40,87,89,86,69,67,89,84,98,43,75,66,40,76,48,82,45,99,10,59,15,13,99,45,78,66,59,26,2,91,80,42,94,12,9,24,37,14,18,86,35,96,56,50,22,39,58,82,11,56,50,30,99,64,74,13,14,7,5,97,59,91,57,69,58,36,43,77,36,2,58,86,89)
y <- c(1,1,1,2,2,2,3)
summary()
summary() is a useful R function that prints descriptive statistics for the object passed to it.
For a numeric vector, summary() prints:
Min (the smallest value within the set)
1st Qu. (the value of the first quartile)
Median (the median value)
Mean (the mean value)
3rd Qu. (the value of the third quartile)
Max. (the largest value within the set)
If we were to pass the vector 'x' to this function, the following would be printed to the R console window:
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 29.50 53.00 53.15 82.00 100.00
If you wanted to generate each value independently, you could use the following functions:
mean()
For the mean value.
median()
For the median value.
range()
For the lowest and highest values.
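As a quick sketch, the calls below compute these statistics one at a time for the same 'x' vector, so the results can be compared with summary(x). quantile() is not otherwise covered in this article; it is used here, with R's default method, to obtain the two quartiles that summary() reports.

```r
x <- c(53,46,61,97,44,87,40,15,29,99,85,98,17,3,46,25,15,19,2,32,67,34,39,100,88,40,40,87,89,86,69,67,89,84,98,43,75,66,40,76,48,82,45,99,10,59,15,13,99,45,78,66,59,26,2,91,80,42,94,12,9,24,37,14,18,86,35,96,56,50,22,39,58,82,11,56,50,30,99,64,74,13,14,7,5,97,59,91,57,69,58,36,43,77,36,2,58,86,89)

mean(x)                     # arithmetic mean: 53.15152
median(x)                   # middle value of the sorted data: 53
range(x)                    # smallest and largest values as a length-2 vector: 2 100
quantile(x, c(0.25, 0.75))  # first and third quartiles (default type 7): 29.5 82.0
```

Together these reproduce the six numbers printed by summary(x).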
Finding the Mode
Unfortunately, base R does not include a function for calculating the statistical mode (R's mode() function returns an object's storage mode, such as "numeric", not the most frequent value). However, after careful searching, I found a very good substitute, shown below. This code comes from the YouTube user economicurtis, who featured it in his video "Calculating Mode with R Software (More on R's Summary Stats)". A link to the video appears at the end of the article.*
temp <- table(as.vector(vectorname))  # replace vectorname with the name of your data vector
names(temp)[temp == max(temp)]
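The first line builds a frequency table of the vector's values, and the second returns the value (or values) with the highest count. If the data is bimodal, two values will be returned; in general, every value tied for the highest count is returned. Note that the result is a character vector, because table names are strings. Here is an example of the code with 'y' as the vector.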
temp <- table(as.vector(y))
names(temp)[temp == max(temp)]
Since 'y' is bimodal (1 and 2 each appear three times), the output printed to the console window should be:
[1] "1" "2"
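If you need the mode often, the two-line snippet above can be wrapped in a function. This is a minimal sketch: the name stat_mode() is hypothetical and is not part of base R. The as.numeric() call converts the character result back to numbers, assuming the data was numeric to begin with.

```r
# Hypothetical helper: stat_mode() is not a base R function.
# It wraps the frequency-table approach shown above.
stat_mode <- function(v) {
  counts <- table(as.vector(v))         # frequency of each distinct value
  names(counts)[counts == max(counts)]  # every value tied for the highest count
}

y <- c(1, 1, 1, 2, 2, 2, 3)
stat_mode(y)              # "1" "2" (character, because table names are strings)
as.numeric(stat_mode(y))  # 1 2
```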
Finding the Variance
To compute the sample variance of a vector (the sum of squared deviations from the mean, divided by n - 1), use the following function:
var()
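Because var() divides by n - 1 rather than n, it returns the sample variance, not the population variance. The sketch below checks this by hand on a small vector 'v', which is illustrative data chosen for this example and not taken from the article.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data, mean is 5

n  <- length(v)
ss <- sum((v - mean(v))^2)  # sum of squared deviations from the mean: 32

var(v)        # 4.571429, the sample variance (divides by n - 1)
ss / (n - 1)  # the same value, computed by hand
ss / n        # 4, the population variance, which var() does not return
```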
Finding the Standard Deviation
This function computes the sample standard deviation of a vector (the square root of the sample variance):
sd()
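sd() is simply the square root of var(), so it also uses the n - 1 denominator. The sketch below, using the same illustrative vector 'v' (not data from this article), compares it with the population standard deviation computed by hand.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data

sd(v)                        # 2.13809, the sample standard deviation
sqrt(var(v))                 # the same value: sd() is the square root of var()
sqrt(mean((v - mean(v))^2))  # 2, the population standard deviation, for comparison
```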
Tukey's Five Number Summary
This function returns Tukey's five-number summary of a vector (minimum, lower hinge, median, upper hinge, maximum), which is useful in descriptive statistics:
fivenum()
For example, if we were to use this function on 'x':
fivenum(x)
The following would be printed to the console window:
[1]   2.0  29.5  53.0  82.0 100.0
(2.0) The first value is the smallest observation.
(29.5) The second value is the lower hinge, which here equals the first quartile.
(53.0) The third value is the median.
(82.0) The fourth value is the upper hinge, which here equals the third quartile.
(100.0) The final value is the largest observation.
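Tukey's hinges are not always identical to the quartiles reported by summary() or quantile(). For 'x' the two happen to agree (29.5 and 82.0), but the sketch below uses the vector 1:10, an illustrative choice rather than data from the article, where they differ.

```r
v <- 1:10  # illustrative data

fivenum(v)                  # 1.0 3.0 5.5 8.0 10.0 (Tukey's hinges at 3 and 8)
quantile(v, c(0.25, 0.75))  # 3.25 7.75 (default type-7 quartiles)
```

The hinges are medians of the lower and upper halves of the sorted data, while quantile() interpolates between neighbouring observations, which is why the two methods can disagree on small samples.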
Interquartile Range
The interquartile range, or IQR, is the difference between the third and first quartiles. It can be computed with the following function:
IQR()
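IQR() uses the same quartiles as quantile() (type 7 by default), not Tukey's hinges, so it equals the difference of the two quantile() values. For 'x' this gives 82 - 29.5 = 52.5. The sketch below checks the relationship on the illustrative vector 1:10, which is not data from the article.

```r
v <- 1:10  # illustrative data

IQR(v)                                         # 4.5
unname(quantile(v, 0.75) - quantile(v, 0.25))  # 4.5, the same calculation by hand
```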
In the next article, we will begin graphing box plots and histograms.
* - https://www.youtube.com/watch?v=YvdYwC2YgeI