Reflections of a Data Scientist: “Means” (SPSS)

The “Means” function in SPSS generates summary statistics for a numerical variable and allows the means of the categories within a data set to be compared. It is similar to the summary() function in R, although summary() does not split the data by category on its own.

I will demonstrate the premise of the function with an example:

The column “VAR00001” contains the categorical data.

The column “VAR00002” contains the numerical data.

To perform the “Means” function, first select “Analyze” from the top menu, then select “Compare Means”. After this selection has been made, click on the option “Means”.

This should bring up the “Means” dialog. From this interface I selected “Options” and added further statistics to be included in the analysis. The “Dependent List” will hold your numerical variable, and the “Independent List” under “Layer 1 of 1” will hold your categorical variable.

Clicking “OK” produces the output:

Case Processing Summary

Cases Included – These two columns contain the count (N) and the percentage of all sample values that were included in the analysis.

Cases Excluded – These two columns contain the count (N) and the percentage of all sample values that were excluded from the analysis. Excluded cases are those with missing input data for a variable in the analysis.

Cases Total – These two columns contain the count (N) and the percentage of all sample values, included and excluded combined.
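
As a rough sketch, the counts in the Case Processing Summary could be reproduced in R as shown below. The vector values is hypothetical: it holds the seven numbers used later in this post plus one NA, added only so that there is something to exclude. The sketch also treats a case as excluded only when its numerical value is missing, which is a simplification.

```r
# Hypothetical data: the seven values from this post plus one missing
# value (NA), included only to illustrate an excluded case.
values <- c(12, 11, 13, 14, 19, 17, 18, NA)

n_total    <- length(values)        # every case in the data set
n_excluded <- sum(is.na(values))    # cases with no input data
n_included <- n_total - n_excluded  # cases used in the analysis

# Counts and percentages laid out like the SPSS summary table.
case_summary <- data.frame(
  Cases   = c("Included", "Excluded", "Total"),
  N       = c(n_included, n_excluded, n_total),
  Percent = 100 * c(n_included, n_excluded, n_total) / n_total
)
print(case_summary)
```

With real data, values would be replaced by the numerical column being analyzed.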

Report

Mean – The mean of the observed values contained within the category.

N – The number of observed values which are included within the category.

Std. Deviation – The standard deviation of the values contained within the category.

Minimum – The minimum value of the observed set of values contained within the category.

Maximum – The maximum value of the observed set of values contained within the category.

Median – The median value of the observed set of values contained within the category.

How to reproduce this analysis within the “R” platform:

Cat1 <- c(12)

Cat2 <- c(11, 13, 14, 19)

Cat3 <- c(17, 18)

summary(Cat1)

sd(Cat1)

summary(Cat2)

sd(Cat2)

summary(Cat3)

sd(Cat3)

Which produces the output:

> summary(Cat1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     12      12      12      12      12      12
> sd(Cat1)
[1] NA
>
> summary(Cat2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.00   12.50   13.50   14.25   15.25   19.00
> sd(Cat2)
[1] 3.40343
>
> summary(Cat3)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  17.00   17.25   17.50   17.50   17.75   18.00
> sd(Cat3)
[1] 0.7071068
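
The three vectors above can also be kept in a single data frame, which is closer to how the data sits in SPSS. The following is a minimal sketch of that approach, not the only way to do it. The column names follow the SPSS example, but the category labels 1, 2 and 3 and the helper report_stats() are assumptions made for illustration.

```r
# Column names follow the SPSS example; the category labels 1, 2, 3 are
# assumed, since the original category codes are not shown in this post.
dat <- data.frame(
  VAR00001 = factor(c(1, 2, 2, 2, 2, 3, 3)),
  VAR00002 = c(12, 11, 13, 14, 19, 17, 18)
)

# Hypothetical helper: the six statistics shown in the SPSS Report table.
report_stats <- function(x) {
  c(Mean = mean(x), N = length(x), Std.Deviation = sd(x),
    Minimum = min(x), Maximum = max(x), Median = median(x))
}

# Split the numerical column by category, compute the statistics for each
# group, and transpose so that each category becomes one row.
by_category <- t(sapply(split(dat$VAR00002, dat$VAR00001), report_stats))

# Add a row for all values combined.
report <- rbind(by_category, Total = report_stats(dat$VAR00002))
print(round(report, 3))
```

The standard deviation for the first category is NA because it contains a single value, and the sample standard deviation divides by n - 1.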

Sunday, January 28, 2018
