Example:
You will be presented with a menu containing the following options:
This menu sets the parameters the model will adhere to when it is created. If “Determine automatically” is selected, the algorithm analyzes the selected variables and chooses the number of groupings it determines most appropriate for the data. The model output is then built from those groupings.
If “Specify fixed” is selected, the procedure will do its best to create the number of groupings specified by the user, and this forced number of groupings will be used to build the model output. (Reminder: “Euclidean” can be chosen as the “Distance Measure” only if your model consists solely of continuous variables. If the model includes any categorical variables, leave “Distance Measure” set to “Log-likelihood”.)
Example (cont.)
If we were to continue with our example and select “Output” from the above menu, we would be presented with the following:
Let’s select “Cont_Var2” from our “Variables” list. This moves the selection to the “Evaluation Fields” box. Selecting “Create cluster membership variable” beneath the “Working Data File” header writes each case’s cluster assignment to the data table as a new variable once the model output has been produced.
Clicking on “Continue” from this menu, and “OK” from the prior menu, will provide the following output:
Model Summary (Explanation)
Model Summary
Algorithm – The algorithm used to create the model.
Inputs – The number of input variables used to create the model.
Clusters – The number of clusters produced by the algorithm.
Cluster Quality
This chart illustrates the overall strength of the model, rated on a scale from poor to good using the silhouette measure of cohesion and separation.
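To make the idea of cluster quality more concrete, here is a minimal Python sketch of an average silhouette calculation using plain Euclidean distance. It is only an illustration under stated assumptions: the function name `silhouette_score` and the sample points are made up for this example, and this is not the code SPSS itself runs.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Average silhouette over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        widest = max(a, b)
        scores.append((b - a) / widest if widest > 0 else 0.0)
    return mean(scores)


# Made-up data: two tight, well-separated groups
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
labels = [1, 1, 2, 2]
score = silhouette_score(points, labels)
print(round(score, 3))
```

Values close to 1 indicate tight, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that cases may have been assigned to the wrong cluster.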
(Double clicking on the TwoStep Cluster output provides the following illustration)
The output above is a graphical illustration of the clusters which, combined, represent the model in its entirety.
Chart (Explanation)
Size of Smallest Cluster – The number of entries in the smallest cluster. To the right of this value is the percentage of the model which the cluster represents.
Size of Largest Cluster – The number of entries in the largest cluster. To the right of this value is the percentage of the model which the cluster represents.
Ratio of Sizes: Largest Cluster to Smallest Cluster – The ratio produced when the size of the largest cluster is divided by the size of the smallest cluster. The value of this ratio should be no greater than 2.
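The three chart values described above are simple to reproduce by hand. The following Python sketch assumes you have a list of cluster membership values, one per observation. The function name `cluster_size_summary` and the sample data are hypothetical.

```python
from collections import Counter


def cluster_size_summary(memberships):
    """Summarize cluster sizes from a list of cluster membership values."""
    counts = Counter(memberships)
    total = len(memberships)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        # cluster -> (number of observations, percentage of the model)
        "sizes": {c: (n, 100.0 * n / total) for c, n in sorted(counts.items())},
        "smallest": smallest,
        "largest": largest,
        "ratio": largest / smallest,
    }


# Made-up membership values for ten observations
memberships = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
summary = cluster_size_summary(memberships)

for cluster, (n, pct) in summary["sizes"].items():
    print(f"Cluster {cluster}: {pct:.1f}% ({n})")
print("Ratio of sizes:", summary["ratio"])
```

In this made-up example the largest cluster holds 5 observations and the smallest holds 2, which gives a ratio of 2.5. That is above the guideline of 2 mentioned above.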
Cluster – Each cluster is identified by a number.
Label – There is no default label provided. However, if you would like to create a label for cluster “1”, this field enables you to do so.
Description – There is no default description provided. If you would like to create a description for cluster “1”, this field enables you to do so.
Size – The size of each cluster relative to the total number of observations contained within the model. It is displayed as the percentage of the total model, followed by the number of observations within the cluster in parentheses.
Inputs
The variables which make up each cluster are listed in order of predictive importance. If you hover your mouse over any cell, a box will appear containing a key to what is represented within the cell.
If a variable is categorical, its most frequent category is listed along with the frequency of its occurrence within the group.
If a variable is continuous, its mean value is listed instead.
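The cell contents described above can be sketched in a few lines of Python. This is an illustration only: the column names “cluster” and “Cat_Var1”, the function name `profile_clusters`, and the data values are all hypothetical, and “Cont_Var2” is borrowed from our example purely as a label.

```python
from collections import Counter, defaultdict
from statistics import mean


def profile_clusters(rows, cluster_key, categorical, continuous):
    """Build a per-cluster profile: mode for categorical, mean for continuous."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    profile = {}
    for cluster, members in sorted(groups.items()):
        cells = {}
        for name in categorical:
            # most frequent category and its share of the cluster, in percent
            value, count = Counter(r[name] for r in members).most_common(1)[0]
            cells[name] = (value, 100.0 * count / len(members))
        for name in continuous:
            cells[name] = mean(r[name] for r in members)
        profile[cluster] = cells
    return profile


# Made-up rows, each already carrying a cluster membership value
rows = [
    {"cluster": 1, "Cat_Var1": "A", "Cont_Var2": 10.0},
    {"cluster": 1, "Cat_Var1": "A", "Cont_Var2": 14.0},
    {"cluster": 1, "Cat_Var1": "B", "Cont_Var2": 12.0},
    {"cluster": 2, "Cat_Var1": "B", "Cont_Var2": 30.0},
    {"cluster": 2, "Cat_Var1": "B", "Cont_Var2": 34.0},
]
profile = profile_clusters(rows, "cluster", ["Cat_Var1"], ["Cont_Var2"])

for cluster, cells in profile.items():
    print(cluster, cells)
```

Each categorical cell holds the most frequent category together with its percentage within the cluster, and each continuous cell holds the cluster mean.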
You may recall that at the beginning of this exercise we selected “Cont_Var2” as our “Evaluation Field” and checked “Create cluster membership variable”. These next steps will demonstrate what this accomplished.
Returning to the initial data set, you will see that an additional column has been created. This column contains the cluster to which each observation was assigned.