Cluster analysis is a statistical technique that sorts cases into categories so that the cases in each category are similar to each other and different from cases in other categories. Each category is a cluster. Social scientists often use SPSS (Statistical Package for the Social Sciences) to conduct cluster analyses. In K-Means clustering the researcher specifies the number of clusters in advance; K is that number. K-Means clustering also lets researchers cluster very large data sets.
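To make the idea concrete, here is a minimal Python sketch of the K-Means procedure. It is an illustration only, not SPSS's own implementation: the function name k_means, the choice to seed the clusters with the first K cases, and the example data are all assumptions made for this sketch.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, k, max_iter=10):
    """Cluster a list of numeric tuples; returns (centers, labels)."""
    if not 2 <= k <= len(points):
        raise ValueError("k must be at least 2 and no more than the number of cases")
    # Assumption for this sketch: the first k cases seed the clusters.
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its cases.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical data: six cases measured on two variables.
cases = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centers, labels = k_means(cases, k=2)
print(labels)
```

The two steps inside the loop (assign each case to its nearest centre, then recompute each centre as the mean of its cases) are what "iterating" refers to in the SPSS dialogue box.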
Click on "Analyze" at the top of the SPSS screen. Select "Classify" from the drop-down menu, then "K-Means Cluster."
Select a sample of cases. In the dialogue box, highlight the variables you wish to use in the initial K-Means analysis and click on the arrow button to move them into the "Variables" box. Set the number of clusters (5 in this example) in the box "Number of Clusters." The number of clusters must be at least two and no more than the number of cases. Click on "Iterate and classify" in the dialogue box to obtain cluster centres. Click on "Write final" to save the final cluster centres for use in the next step.
Include the whole data file for the final K-Means analysis. Click on "Analyze" at the top of the SPSS screen. Select "Classify" from the drop-down menu, then "K-Means Cluster." In the dialogue box, highlight the variables you wish to use and click on the arrow button to move them into the "Variables" box. Set the number of clusters at 5 in the box "Number of Clusters." Click on "Classify only" in the dialogue box. Choose "Read initial" to get the cluster centres from the sample in Step 2. Click on "Save," select "Cluster membership" and click on "Continue." Click on "OK" to run the analysis.
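The two-stage procedure above (find centres on a sample, then classify the whole file with those centres) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not what SPSS runs internally: the function names nearest, fit_centers and classify_only, the choice of sample, and the data are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest(case, centers):
    """Index of the center closest to this case."""
    return min(range(len(centers)), key=lambda c: dist(case, centers[c]))

def fit_centers(sample, centers, iterations=10):
    """Stage 1 ('Iterate and classify'): refine the centers on the sample only."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for case in sample:
            groups[nearest(case, centers)].append(case)
        # Each center moves to the mean of its group; an empty group keeps its center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers

def classify_only(cases, centers):
    """Stage 2 ('Classify only'): assign every case without moving the centers."""
    return [nearest(case, centers) + 1 for case in cases]  # clusters numbered from 1

# Hypothetical data file: six cases measured on two variables.
full_file = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
sample = full_file[:4]  # assumption: the first four cases stand in for the sample
final_centers = fit_centers(sample, centers=sample[:2])
membership = classify_only(full_file, final_centers)
print(final_centers)
print(membership)
```

Estimating the centres on a sample keeps the iterative stage cheap; the second pass over the full file is then a single assignment per case, which is why the approach suits very large data sets.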
Review the first table in the output, labelled "Final Cluster Centers." The top of the table has the numbers 1 through 5 across it, indicating each of the 5 clusters. The left-hand column lists the variables used in the analysis; in this example they are labelled "REGR factor score" (factor scores saved from a factor analysis using the regression method). If you follow the row for "REGR factor score 1 for analysis 1" to the right, it gives the mean of that factor score for each cluster.
Read the next table in the output, headed "Number of Cases in each Cluster." The left-hand column lists the clusters by number, 1 through 5. Follow a cluster number to the right and you will find the number of cases in that cluster.
Look at the last table in the output, "Cluster membership," which shows which cases are in each cluster. The cases are listed in the left column and the cluster number is found in the column to the far right.
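The three output tables are related: once each case has a cluster number, the cluster centres are the per-cluster means of each variable and the case counts are a simple tally. The Python sketch below shows that relationship. The rows and the variable names score1 and score2 are invented for illustration and do not come from any real SPSS output.

```python
from collections import Counter

# Hypothetical cluster membership listing: one row per case.
rows = [
    {"case": 1, "score1": -1.0, "score2": 0.5, "cluster": 1},
    {"case": 2, "score1": 1.0, "score2": -0.5, "cluster": 2},
    {"case": 3, "score1": -2.0, "score2": 1.5, "cluster": 1},
    {"case": 4, "score1": 2.0, "score2": 0.5, "cluster": 2},
    {"case": 5, "score1": 3.0, "score2": 0.0, "cluster": 2},
]

# "Number of Cases in each Cluster": tally the cluster numbers.
cases_per_cluster = Counter(row["cluster"] for row in rows)

def cluster_center(variable, cluster):
    """Mean of one variable over the cases assigned to one cluster."""
    values = [row[variable] for row in rows if row["cluster"] == cluster]
    return sum(values) / len(values)

# "Final Cluster Centers": one mean per variable per cluster.
centers = {
    variable: {c: cluster_center(variable, c) for c in sorted(cases_per_cluster)}
    for variable in ("score1", "score2")
}
print(cases_per_cluster)
print(centers)
```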
Read the Output
Tips and warnings
- Remove outliers before conducting the analyses.
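One common way to screen for outliers is to drop cases that lie far from the mean on any variable. The Python sketch below uses a cut-off of 3 standard deviations; that threshold, the function name drop_outliers and the data are assumptions for illustration, and other screening rules may suit your data better.

```python
from statistics import mean, stdev

def drop_outliers(cases, threshold=3.0):
    """Keep cases whose every variable is within `threshold` SDs of that variable's mean."""
    columns = list(zip(*cases))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        case for case in cases
        if all(sd == 0 or abs(x - m) / sd <= threshold
               for x, (m, sd) in zip(case, stats))
    ]

# Hypothetical data: twelve cases on two variables; the last case is extreme on the first.
cases = [
    (10, 1.0), (11, 1.2), (9, 0.9), (10, 1.1), (12, 1.0), (10, 1.3),
    (11, 0.8), (9, 1.1), (10, 1.0), (11, 0.9), (10, 1.2), (95, 1.0),
]
kept = drop_outliers(cases)
print(len(cases), len(kept))
```

Outliers matter for K-Means because each centre is a mean, so a single extreme case can pull a centre away from the bulk of its cluster or end up in a cluster of its own.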