Cosine Similarity Matrix

This summary displays pairwise cosine similarity measures between cases in the data table. The values are presented in the form of a matrix with heatmap coloring to represent the direction and strength of each relationship.
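For reference, the cosine similarity of two cases is the dot product of their variable values divided by the product of their vector lengths, so it ranges from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction). The following is a minimal sketch of that calculation, not the application's own code; the function names and the sample data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)

def similarity_matrix(cases):
    """cases: dict mapping a case identifier to its list of variable values.

    Returns the identifiers (used as axis labels) and a square matrix of
    pairwise similarities in the same order.
    """
    ids = list(cases)
    matrix = [[cosine_similarity(cases[r], cases[c]) for c in ids] for r in ids]
    return ids, matrix

# Hypothetical data: three cases, two numeric variables each.
cases = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [3.0, 3.0]}
ids, matrix = similarity_matrix(cases)
```

In this example, cases A and B point along different axes and have a similarity of zero, while each case has a similarity of 1 with itself.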

The dialog for displaying a cosine similarity matrix for two or more numerical variables

At least two cases and two numeric variables are required. In addition, a categorical variable must be specified; it serves as the case identifier and labels the axes of the displayed matrix.

The cosine similarity dialog prompts for:

  • Whether to use all data in the data table or just the subset that has been selected (e.g., by clicking on the table or map).

  • Whether to remove missing values by dropping cases (the default) or by dropping variables. Removal of missing data may reduce the data set to fewer than two cases or fewer than two variables.

  • Two or more numeric variables from a list that is displayed at the left side of the dialog.

  • The categorical variable to be used as the case identifier. If this variable is not a candidate key (that is, its values do not uniquely identify rows), it instead identifies groups of rows whose values are aggregated before the similarity calculation. At least two distinct values of this variable must remain after removal of missing values.

  • The method by which individual cases are aggregated within each group if the grouping variable is not a candidate key. Options are: the arithmetic mean (the default), the geometric mean, the harmonic mean, and the sum. Geometric and harmonic means cannot be calculated if any value is less than or equal to zero.
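The missing-value and aggregation options above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: missing values are represented by None, rows are (identifier, values) pairs, and all function names and sample data are hypothetical.

```python
import math
from collections import defaultdict

def drop_missing(rows, by="cases"):
    """rows: list of (case_id, [values]); None marks a missing value."""
    if by == "cases":
        # Discard every row that has at least one missing value.
        return [(cid, vals) for cid, vals in rows if None not in vals]
    if by == "variables":
        # Discard every variable (column) that is missing in at least one row.
        n_vars = len(rows[0][1]) if rows else 0
        keep = [j for j in range(n_vars)
                if all(vals[j] is not None for _, vals in rows)]
        return [(cid, [vals[j] for j in keep]) for cid, vals in rows]
    raise ValueError("by must be 'cases' or 'variables'")

def _require_positive(values, name):
    if any(v <= 0 for v in values):
        raise ValueError(f"{name} mean requires all values to be greater than zero")

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    _require_positive(values, "geometric")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    _require_positive(values, "harmonic")
    return len(values) / sum(1.0 / v for v in values)

AGGREGATORS = {
    "mean": arithmetic_mean,
    "geometric": geometric_mean,
    "harmonic": harmonic_mean,
    "sum": sum,
}

def aggregate(rows, method="mean"):
    """Combine all rows that share a case identifier, variable by variable."""
    func = AGGREGATORS[method]
    groups = defaultdict(list)
    for cid, vals in rows:
        groups[cid].append(vals)
    # zip(*members) turns a group's rows into per-variable columns.
    return {cid: [func(col) for col in zip(*members)]
            for cid, members in groups.items()}

# Hypothetical data: the identifier repeats, so it is not a candidate key.
rows = [("A", [1.0, 4.0]), ("A", [3.0, None]),
        ("B", [2.0, 8.0]), ("B", [4.0, 2.0])]

by_cases = aggregate(drop_missing(rows, by="cases"), method="mean")
by_variables = aggregate(drop_missing(rows, by="variables"), method="mean")
```

With this sample, dropping cases keeps both variables but loses one row of group A, whereas dropping variables keeps every row but leaves only one variable, which would be too few for a similarity calculation.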

The dialog then displays a matrix of cosine similarity values. The matrix is updated immediately whenever the data selections on this dialog are changed or, if only selected data are being used, whenever the selection in the data table changes.

The numeric value of the cosine similarity is shown in each cell of the matrix by default. The Alt-L hotkey will toggle display of the numeric values on or off. When the numeric values are not shown, similarities will be represented by only the heatmap coloring.

The values in the matrix are symmetric about the main diagonal.

The “Source Data” button will display a table of the selected data after removal of missing values. This table can be saved using the Ctrl-S hotkey.

The “Aggregates” button will display a table of the selected data after aggregation is performed (if necessary). This table can be saved using the Ctrl-S hotkey.

The “Similarities” button will display a table of the cosine similarity values in the upper right triangle of the matrix, that is, the part above the main diagonal. This table can be saved using the Ctrl-S hotkey.
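Because the matrix is symmetric, each pair of cases needs to be listed only once. The sketch below flattens the part of a matrix above the main diagonal into rows of (case, case, similarity). It assumes the diagonal itself is excluded, which the description above does not state; the function name, column order, and sample values are hypothetical.

```python
def upper_triangle_rows(ids, matrix):
    """List each unordered pair of cases once, with its similarity."""
    rows = []
    for i in range(len(ids)):
        # Start at i + 1 to skip the diagonal and the mirrored lower half.
        for j in range(i + 1, len(ids)):
            rows.append((ids[i], ids[j], matrix[i][j]))
    return rows

# Hypothetical symmetric matrix for three cases.
ids = ["A", "B", "C"]
matrix = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.25],
    [0.5, 0.25, 1.0],
]
pairs = upper_triangle_rows(ids, matrix)
```

For n cases this produces n(n-1)/2 rows, so the three cases here yield three pairs.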