Statistics

The Statistics menu

Several types of statistical summaries are available from the Statistics menu. These include univariate, bivariate, and multivariate summaries of numeric data, and a summary of co-occurrences of categorical data. Statistical summaries can be run using all the data in the data table or using only the subset that has been selected in the table and on the map.

Each of the statistical summaries is briefly described below. The links from these items and from the menu to the right provide more detailed descriptions.

  • Univariate statistics – Tabulates minimum, maximum, mean, and other distribution statistics for one or more numeric variables, including normality and outlier tests. Statistics are tabulated for both untransformed and log10-transformed values.

  • Fit univariate distribution – Fits one or more parametric distributions to a numeric variable, producing parameter estimates and goodness-of-fit statistics. Fitting can be performed for either untransformed or log10-transformed values.

  • Bivariate statistics – Shows correlations, ordinary least squares (OLS) regression, and trend tests for two numeric variables. A scatter plot shows the OLS regression line and its confidence limits. The Theil-Sen trend line can be added to the plot. Both X and Y variables may be log10-transformed.

  • Correlation matrix – Displays a correlation matrix for two or more numeric variables, with heatmap coloring. Pearson, Spearman, Kendall, or Chatterjee correlation coefficients may be used. Values can be log10-transformed.

  • Cosine similarity matrix – Displays a matrix of cosine similarity values to compare two or more cases using two or more variables.

  • Analysis of variance (ANOVA) – Displays the result of a one-way ANOVA for a single numeric variable, including the non-parametric Kruskal-Wallis equivalent, the Alexander-Govern test for unequal variances, and other statistics to assess normality and equivalence of variances. Values can be log10-transformed.

  • Contingency table – Displays a 2x2 contingency table for two numeric or categorical variables, and tabulates the results of tests of independence, odds ratios, conditional probabilities, and other statistics.

  • Receiver Operating Characteristics (ROC) – Displays a ROC curve for one numeric or categorical condition variable and one numeric predictor variable, and a table of ROC statistics for any selected value of the predictor variable.

  • Principal Component Analysis (PCA) – Simplifies a multidimensional data set into a smaller number of new variables that represent a known fraction of the variance between rows in the data set.

  • t-Distributed Stochastic Neighbor Embedding (t-SNE) – Simplifies a multidimensional data set down to just two dimensions, and displays the 2D coordinates in a scatter plot. Cluster identifiers can be created and propagated back to the main data table. This is similar to the UMAP tool but uses a different dimensionality reduction method.

  • Uniform Manifold Approximation and Projection (UMAP) – Simplifies a multidimensional data set down to just two dimensions, and displays the 2D coordinates in a scatter plot. Cluster identifiers can be created and propagated back to the main data table. This is similar to the t-SNE tool but uses a different dimensionality reduction method.

  • NMF unmixing – Identifies underlying patterns of variable values, and the proportional contribution of each of those patterns to the observed distribution of variable values in each sample.

  • Correspondence of values of categorical variables – Tabulates the co-occurrences of distinct values of two different categorical variables, as both the number and percent of data rows with each distinct combination of values.

  • Categorical similarity matrix – Displays a matrix of pairwise similarities between cases, calculated using one or more categorical variables. The matrix includes heatmap coloring. Lin, Goodall3, OF, IOF, and Overlap similarity measures can be used.
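
As an illustration of the difference between the two trend lines mentioned under Bivariate statistics, the following is a minimal sketch in plain Python. It is not the application's own code. The function names (ols_line, theil_sen_line) and the sample data are hypothetical, and the Theil-Sen intercept is computed here as the median of y - slope*x, which is one common convention and only an assumption about how any particular tool computes it.

```python
from itertools import combinations
from statistics import mean, median


def ols_line(x, y):
    """Return (slope, intercept) of the ordinary least squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_line(x, y):
    """Return (slope, intercept) of a Theil-Sen trend line.

    The slope is the median of the slopes between all pairs of points.
    The intercept is the median of y - slope*x (an assumed convention).
    """
    points = list(zip(x, y))
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi  # skip pairs with identical x, whose slope is undefined
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in points])
    return slope, intercept


# Hypothetical data: a perfect y = 2x trend with one extreme value at the end.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 60]

print("OLS:       slope=%.3f intercept=%.3f" % ols_line(x, y))
print("Theil-Sen: slope=%.3f intercept=%.3f" % theil_sen_line(x, y))
```

With these data, the single extreme value pulls the OLS slope well above 2, while the Theil-Sen slope stays at 2. This robustness to outliers is why it can be useful to show both lines on the same scatter plot.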