Bivariate Statistics

This summary describes the relationship between two numeric variables. It presents several measures of the presence, or strength, of a linear or monotonic relationship between two selected variables. The relationship can be assessed for the untransformed variables, or one or both variables can be log10-transformed.

The dialog for displaying statistics about the relationship between two numerical variables

The bivariate statistics dialog prompts for:

  • Two numeric variables.

  • Whether to use log10-transformed values for each of them.

  • Whether to use all data in the data table or just the subset that has been selected (e.g., by clicking on the table or map).

After these selections have been made, the dialog displays a table of bivariate statistics and a scatter plot of the two variables with an ordinary least-squares (OLS) regression line. The table of statistics is updated immediately if the user changes any selection on this dialog or if the underlying data change.

The summary statistics can be displayed for either all data in the data table or for only the data rows that have been selected. Data rows where either of the variables is missing are not included in the summary, and there must be at least three rows of data for the calculations to be carried out.
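The row filtering described above can be sketched in a few lines of Python. This is an illustration only, not the application's code: the function name `prepare_pairs`, the representation of rows as dictionaries, and the use of `None` for missing values are all assumptions. Skipping non-positive values when a log10 transform is requested is also an assumption, since the text does not say how such values are handled.

```python
import math

def prepare_pairs(rows, x_key, y_key, log_x=False, log_y=False, min_rows=3):
    """Return paired x and y lists, dropping rows with a missing value.

    Hypothetical helper: rows are dicts and missing values are None.
    """
    xs, ys = [], []
    for row in rows:
        x, y = row.get(x_key), row.get(y_key)
        if x is None or y is None:
            continue  # a row is dropped if either variable is missing
        if (log_x and x <= 0) or (log_y and y <= 0):
            continue  # assumption: values that cannot be log-transformed are skipped
        xs.append(math.log10(x) if log_x else float(x))
        ys.append(math.log10(y) if log_y else float(y))
    if len(xs) < min_rows:
        raise ValueError(f"need at least {min_rows} complete rows, found {len(xs)}")
    return xs, ys

rows = [
    {"a": 1, "b": 10},
    {"a": None, "b": 5},      # dropped: X is missing
    {"a": 100, "b": 1000},
    {"a": 10, "b": 100},
]
xs, ys = prepare_pairs(rows, "a", "b", log_x=True, log_y=True)
```

Here three complete rows remain after filtering, which is the minimum the summary requires.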

The summary includes a table of statistics on the left and three plots on separate tabs to the right. The statistics that are displayed in the table are:

  • The name of the X variable

  • The name of the Y variable

  • N, the number of data points for which neither variable is missing

  • Covariance

  • Pearson’s correlation coefficient r (Wright 1921)

  • p value for the hypothesis that r is zero

  • Spearman’s correlation coefficient rho (Spearman 1904)

  • p value for the hypothesis that rho is zero

  • Kendall’s correlation coefficient tau (Kendall 1938)

  • p value for the hypothesis that tau is zero

  • Chatterjee’s correlation coefficient xi (Chatterjee 2020)

  • Slope of an ordinary least-squares (OLS) linear regression

  • Intercept of the OLS linear regression

  • R-squared (R2) for the regression

  • Adjusted R-squared for the regression

  • Total sum of squares (SS) for the regression

  • Sum of squares explained by the regression

  • Residual sum of squares, not explained by the regression

  • Coefficient of determination in the scale of the standard deviation (CoDSD) (Berggren 2024)

  • p value for the hypothesis that the regression slope is zero

  • p value for the hypothesis that the regression intercept is zero

  • Akaike’s Information Criterion (AIC) for the regression

  • Bayesian Information Criterion (BIC) for the regression

  • p value for the Breusch-Pagan test for heteroskedasticity of the OLS residuals; also see the plot of residuals

  • p value for a Mann-Kendall Trend Test of the Y variable. There must be at least four rows of data for this to be evaluated.

  • p value for a Runs Test of the Y variable, using the median as a cutoff and with a small-sample correction for N < 50 (per NIST)

  • Theil-Sen slope

  • 95% confidence interval on the Theil-Sen slope

  • Theil-Sen intercept, calculated using the median and the Theil-Sen slope.

  • The robust R-squared (R2) per Kvålseth (1985). This is calculated with respect to the Theil-Sen line rather than the OLS regression.
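To make some of the listed quantities concrete, the following sketch computes the OLS slope, intercept, sums of squares, R-squared, and adjusted R-squared, along with a Theil-Sen slope and intercept. It uses the textbook formulas and is not taken from the application's source. The function names are hypothetical, and the Theil-Sen intercept formula (median of Y minus the slope times the median of X) is one reading of "calculated using the median and the Theil-Sen slope", so treat it as an assumption.

```python
from itertools import combinations
from statistics import mean, median

def ols_fit(xs, ys):
    """Ordinary least-squares fit of y on x with the sum-of-squares breakdown."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_explained = ss_total - ss_resid
    r2 = ss_explained / ss_total
    # Adjusted R-squared for a single predictor (n - 2 residual degrees of freedom)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {
        "slope": slope, "intercept": intercept,
        "ss_total": ss_total, "ss_explained": ss_explained, "ss_resid": ss_resid,
        "r2": r2, "adj_r2": adj_r2,
    }

def theil_sen_fit(xs, ys):
    """Theil-Sen slope: the median of all pairwise slopes.

    Assumption: intercept = median(y) - slope * median(x).
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x have no defined slope
    ]
    slope = median(slopes)
    intercept = median(ys) - slope * median(xs)
    return slope, intercept

fit = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# One outlier in the last point: the Theil-Sen line still follows y = 2x + 1
ts_slope, ts_intercept = theil_sen_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
```

The second call illustrates why a Theil-Sen line is reported alongside the OLS fit: a single extreme value barely moves the median of the pairwise slopes, whereas it would pull the OLS slope strongly.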

If a date or date/time column is chosen for the X variable, only the number of points and the results of the Mann-Kendall and Runs tests will be shown.
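The Mann-Kendall test depends only on the order of the Y values, which is why it can still be reported when X is a date. A minimal sketch follows, assuming the common normal approximation with a continuity correction and no correction for tied values. The application may use an exact or tie-corrected variant, and the function name `mann_kendall` is hypothetical.

```python
import math

def mann_kendall(ys):
    """Return (S, Z, two-sided p) for a Mann-Kendall trend test.

    Assumptions: normal approximation, continuity correction, no tie correction.
    """
    n = len(ys)
    if n < 4:
        raise ValueError("at least four values are required")
    # S counts later-minus-earlier increases minus decreases over all pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = ys[j] - ys[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p

s_up, z_up, p_up = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s_flat, z_flat, p_flat = mann_kendall([3, 1, 4, 2])
```

A steadily increasing series gives a large positive S and a small p value, while a series with as many increases as decreases gives S = 0 and a p value of 1.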

The three plots to the right of the table of statistics are on separate tabs. These plots are:

  • A scatter plot showing all of the individual data points with the OLS regression line and 95% confidence bands about the regression line.

  • The distribution of residuals for the regression. Residuals are not shown if a date or date/time column is chosen for the X variable.

  • A trend plot per Şen (2012) showing the second half of the values of the Y variable plotted on the ordinate against the first half of the Y values on the abscissa. A line with a 1:1 slope is also shown. An increasing trend in the Y variable is indicated by points above the line, and a decreasing trend by points below the line. There must be at least four data points. If there is an odd number of data points, the first one will be dropped.
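The construction of the points for the Şen trend plot can be sketched as follows. The function names are hypothetical. Sorting each half in ascending order before pairing follows the usual description of Şen's method and is an assumption here, because the text above does not state whether the application sorts the halves.

```python
def sen_trend_points(ys):
    """Pair the first half of the Y values (abscissa) with the second half (ordinate).

    Assumption: each half is sorted in ascending order before pairing.
    """
    if len(ys) < 4:
        raise ValueError("at least four values are required")
    vals = list(ys)
    if len(vals) % 2 == 1:
        vals = vals[1:]  # odd count: the first value is dropped
    half = len(vals) // 2
    first = sorted(vals[:half])
    second = sorted(vals[half:])
    return list(zip(first, second))

def trend_counts(points):
    """Count points above and below the 1:1 line."""
    above = sum(1 for a, b in points if b > a)
    below = sum(1 for a, b in points if b < a)
    return above, below

points = sen_trend_points([9, 1, 3, 2, 4, 6, 5])
above, below = trend_counts(points)
```

In this example the leading value is dropped because the count is odd, and every remaining point lies above the 1:1 line, which the plot would show as an increasing trend.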

The bivariate statistics summary showing the Sen trend plot

The table of statistics can be exported to a file using the keystroke Ctrl-S.

The scatter plot can be modified using the following hotkeys:

  • The opacity (alpha value) of the symbols on the plot can be changed using the Alt-A keystroke.

  • Display of a Theil-Sen line on the plot can be toggled on and off using the Alt-S keystroke.

  • The Alt-T, Alt-X, and Alt-Y keystrokes can be used to modify the plot title, X axis label, and Y axis label, respectively.

The scatter plot is similar to the one that can be created using the Plot/General dialog. Unlike that plot, this one does not display multiple groups, natural breaks, or a LOESS line, but it does show the confidence band of the OLS regression line and the regression residuals.