Bivariate Statistics
This summary describes the relationship between two numeric variables. It presents several measures of the presence, or strength, of a linear or monotonic relationship between two selected variables. The relationship can be assessed for the untransformed variables, or one or both variables can be log10-transformed.
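The difference between a linear and a monotonic relationship, and the effect of a log10 transformation, can be illustrated with a small sketch. This is not the application's code. The function names (`pearson_r`, `ranks`, `spearman_rho`) and the sample data are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: the cross-product sum scaled by both sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n. Assumes no tied values, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [10.0 ** v for v in x]          # monotonic, but strongly curved
log_y = [math.log10(v) for v in y]  # the log10 transform makes it linear

r_raw = pearson_r(x, y)       # well below 1: the relationship is not linear
rho_raw = spearman_rho(x, y)  # essentially 1: the relationship is monotonic
r_log = pearson_r(x, log_y)   # essentially 1 after transforming Y
```

In this made-up example the rank-based coefficient detects the relationship without any transformation, while Pearson's r approaches 1 only once Y is log10-transformed.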
The bivariate statistics dialog prompts for:
Two numeric variables.
Whether to use log-transformed values for each of them.
Whether to use all data in the data table or just the subset that has been selected (e.g., by clicking on the table or map).
After these selections have been made, the dialog displays a table of bivariate statistics and a scatter plot of the two variables with an ordinary least-squares (OLS) regression line. The table of statistics is updated immediately if any changes are made to the user’s selections on this dialog or to the data used.
The summary statistics can be displayed for either all data in the data table or for only the data rows that have been selected. Data rows where either of the variables is missing are not included in the summary, and there must be at least three rows of data for the calculations to be carried out.
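The row filtering described above can be sketched as follows. This is an illustration only, not the application's code: the function name `paired_values`, the use of `None` to mark a missing value, and the assumption that values to be log10-transformed are positive are all hypothetical.

```python
import math

def paired_values(xs, ys, log_x=False, log_y=False):
    """Return the (x, y) pairs for which neither value is missing."""
    pairs = []
    for x, y in zip(xs, ys):
        if x is None or y is None:
            continue  # rows with a missing value are excluded
        # Assumption: values are positive wherever log10 is requested.
        x = math.log10(x) if log_x else x
        y = math.log10(y) if log_y else y
        pairs.append((x, y))
    if len(pairs) < 3:
        raise ValueError("at least three complete rows are required")
    return pairs

# Two of the five rows have a missing value, so three pairs remain.
pairs = paired_values([1, 10, None, 100, 1000], [2, 4, 6, None, 8], log_x=True)
```

The number of pairs returned corresponds to N in the table of statistics.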
The summary includes a table of statistics on the left and three plots on separate tabs to the right. The statistics that are displayed in the table are:
The name of the X variable
The name of the Y variable
N, the number of data points for which neither variable is missing
Covariance
Pearson’s correlation coefficient r (Wright 1921)
p value for the hypothesis that r is zero
Spearman’s correlation coefficient rho (Spearman 1904)
p value for the hypothesis that rho is zero
Kendall’s correlation coefficient tau (Kendall 1938)
p value for the hypothesis that tau is zero
Chatterjee’s correlation coefficient xi (Chatterjee 2020)
Slope of an ordinary least-squares (OLS) linear regression
Intercept of the OLS linear regression
R-squared (R2) for the regression
Adjusted R-squared for the regression
Total sum of squares (SS) for the regression
Sum of squares explained by the regression
Residual sum of squares, not explained by the regression
Coefficient of determination in the scale of the standard deviation (CoDSD) (Berggren 2024)
p value for the hypothesis that the regression slope is zero
p value for the hypothesis that the regression intercept is zero
Akaike’s Information Criterion (AIC) for the regression
Bayesian Information Criterion (BIC) for the regression
p value for the Breusch-Pagan test for heteroskedasticity of the OLS residuals; also see the plot of residuals
p value for a Mann-Kendall Trend Test of the Y variable. There must be at least four rows of data for this to be evaluated.
p value for a Runs Test of the Y variable, using the median as a cutoff and with a small-sample correction for N < 50 (per NIST)
Theil-Sen slope
95% confidence interval on the Theil-Sen slope
Theil-Sen intercept, calculated using the median and the Theil-Sen slope.
The robust R-squared (R2) per Kvålseth (1985). This is calculated with respect to the Theil-Sen line rather than the OLS regression.
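To show how the OLS and Theil-Sen lines listed above can differ, here is a minimal sketch using made-up data with one outlier. It is not the application's implementation. The function names are hypothetical, and the Theil-Sen intercept formula, median(y) minus slope times median(x), is one common convention that is assumed here rather than taken from this documentation.

```python
from itertools import combinations
from statistics import mean, median

def ols_fit(xs, ys):
    """Return the slope, intercept, and R-squared of an OLS line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_resid / ss_total

def theil_sen_fit(xs, ys):
    """Return the median of all pairwise slopes and a median-based intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    # Assumption: intercept = median(y) - slope * median(x).
    intercept = median(ys) - slope * median(xs)
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 30]  # the last point is an outlier

ols_slope, ols_intercept, r2 = ols_fit(x, y)
ts_slope, ts_intercept = theil_sen_fit(x, y)
```

With these values the single outlier pulls the OLS slope well above the slope of the first four points, while the Theil-Sen slope stays at the value those four points share.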
If a date or date/time column is chosen for the X variable, only the number of points and the results of the Mann-Kendall and Runs tests will be shown.
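The Mann-Kendall test can be sketched as below. This is a simplified illustration, not the application's code: it assumes the Y values are already ordered by the X variable, that there are no tied values (so no tie correction is applied to the variance), and that the p value comes from the normal approximation with a continuity correction. The function name `mann_kendall` is hypothetical.

```python
import math

def mann_kendall(ys):
    """Return the S statistic and a two-sided normal-approximation p value."""
    n = len(ys)
    if n < 4:
        raise ValueError("at least four values are required")
    # S sums the signs of all later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = ys[j] - ys[i]
            s += (diff > 0) - (diff < 0)
    # Assumption: no ties, so the variance needs no tie correction.
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink S toward zero by one unit.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, p

s_up, p_up = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s_mixed, p_mixed = mann_kendall([1, 3, 2, 4])
```

A strictly increasing series gives the largest possible S for its length and a small p value; the short mixed series gives a much weaker result.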
The three plots to the right of the table of statistics are on separate tabs. These plots are:
A scatter plot showing all of the individual data points with the OLS regression line and 95% confidence bands about the regression line.
The distribution of residuals for the regression. Residuals are not shown if a date or date/time column is chosen for the X variable.
A trend plot per Şen (2012) showing the second half of the values of the Y variable plotted on the ordinate against the first half of the Y values on the abscissa. A line with a 1:1 slope is also shown. An increasing trend in the Y variable is indicated by points above the line, and a decreasing trend by points below the line. There must be at least four data points. If there is an odd number of data points, the first one will be dropped.
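The construction of the points for the trend plot can be sketched as follows. This is an illustration under stated assumptions, not the application's code: the function name `sen_trend_points` is hypothetical, the values are assumed to be in X order, and each half is sorted in ascending order before pairing, which follows the method as described by Şen (2012) and is not stated in this documentation.

```python
def sen_trend_points(ys):
    """Return (first_half, second_half) pairs for a Sen-style trend plot."""
    if len(ys) < 4:
        raise ValueError("at least four values are required")
    values = list(ys)
    if len(values) % 2 == 1:
        values = values[1:]  # odd count: drop the first value
    half = len(values) // 2
    # Assumption: each half is sorted ascending, as in Sen (2012).
    first = sorted(values[:half])
    second = sorted(values[half:])
    return list(zip(first, second))

# Seven values: the first is dropped, leaving two halves of three.
points = sen_trend_points([5, 1, 2, 3, 4, 6, 8])
# Points above the 1:1 line have a second-half value larger than the first.
above = sum(1 for a, b in points if b > a)
```

In this made-up series every point lies above the 1:1 line, which the plot would show as an increasing trend.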
The table of statistics can be exported to a file using the keystroke Ctrl-S.
The scatter plot can be modified using the following hotkeys:
The opacity (alpha value) of the symbols on the plot can be changed using the Alt-A keystroke.
Display of a Theil-Sen line on the plot can be toggled on and off using the Alt-S keystroke.
The Alt-T, Alt-X, and Alt-Y keystrokes can be used to modify the plot title, X axis label, and Y axis label, respectively.
The scatter plot is similar to the one that can be created using the Plot/General dialog. Unlike that plot, this one does not display multiple groups, natural breaks, or a LOESS line, but it does show the confidence band of the OLS regression line and the regression residuals.