Receiver Operating Characteristics

This summary displays a Receiver Operating Characteristics (ROC) curve (Fawcett 2006) and a table of related statistics for a user-specified condition variable and predictor variable.

The condition variable separates the set of cases (data rows) into two sets, nominally ‘positive’ and ‘negative’. A threshold value for the predictor variable also separates the cases into ‘positive’ and ‘negative’ subsets. The ROC curve and statistics assess how well the predictor variable predicts ‘positive’ and ‘negative’ values of the condition variable. The ROC curve visually represents the performance of different threshold values, and the ROC statistics quantify the performance of the single selected threshold value for the predictor variable.

The ROC dialog prompts for:

Whether to use all data or just the selected rows from the data table.

A numeric or categorical condition variable.

How to subdivide the values of the condition variable into ‘positive’ and ‘negative’ subsets. For numeric variables, an upper or lower threshold must be specified. For categorical variables, the ‘positive’ group of values must be selected from a list box.

A numeric predictor variable.

A threshold value for the predictor variable.
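The two separations described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function names are hypothetical, and both the inclusive comparisons and the reading of "lower" and "upper" thresholds are assumptions, since the text does not state which side of a threshold counts as ‘positive’.

```python
from typing import Collection, Hashable, List, Sequence


def label_numeric_condition(values: Sequence[float], threshold: float,
                            kind: str = "lower") -> List[bool]:
    """Mark each numeric condition value as positive (True) or negative (False).

    kind="lower": values at or above the threshold are positive.
    kind="upper": values at or below the threshold are positive.
    Both the inclusive comparison and this reading of "lower"/"upper"
    are assumptions made for illustration.
    """
    if kind == "lower":
        return [v >= threshold for v in values]
    if kind == "upper":
        return [v <= threshold for v in values]
    raise ValueError("kind must be 'lower' or 'upper'")


def label_categorical_condition(values: Sequence[Hashable],
                                positive_values: Collection[Hashable]) -> List[bool]:
    """Mark a case as positive when its category is in the chosen positive group."""
    positive = set(positive_values)
    return [v in positive for v in values]


def predict_positive(predictor: Sequence[float], threshold: float = 0.0) -> List[bool]:
    """Assumed rule: a case is predicted positive when predictor >= threshold."""
    return [p >= threshold for p in predictor]


# Categorical condition: 'high' and 'medium' form the positive group.
actual = label_categorical_condition(["high", "low", "medium", "high"],
                                     {"high", "medium"})
# Predictor split at the default threshold of 0.0.
predicted = predict_positive([1.2, -0.4, 0.0, 3.1])
```

Both `actual` and `predicted` are lists of booleans with one entry per case, which is the form needed to count true and false positives and negatives.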

The dialog then displays a Receiver Operating Characteristics (ROC) curve (Fawcett 2006), and ROC statistics for the threshold value of the predictor variable. The ROC plot and table are updated immediately whenever the user changes a selection on this dialog or, if only selected data are being used, whenever the selected rows change.

The predictor variable must be numeric. The ROC curve and statistics are displayed as soon as the condition variable, positive condition, and predictor variable are specified. The statistics that are initially displayed are for the default predictor threshold value of 0.0. Modifying the predictor threshold changes the ROC statistics but does not affect the ROC curve.
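The reason the curve does not depend on the selected threshold is that it is built from every candidate threshold at once. The following is a minimal sketch of that construction, not the tool's implementation. The function name `roc_points` is hypothetical, each distinct predictor value is assumed to serve as a candidate threshold, and a case is assumed to be predicted positive when its predictor value is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def roc_points(actual: Sequence[bool],
               predictor: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs for the ROC curve.

    The first point is (0, 0); the rest come from using each distinct
    predictor value as a threshold, from the highest value to the lowest.
    """
    n_pos = sum(1 for a in actual if a)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one actual positive and one actual negative")

    points = [(0.0, 0.0)]
    for t in sorted(set(predictor), reverse=True):
        # Assumed rule: predicted positive when predictor >= threshold.
        tp = sum(1 for a, p in zip(actual, predictor) if a and p >= t)
        fp = sum(1 for a, p in zip(actual, predictor) if not a and p >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


actual = [True, True, False, True, False]
predictor = [0.9, 0.8, 0.7, 0.4, 0.2]
curve = roc_points(actual, predictor)
```

For this small data set the curve has six points, running from (0, 0) to (1, 1). A selected threshold picks out one position along this curve, which is why changing it alters the statistics but leaves the curve unchanged. The repeated counting is quadratic in the number of cases, which keeps the sketch short; a sorted single pass would be the usual choice for large data sets.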

Terminology used for ROC curves and associated statistics varies depending on the field and author. The senses of condition and predictor variables may be reversed, for instance, and the statistics listed below may be known by other names. For example, sensitivity is also known as the true positive rate, hit rate, probability of detection, and power.

The ROC statistics that are displayed are:

The total number of observations.

The number of actual positive values for the condition variable.

The number of actual negative values for the condition variable.

The name of the predictor variable.

The minimum value of the predictor variable in the data set.

The maximum value of the predictor variable in the data set.

The chosen prediction threshold.

The number of predicted positive values.

The number of predicted negative values.

The number of correctly predicted positive values.

The number of correctly predicted negative values.

The number of false positives.

The number of false negatives.

The sensitivity, or the number of correctly predicted positive values as a fraction of the number of actual positive values.

The specificity, or the number of correctly predicted negative values as a fraction of the number of actual negative values.

The precision, or the number of correctly predicted positive values as a fraction of the total number of predicted positive values.

The positive likelihood ratio (LR+) for the threshold value (Hajian-Tilaki 2013, Nahm 2022).

The negative likelihood ratio (LR-) for the threshold value (Hajian-Tilaki 2013, Nahm 2022).

The maximum value of LR+ for all data points.

The false positive rate, or the number of false positives as a fraction of the number of actual negative values.

The false negative rate, or the number of false negatives as a fraction of the number of actual positive values.

The critical success index, or the number of correctly predicted positive values as a fraction of the total number of correctly predicted positive values, false positives, and false negatives. This is also known as the Jaccard Index and the F^* metric of Hand et al. (2021).

The accuracy, or the total number of correct predictions as a fraction of the total number of observations.

Youden’s J statistic (Nahm 2022): The vertical distance between the ROC point for the threshold value and the diagonal line that represents random predictive ability. This is equal to sensitivity plus specificity minus 1.

The maximum value of Youden’s J statistic that is found for all data points.

The Euclidean distance (ED: Nahm 2022) between the ROC point represented by the threshold value and the upper-left corner of the ROC plot.

The minimum value of ED for all data points.
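The single-threshold statistics in the list above all follow from four counts: true positives, false positives, true negatives, and false negatives. The sketch below shows one way to compute them. It is an illustration under stated assumptions rather than the tool's code: the function and key names are hypothetical, precision uses the conventional definition TP / (TP + FP), the critical success index uses TP / (TP + FP + FN), the likelihood ratios use the standard forms sensitivity / (1 - specificity) and (1 - sensitivity) / specificity, and any ratio with a zero denominator is reported as `None`.

```python
import math
from typing import Dict, Optional


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def roc_statistics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute threshold-specific ROC statistics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # fraction of actual positives found
    specificity = _ratio(tn, tn + fp)   # fraction of actual negatives found
    fpr = _ratio(fp, fp + tn)
    stats: Dict[str, Optional[float]] = {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": _ratio(tp, tp + fp),
        "false_positive_rate": fpr,
        "false_negative_rate": _ratio(fn, fn + tp),
        "critical_success_index": _ratio(tp, tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "lr_plus": None,
        "lr_minus": None,
        "youden_j": None,
        "euclidean_distance": None,
    }
    # These four need both an actual-positive and an actual-negative group.
    if sensitivity is not None and specificity is not None and fpr is not None:
        stats["lr_plus"] = _ratio(sensitivity, fpr)
        stats["lr_minus"] = _ratio(1.0 - sensitivity, specificity)
        stats["youden_j"] = sensitivity + specificity - 1.0
        # Distance from the ROC point (fpr, sensitivity) to the corner (0, 1).
        stats["euclidean_distance"] = math.hypot(fpr, 1.0 - sensitivity)
    return stats


example = roc_statistics(tp=40, fp=10, tn=45, fn=5)
```

With these example counts the precision is 40 / 50 and the accuracy is 85 / 100. Returning `None` instead of raising an error lets a caller leave an undefined statistic out of a results table.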

Some of these statistics will not be displayed if the selected data contain no actual positives or no actual negatives, because the corresponding ratios are then undefined.
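The entries described as being "for all data points" (the maximum LR+, the maximum Youden's J, and the minimum ED) can be read as summaries over a sweep of candidate thresholds. The sketch below takes that reading. It is an assumption, not a description of the tool's internals: the function name `sweep_summary` is hypothetical, every distinct predictor value is tried as a threshold, a case is predicted positive when its predictor value is at or above the threshold, and LR+ is skipped wherever the false positive rate is zero.

```python
import math
from typing import Dict, Optional, Sequence


def sweep_summary(actual: Sequence[bool],
                  predictor: Sequence[float]) -> Optional[Dict[str, Optional[float]]]:
    """Summarize Youden's J, ED, and LR+ over all candidate thresholds.

    Returns None when there are no actual positives or no actual negatives,
    because the rates involved are then undefined.
    """
    n_pos = sum(1 for a in actual if a)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None

    max_j: Optional[float] = None
    min_ed: Optional[float] = None
    max_lr_plus: Optional[float] = None
    for t in sorted(set(predictor)):
        tp = sum(1 for a, p in zip(actual, predictor) if a and p >= t)
        fp = sum(1 for a, p in zip(actual, predictor) if not a and p >= t)
        tpr = tp / n_pos                   # sensitivity
        fpr = fp / n_neg                   # 1 - specificity
        j = tpr - fpr                      # equals sensitivity + specificity - 1
        ed = math.hypot(fpr, 1.0 - tpr)    # distance to the corner (0, 1)
        max_j = j if max_j is None else max(max_j, j)
        min_ed = ed if min_ed is None else min(min_ed, ed)
        if fpr > 0:                        # LR+ is undefined when fpr is zero
            lr_plus = tpr / fpr
            max_lr_plus = lr_plus if max_lr_plus is None else max(max_lr_plus, lr_plus)
    return {"max_youden_j": max_j, "min_ed": min_ed, "max_lr_plus": max_lr_plus}


summary = sweep_summary([True, True, False, True, False],
                        [0.9, 0.8, 0.7, 0.4, 0.2])
```

For this small example the largest Youden's J and the smallest ED both occur at the threshold 0.8, where two of the three actual positives are found with no false positives.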

The table of ROC statistics can be exported to a CSV file or spreadsheet with the Ctrl-S keystroke.
