Glossary
- ANOVA
Analysis of Variance. A statistical method for determining whether the means of two or more groups of data values differ, by comparing the variation between the groups with the variation within the groups.
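The comparison of between-group and within-group variation can be sketched in plain Python as the one-way ANOVA F statistic. This is a minimal illustration, not necessarily how mapdata computes ANOVA, and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers.

    F = (between-group mean square) / (within-group mean square).
    """
    if len(groups) < 2:
        raise ValueError("ANOVA needs at least two groups")
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: spread of each group mean around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])  # F = 27.0
```

A large F suggests that at least one group mean differs from the others. Turning F into a p-value requires the F distribution with k - 1 and N - k degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.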
- API key
An alphanumeric code that identifies a user as having permission to use a specific Application Programming Interface. In mapdata, API keys must be used with some basemap providers.
- candidate key
- candidate keys
A column, or a set of columns, in a data table whose value, or combination of values, is unique for every row in the table. A candidate key can therefore be used to uniquely identify every row. In a relational database, every table ordinarily should have at least one candidate key, one of which should be the primary key that the DBMS uses to uniquely identify each row.
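The uniqueness property in this definition can be checked with a short sketch. It assumes, for illustration only, that the data table is a list of dictionaries (one per row); `is_candidate_key` is a hypothetical helper, and it tests uniqueness only, not whether the column set is minimal.

```python
def is_candidate_key(rows, columns):
    """Return True if the given columns uniquely identify every row.

    rows:    list of dicts, one per row (an assumed layout for this sketch)
    columns: column name(s) to test as a candidate key
    """
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # two rows share the same value(s) in these columns
        seen.add(key)
    return True


rows = [
    {"site": "A", "date": "2023-01-01", "value": 1.2},
    {"site": "A", "date": "2023-02-01", "value": 0.8},
    {"site": "B", "date": "2023-01-01", "value": 2.5},
]
```

Here neither `site` nor `date` alone is unique, but the combination of `site` and `date` is.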
- case
- cases
In the context of data summarizations carried out by mapdata, a case is a single row of the data table. Depending on the context in which mapdata is used, a case may also be known as an observation or a sample, or by other terms.
- contingency table
A table with two rows and two columns that displays the relationship between two variables, each of which is divided into two categories. Categories may represent successes and failures, or positive responses and negative responses. The rows of the table contain the numbers of ‘successes’ and ‘failures’ for one variable, and the columns of the table contain the numbers of ‘successes’ and ‘failures’ for the other variable. The upper left cell of the table ordinarily contains the number of ‘successes’ for both variables, and the lower right cell contains the number of ‘failures’ for both variables.
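The layout described above can be sketched by counting pairs of outcomes, treating `True` as a ‘success’ and `False` as a ‘failure’. The function name and the input format (a list of boolean pairs) are assumptions for illustration.

```python
def contingency_2x2(pairs):
    """Build a 2 x 2 contingency table from (a, b) pairs of booleans.

    Rows correspond to variable a and columns to variable b, with
    'successes' (True) first, so the upper-left cell counts pairs where
    both are successes and the lower-right cell counts pairs where both
    are failures.
    """
    table = [[0, 0], [0, 0]]
    for a, b in pairs:
        row = 0 if a else 1
        col = 0 if b else 1
        table[row][col] += 1
    return table


table = contingency_2x2([(True, True), (True, False), (False, False),
                         (False, False), (True, True)])
# table == [[2, 1], [0, 2]]
```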
- correlation matrix
A square table with rows and columns for the same set of numeric variables, where each cell of the table contains a correlation coefficient that describes how strongly the corresponding row and column variables co-vary. Several different types of correlation coefficients can be used. The table is symmetrical about the upper-left to lower-right diagonal.
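A minimal sketch of a correlation matrix using the Pearson coefficient, one of the several coefficient types mentioned above. The column data and function names are hypothetical; each diagonal cell compares a variable with itself and is therefore 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square matrix of Pearson coefficients for a dict of {name: values}."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


m = correlation_matrix({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
# m[0][1] is about 1.0 (a and b rise together); m[0][2] is about -1.0.
```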
- CRS
Coordinate Reference System. A system that specifies how a pair of geographic coordinates should be interpreted to represent a point on the surface of the Earth. A CRS is usually identified by a short numeric code (generally 4-6 digits). The code indirectly specifies the mathematical ellipsoid used to describe the Earth’s shape, the origin of the coordinate system (e.g., the intersection of the Prime Meridian and the equator), and possibly a projection used to represent the curved surface of the Earth on a flat surface such as a map or computer monitor.
The acronyms SRS (Spatial Reference System) and SRID (Spatial Reference Identifier) are often used as synonyms for CRS. Many CRS codes originated with the European Petroleum Survey Group (EPSG) and are therefore also known as EPSG codes.
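As an illustration of the projection part of a CRS, the sketch below converts WGS84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), the projection commonly used by web basemaps, with the standard spherical Mercator formulas. It is a simplified example with a hypothetical function name; in practice a library such as pyproj handles CRS conversions in general.

```python
import math

# WGS84 semi-major axis in metres; Web Mercator uses it as a sphere radius.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x0, y0 = wgs84_to_web_mercator(0.0, 0.0)    # about (0, 0)
x180, _ = wgs84_to_web_mercator(180.0, 0.0)  # about 20037508.34 m
```

The same point on the Earth thus has very different coordinate values in the two CRSs, which is why the CRS must be known to interpret a coordinate pair.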
- CSV
Comma-Separated Values. A text file format in which successive items on a row are separated by commas, and any item containing a comma is enclosed in double quotes. Other items may also be double-quoted. Files that use other delimiters or quote characters may also be referred to, and treated as, CSV files.
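Python’s standard `csv` module follows these quoting rules, and its `delimiter` parameter covers files with other separators. The sample data below is hypothetical.

```python
import csv
import io

# The quoted item "Smith, Pond" contains a comma but remains a single field.
text = 'site,name,depth\n1,"Smith, Pond",2.5\n2,Lake,4.0\n'
rows = list(csv.reader(io.StringIO(text)))
# rows[1] == ['1', 'Smith, Pond', '2.5']

# A semicolon-delimited file can be read the same way.
semi = list(csv.reader(io.StringIO("x;y\n1;2\n"), delimiter=";"))
# semi == [['x', 'y'], ['1', '2']]
```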
- DBMS
Database Management System. Software that manages data, particularly data stored in a relational database. The DBMSs supported by mapdata are PostgreSQL, SQLite, MariaDB, MySQL, SQL Server, Oracle, Firebird, and DuckDB.
- EPSG code
See CRS.
- Euclidean distance
The straight-line distance between two points in a space of two or more dimensions. In two dimensions, on an X-Y plane, this is the length of the line segment connecting the points, which can be computed with the Pythagorean Theorem as the square root of the sum of the squared differences in X and in Y. The same calculation extends to any larger number of dimensions by summing the squared differences along every dimension.
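The calculation above can be sketched for any number of dimensions as follows; the function name is hypothetical, and since Python 3.8 the standard library’s `math.dist` computes the same value.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Pythagorean Theorem generalised: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d2 = euclidean_distance((0, 0), (3, 4))        # 5.0
d3 = euclidean_distance((1, 2, 3), (4, 6, 15))  # 13.0
```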
- Gaussian distribution
- Gaussian
See Normal distribution.
- GUI
Graphical User Interface. A computer interface composed of windows on the monitor screen, with prompts, buttons, list boxes, text entry areas, and other elements that the user can interact with using the mouse and keyboard.
- Jenks Natural Breaks
The result of a method that performs a one-dimensional cluster analysis of a set of data values. The method identifies the lower boundaries of gaps between data values that are large relative to most of the other gaps, and uses those boundaries to divide the values into classes.
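One common way to compute natural breaks is the exact dynamic-programming (Fisher-Jenks) formulation, which chooses class boundaries that minimize the total squared deviation of values from their class means. The sketch below is an illustration of that technique only; mapdata’s own implementation may differ, and `jenks_breaks` is a hypothetical name. It returns the smallest value in each class, i.e. the value just above each selected gap.

```python
def jenks_breaks(values, n_classes):
    """Split values into n_classes groups with minimum total within-class
    squared deviation, returning the smallest value of each class (ascending).

    Runs in O(n_classes * n**2) time, which is fine for modest data sets.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any run of values in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total deviation when data[:j] forms k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index where the last of those k classes begins.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    start[k][j] = i

    # Walk back from the full data set to find where each class starts.
    lower_bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        i = start[k][j]
        lower_bounds.append(data[i])
        j = i
    lower_bounds.reverse()
    return lower_bounds


breaks = jenks_breaks([20, 1, 11, 3, 22, 2, 10, 12, 21], 3)  # [1, 10, 20]
```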
- non-parametric
Of a set of data values: not conforming to any parameterized statistical distribution. In some contexts, this specifically means not following a Normal distribution.
- Normal distribution
- Normal
A statistical distribution with the shape of a bell curve, defined by two parameters: the mean, which sets the location of the peak, and the standard deviation, which sets the spread. Also referred to as a Gaussian distribution.
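The role of the two parameters can be seen in the Normal probability density function, sketched below with a hypothetical function name. Python’s standard library offers the same calculation as `statistics.NormalDist(mean, sd).pdf(x)`.

```python
import math


def normal_pdf(x, mean, sd):
    """Probability density at x of a Normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd  # distance from the mean, in standard deviations
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


peak = normal_pdf(0.0, 0.0, 1.0)  # about 0.3989, the height of the standard bell curve
```

The curve is symmetric about the mean, and a larger standard deviation gives a lower, wider bell.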
- ODS
OpenDocument Spreadsheet. A spreadsheet format following the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) standard 26300. The ODS format can be used with common open-source and commercial spreadsheet programs.
- OLS regression
- OLS
Ordinary Least-Squares regression. A method of fitting a line to a set of X,Y data pairs by minimizing the sum of the squared vertical distances between each observed Y value and the line’s Y value at the same X value.
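For a straight line, the least-squares criterion has a closed-form solution, sketched below. The function name is hypothetical and the code assumes at least two distinct X values.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)  # zero if all X values are equal
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean X, mean Y)
    return intercept, slope


a, b = ols_fit([1, 2, 3], [2, 2, 5])  # intercept 0.0, slope 1.5
```

For this example the residuals (observed Y minus fitted Y) are 0.5, -1.0, and 0.5; no other straight line gives a smaller sum of their squares.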
- opacity
The ‘opaqueness’ of a symbol or other item on the screen. Opacity is the complement of transparency (transparency = 1 - opacity). Opacity is represented by the alpha value, a number that ranges from 0.0 to 1.0. An object with an opacity of 0.0 is fully transparent, and an object with an opacity of 1.0 is fully opaque. The alpha value of symbols, lines, and areas displayed in mapdata plots ordinarily can be modified using the Alt-A keystroke.
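How an alpha value combines a symbol’s colour with what lies behind it can be sketched with the standard "over" blending formula. This is a general illustration, not a description of how mapdata’s plotting library renders; the function name is hypothetical and colours are assumed to be (r, g, b) tuples with components from 0.0 to 1.0.

```python
def blend(foreground, background, alpha):
    """Composite a foreground colour over a background colour with opacity alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    # alpha = 0.0 leaves the background unchanged; alpha = 1.0 shows only the foreground.
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(foreground, background))


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
half = blend(red, blue, 0.5)  # (0.5, 0.0, 0.5): a red symbol at 50% opacity over blue
```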
- PyPI
The Python Package Index, the official repository of third-party software packages for Python.
- ROC
Receiver Operating Characteristic. A set of values that characterize how well one variable predicts another variable. Example ROC values are the numbers of correct and incorrect predictions (true positives, false positives, true negatives, and false negatives) and rates derived from them, such as the true positive rate (sensitivity) and the false positive rate.
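A minimal sketch of how such values are computed when a numeric score is used to predict a true/false outcome: scores at or above a threshold count as positive predictions. The function name, the threshold rule, and the sample data are assumptions for illustration.

```python
def roc_point(scores, actual, threshold):
    """Count prediction outcomes at one threshold and derive TPR and FPR."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, actual):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1  # correct positive prediction
        elif predicted:
            fp += 1  # predicted positive, actually negative
        elif not is_positive:
            tn += 1  # correct negative prediction
        else:
            fn += 1  # predicted negative, actually positive
    tpr = tp / (tp + fn) if tp + fn else float("nan")  # sensitivity
    fpr = fp / (fp + tn) if fp + tn else float("nan")  # 1 - specificity
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "tpr": tpr, "fpr": fpr}


result = roc_point([0.9, 0.8, 0.4, 0.3, 0.2], [True, False, True, False, False], 0.5)
# tp=1, fp=1, tn=2, fn=1, tpr=0.5, fpr=1/3
```

Repeating this over a range of thresholds and plotting the true positive rate against the false positive rate produces an ROC curve.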
- SQL
Structured Query Language. The common language for selecting and summarizing data from relational databases.
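A small example of selecting and summarizing data with SQL, using SQLite (one of the DBMSs listed above) through Python’s built-in `sqlite3` module. The table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # temporary in-memory database
con.execute("CREATE TABLE samples (site TEXT, depth REAL)")
con.executemany("INSERT INTO samples VALUES (?, ?)",
                [("A", 1.0), ("A", 3.0), ("B", 2.0)])

# Select and summarize: number of cases and mean depth for each site.
summary = con.execute(
    "SELECT site, COUNT(*), AVG(depth) FROM samples GROUP BY site ORDER BY site"
).fetchall()
con.close()
# summary == [('A', 2, 2.0), ('B', 1, 2.0)]
```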
- SRID
See CRS.
- SRS
See CRS.
- standard deviation
A measure of the spread of a set of data values around their mean, or average value. A standard deviation can be calculated for any set of data values, but it is also one of the two parameters, together with the mean, that define a Normal distribution, where it determines the width of the bell curve.
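The sample standard deviation (which divides by n - 1) can be sketched as below; the function name is hypothetical. Python’s `statistics.stdev` computes the same quantity, and `statistics.pstdev` computes the population version, which divides by n.

```python
import math


def sample_sd(values):
    """Sample standard deviation of a sequence of numbers (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Square root of the average squared deviation from the mean.
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


sd = sample_sd([2, 4, 4, 4, 5, 5, 7, 9])  # sqrt(32 / 7), about 2.138
```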
- t-SNE
t-Distributed Stochastic Neighbor Embedding. A method for reducing the dimensionality of a multivariate data set to facilitate visualization of the relationships between items. See van der Maaten and Hinton (2008).
- UMAP
Uniform Manifold Approximation and Projection. A method for reducing the dimensionality of a multivariate data set to facilitate visualization of the relationships between items. See McInnes et al. (2020).
- WGS84
World Geodetic System 1984. A commonly used geodetic system for representing the Earth’s surface; it is the reference system used by the Global Positioning System (GPS). The CRS code for two-dimensional latitude and longitude coordinates in decimal degrees on WGS84 is 4326 (EPSG:4326), which is the default CRS used by mapdata.