
Glossary

Every term defined in this course, in plain English. Click a session link to see the term in context.

A
Absolute Value
The distance of a number from zero, ignoring the sign. |−3| = 3 and |3| = 3. Session 2
Accuracy
The percentage of predictions a model gets right: (correct predictions) / (total predictions). Can be dangerously misleading when classes are imbalanced. Session 10
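A minimal sketch of why accuracy misleads on imbalanced data, reusing the 98-legitimate-to-2-fraud split from the Class Imbalance entry (the labels here are made up for illustration):

```python
# Accuracy = correct predictions / total predictions.
# Hypothetical labels: 98 legitimate (0), 2 fraud (1).
y_true = [0] * 98 + [1] * 2

# A "model" that always predicts "not fraud" ...
y_pred = [0] * 100

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.98 -- looks great, yet it catches zero fraud
```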
Activation Function
A rule applied after a weighted sum in a neural network that decides whether a neuron fires. ReLU is the most common: positive values pass through, negative values become zero. Session 8
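The ReLU rule described above is one line of code:

```python
def relu(x):
    # ReLU: positive values pass through, negative values become zero.
    return max(0.0, x)

print(relu(2.5))   # 2.5
print(relu(-1.3))  # 0.0
```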
AUC (Area Under the Curve)
A ranking metric where 0.5 is a coin flip and 1.0 is perfect (scores below 0.5 mean the model ranks worse than random). An AUC of 0.91 means that in 91% of random pairs (one positive, one negative), the model gives the positive case a higher score. Session 15
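The pairwise interpretation can be computed directly: over every (positive, negative) pair, count how often the positive case gets the higher score. The scores below are hypothetical model outputs, not from any real model:

```python
# AUC as a pairwise ranking test.
pos_scores = [0.9, 0.8, 0.4]   # scores for actual positives
neg_scores = [0.7, 0.3, 0.2]   # scores for actual negatives

wins = ties = 0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1
        elif p == n:
            ties += 1          # ties count as half a win

auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
print(auc)  # 8 wins out of 9 pairs
```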
B
Base Rate
How common something is before you test for it. If 1% of the population has a disease, the base rate is 1%. This number determines how many false alarms a test will produce. Session 7
Baseline
The simplest possible model or existing system you compare your new model against. If your fancy model cannot beat the baseline, it is not worth deploying. Session 15
Bayes' Theorem
A formula for updating your beliefs when you get new evidence. P(hypothesis | evidence) = P(evidence | hypothesis) × P(hypothesis) / P(evidence). Session 7
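A worked example of the formula, using the 1% base rate from the Base Rate entry; the test's sensitivity and false-positive rate are hypothetical numbers chosen for illustration:

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
# Hypothetical setup: 1% base rate, a test with 90% sensitivity
# and a 5% false-positive rate.
p_disease = 0.01                 # P(H), the base rate
p_pos_given_disease = 0.90       # P(E | H), sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test, P(E):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.154: most positives are false alarms
```

Note how the low base rate dominates: even a 90%-sensitive test leaves the posterior at roughly 15%.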
Bias (Neural Network)
A constant number added to the weighted sum in a neuron, shifting the output up or down. Like the intercept b in y = mx + b. Different from statistical bias (systematic error). Session 8
C
Calibration
When a model's probability outputs match reality. A well-calibrated model that says "70% chance" is right about 70% of the time. Session 4
Classification
Predicting a category (like "spam" or "not fraud"). The counterpart of regression, which predicts a number. Session 10
Class Imbalance
When one category has far more examples than another. If 98 out of 100 transactions are legitimate, the data is heavily imbalanced toward "not fraud." Session 1
Coefficient
The numbers m and b in y = mx + b that control how the function behaves. In ML, finding the best coefficients is the goal of training. Session 2
Conditional Probability
The probability of one event given that another has already happened. P(spam | contains "free") is very different from P(spam). Session 6
Confidence Interval
A range of values that likely contains the true answer. "23% improvement (95% CI: 10%-35%)" means we are 95% confident the true improvement is between 10% and 35%. Session 15
Confusion Matrix
A 2-by-2 table (for a binary classifier) showing true positives, true negatives, false positives, and false negatives. The full picture of a classifier's performance. Session 10
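The four cells can be counted directly from labels and predictions; the labels below are made up for illustration (1 = positive class, 0 = negative class):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # caught
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # correctly ignored
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarm
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed
print(tp, fp, fn, tn)  # 3 1 1 3
```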
Cross-Validation
Splitting data into several parts, training on some and testing on the rest, then rotating. Gives a more reliable performance estimate than a single train-test split. Session 15
D
Data Leakage
When information from the test data accidentally sneaks into the training process, making the model look better than it really is. Session 3
Deep Neural Network
A neural network with many layers stacked on top of each other. "Deep" just means "has many layers." Session 12
Distribution
How values are spread across a range. A histogram of house prices shows the distribution: where values cluster, where they thin out, and whether there are extreme outliers. Session 4
Dot Product
The sum of multiplying corresponding elements of two vectors. [0.5, 0.3] · [4, 8] = 0.5×4 + 0.3×8 = 4.4. What one neuron computes. Session 8
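The example above, written out (weights and inputs as in the definition):

```python
# Dot product: multiply corresponding elements, then sum.
weights = [0.5, 0.3]
inputs = [4, 8]

dot = sum(w * x for w, x in zip(weights, inputs))
print(round(dot, 1))  # 0.5*4 + 0.3*8 = 4.4
```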
E
Epoch
One complete pass through all the training data. Training for 10 epochs means the model has seen the entire dataset 10 times. Session 1
Euclidean Distance
The straight-line distance between two points, like measuring with a ruler on a map. Used by algorithms like KNN and K-Means. Session 4
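A minimal sketch for two 2-D points, using the classic 3-4-5 triangle so the answer is exact:

```python
import math

# Straight-line distance between two points, like a ruler on a map.
a = (1.0, 2.0)
b = (4.0, 6.0)

dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(dist)  # sqrt(3^2 + 4^2) = 5.0
```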
F
F1 Score
The harmonic mean of precision and recall. A single number that balances both. F1 = 2 × (P × R) / (P + R). Session 10
False Negative
A real case the model misses. A fraud transaction that the model says is legitimate. The thief gets away with the money. Session 10
False Positive
A legitimate case the model wrongly flags. A real transaction the model calls fraud. The customer's card gets frozen for no reason. Session 10
Feature
A piece of information you feed into a model. For a house: number of bedrooms, location, square footage. Each feature is one column in your dataset. Session 8
Feature Engineering
Choosing and transforming the input variables your model uses. Converting a date into "day of week" or combining two measurements into a ratio. Session 12
H
Harmonic Mean
A special average that penalizes imbalance. The regular average of 90% and 10% is 50%. The harmonic mean is only ~18% — much more honest about the weak number. Session 10
Holdout Set
Data deliberately set aside and not used during training. Used only at the end to check how the model performs on data it has never seen. Session 13
Hyperparameter
A setting you choose before training begins, like learning rate. The prefix "hyper" means "above" — it is a setting above the model, not learned from data. Session 9
L
Learning Rate
Controls how big each step is during gradient descent. Too big and you overshoot. Too small and training takes forever. The most important hyperparameter. Session 9
Loss Function
Measures how wrong your model is. Different loss functions define "wrong" differently: MSE punishes big mistakes harshly, MAE treats all errors equally. Session 9
M
Matrix
A grid of numbers. Your dataset is a matrix: each row is a sample, each column is a feature. A 1000-sample dataset with 10 features is a 1000×10 matrix. Session 8
Mean
The arithmetic average: sum divided by count. Sensitive to outliers. Session 3
Median
The middle value when data is sorted. Robust to outliers. Session 3
Mean Squared Error (MSE)
Average of the squared differences between predictions and true values. Punishes large errors much more than small ones. Session 9
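A small comparison of MSE against MAE on made-up errors, showing how squaring lets one large error dominate:

```python
# One large error (off by 10) versus three small ones (off by 1 each).
errors = [1, 1, 1, 10]

mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors

print(mse)  # (1 + 1 + 1 + 100) / 4 = 25.75
print(mae)  # (1 + 1 + 1 + 10) / 4 = 3.25
```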
Model
A mathematical function that takes inputs (features) and produces predictions. Trained on data to find the best internal parameters. Session 2
Model Card
A short document describing what a model does, how it was trained, its performance, and its limitations. Like a nutrition label for ML. Session 14
N
Normal Distribution
The bell curve. Symmetric: most values near the center, fewer at extremes. Many ML algorithms assume features follow this pattern. Session 4
O
Overfitting
When a model memorizes training data so well it fails on new data. Like a student who memorizes practice test answers instead of learning the concepts. Session 9
P
Parameter
Something the model learns from data, like weights. Different from a hyperparameter, which you set before training. Session 9
Precision
Of everything the model flagged as positive, how many actually were positive? High precision means few false alarms. Session 10
Probability
A number between 0 and 1 representing how likely something is. 0.8 means about 80% likely. The mathematical language of uncertainty. Session 6
R
Recall
Of everything that actually was positive, how many did the model catch? High recall means few missed cases. Session 10
Regression
Predicting a number (like a price or temperature). The counterpart of classification, which predicts a category. Session 9
S
Sensitivity
Also called recall or true positive rate. The probability a test correctly identifies someone who actually has the condition. Session 6
Standard Deviation
The square root of variance. Tells you the typical distance of values from the mean, in the same units as the data. Session 3
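Variance and standard deviation computed by hand on a small made-up sample (population formulas, dividing by the count):

```python
import math

# Variance: average squared distance from the mean.
# Standard deviation: its square root, in the data's own units.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```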
T
Threshold
The cutoff point for a classifier. If the model outputs 72% probability and your threshold is 50%, you say "yes." Choosing the threshold is a business decision. Session 10
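Applying a threshold is a single comparison per prediction; the probabilities below are hypothetical model outputs:

```python
# Turn probabilities into yes/no decisions at a 50% threshold.
probs = [0.72, 0.45, 0.91, 0.50]
threshold = 0.5

decisions = [p >= threshold for p in probs]
print(decisions)  # [True, False, True, True]
```

Raising the threshold trades recall for precision: fewer flags overall, but each flag is more likely to be right.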
Train-Test Split
Splitting data into two parts: a training set (usually 80%) the model learns from, and a test set (20%) held back for final evaluation. Session 15
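A minimal 80/20 split sketch; the "samples" are just indices, and the seed is fixed only so the example is reproducible:

```python
import random

# Shuffle first so the split is random, then cut at 80%.
data = list(range(100))

random.seed(42)            # fixed seed for reproducibility
random.shuffle(data)

cut = int(len(data) * 0.8)
train, test = data[:cut], data[cut:]

print(len(train), len(test))  # 80 20
```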
V
Variance
The average of the squared differences from the mean. Measures how spread out values are. High variance means the data (or model performance) is unreliable. Session 3
Vector
A list of numbers. The features of one data point form a vector. [3 bedrooms, 150 sqm, 2 bathrooms] is a 3-dimensional vector. Session 8
W
Weight
A number that controls how much each input feature influences the prediction. Like turning a dial up or down. Learned during training. Session 9