Reference
Glossary
Every term defined in this course, in plain English. Click a session link to see the term in context.
- Absolute Value
- The distance of a number from zero, ignoring the sign. |−3| = 3 and |3| = 3. Session 2
- Accuracy
- The percentage of predictions a model gets right: (correct predictions) / (total predictions). Can be dangerously misleading when classes are imbalanced. Session 10
- Activation Function
- A rule applied after a weighted sum in a neural network that decides whether a neuron fires. ReLU is the most common: positive values pass through, negative values become zero. Session 8
- AUC (Area Under the Curve)
- A ranking metric: 0.5 means a coin flip and 1.0 means perfect (scores below 0.5 mean the model ranks worse than random). An AUC of 0.91 means in 91% of random pairs (one positive, one negative), the model gives the positive case a higher score. Session 15
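The pairwise reading of AUC can be checked directly in plain Python; the scores below are made up for illustration:

```python
# AUC via its pairwise interpretation: the fraction of (positive, negative)
# pairs where the positive case gets the higher score. Scores are illustrative.
pos_scores = [0.9, 0.8, 0.4]   # model scores for actual positives
neg_scores = [0.3, 0.5, 0.2]   # model scores for actual negatives

wins = sum(p > n for p in pos_scores for n in neg_scores)
ties = sum(p == n for p in pos_scores for n in neg_scores)
auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
print(round(auc, 3))  # 8 of 9 pairs ranked correctly: 0.889
```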
- Base Rate
- How common something is before you test for it. If 1% of the population has a disease, the base rate is 1%. This number determines how many false alarms a test will produce. Session 7
- Baseline
- The simplest possible model or existing system you compare your new model against. If your fancy model cannot beat the baseline, it is not worth deploying. Session 15
- Bayes' Theorem
- A formula for updating your beliefs when you get new evidence. P(hypothesis | evidence) = P(evidence | hypothesis) × P(hypothesis) / P(evidence). Session 7
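A quick worked example using the 1% base rate from the Base Rate entry; the 99% sensitivity and 5% false-positive rate are illustrative numbers, not from the course:

```python
# Bayes' theorem: P(disease | positive test) from a 1% base rate.
# Sensitivity (99%) and false-positive rate (5%) are made-up for illustration.
p_disease = 0.01                      # P(hypothesis): the base rate
p_pos_given_disease = 0.99            # P(evidence | hypothesis)
p_pos_given_healthy = 0.05            # false-positive rate

# P(evidence): total probability of testing positive
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# P(hypothesis | evidence)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.167: most positives are false alarms
```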
- Bias (Neural Network)
- A constant number added to the weighted sum in a neuron, shifting the output up or down. Like the intercept b in y = mx + b. Different from statistical bias (systematic error). Session 8
- Calibration
- When a model's probability outputs match reality. A well-calibrated model that says "70% chance" is right about 70% of the time. Session 4
- Classification
- Predicting a category (like "spam" vs. "not spam", or "fraud" vs. "not fraud"). The opposite of regression, which predicts a number. Session 10
- Class Imbalance
- When one category has far more examples than another. If 98 out of 100 transactions are legitimate, the data is heavily imbalanced toward "not fraud." Session 1
- Coefficient
- The numbers m and b in y = mx + b that control how the function behaves. In ML, finding the best coefficients is the goal of training. Session 2
- Conditional Probability
- The probability of one event given that another has already happened. P(spam | contains "free") is very different from P(spam). Session 6
- Confidence Interval
- A range of values that likely contains the true answer. "23% improvement (95% CI: 10%-35%)" means we are 95% confident the true improvement is between 10% and 35%. Session 15
- Confusion Matrix
- A 2-by-2 table showing true positives, true negatives, false positives, and false negatives. The full picture of a classifier's performance. Session 10
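Counting the four cells from paired labels, with made-up data (1 = fraud, 0 = legitimate):

```python
# Building a 2-by-2 confusion matrix from actual vs. predicted labels.
# The label lists are illustrative.
actual    = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false alarms
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # missed cases
print(tp, fp, fn, tn)  # 2 1 1 4
```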
- Cross-Validation
- Splitting data into several parts, training on some and testing on the rest, then rotating. Gives a more reliable performance estimate than a single train-test split. Session 15
- Data Leakage
- When information from the test data accidentally sneaks into the training process, making the model look better than it really is. Session 3
- Deep Neural Network
- A neural network with many layers stacked on top of each other. "Deep" just means "has many layers." Session 12
- Distribution
- How values are spread across a range. A histogram of house prices shows the distribution: where values cluster, where they thin out, and whether there are extreme outliers. Session 4
- Dot Product
- The sum of multiplying corresponding elements of two vectors. [0.5, 0.3] · [4, 8] = 0.5×4 + 0.3×8 = 4.4. What one neuron computes. Session 8
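The example above, computed in plain Python:

```python
# Dot product of the glossary's example vectors: [0.5, 0.3] · [4, 8] = 4.4
weights = [0.5, 0.3]
inputs = [4, 8]
dot = sum(w * x for w, x in zip(weights, inputs))  # 0.5*4 + 0.3*8
print(dot)  # 4.4
```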
- Epoch
- One complete pass through all the training data. Training for 10 epochs means the model has seen the entire dataset 10 times. Session 1
- Euclidean Distance
- The straight-line distance between two points, like measuring with a ruler on a map. Used by algorithms like KNN and K-Means. Session 4
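A quick check with a 3-4-5 right triangle (the points are illustrative):

```python
# Euclidean distance between two 2-D points.
from math import sqrt

p, q = (0, 0), (3, 4)
distance = sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
print(distance)  # 5.0
```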
- F1 Score
- The harmonic mean of precision and recall. A single number that balances both. F1 = 2 × (P × R) / (P + R). Session 10
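Computing F1 from the formula above, with illustrative precision and recall values:

```python
# F1 = 2 * (P * R) / (P + R); the 0.75 / 0.6 values are made up.
precision, recall = 0.75, 0.6
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.667, pulled toward the weaker of the two numbers
```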
- False Negative
- A real case the model misses. A fraud transaction that the model says is legitimate. The thief gets away with the money. Session 10
- False Positive
- A legitimate case the model wrongly flags. A real transaction the model calls fraud. The customer's card gets frozen for no reason. Session 10
- Feature
- A piece of information you feed into a model. For a house: number of bedrooms, location, square footage. Each feature is one column in your dataset. Session 8
- Feature Engineering
- Choosing and transforming the input variables your model uses. Converting a date into "day of week" or combining two measurements into a ratio. Session 12
- Harmonic Mean
- A special average that penalizes imbalance. The regular average of 90% and 10% is 50%. The harmonic mean is only ~18% — much more honest about the weak number. Session 10
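Verifying the 90%/10% claim above:

```python
# Regular (arithmetic) mean vs. harmonic mean of 90% and 10%.
a, b = 0.9, 0.1
arithmetic = (a + b) / 2
harmonic = 2 * a * b / (a + b)
print(arithmetic, round(harmonic, 2))  # 0.5 vs 0.18
```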
- Holdout Set
- Data deliberately set aside and not used during training. Used only at the end to check how the model performs on data it has never seen. Session 13
- Hyperparameter
- A setting you choose before training begins, like learning rate. The prefix "hyper" means "above" — it is a setting above the model, not learned from data. Session 9
- Learning Rate
- Controls how big each step is during gradient descent. Too big and you overshoot. Too small and training takes forever. The most important hyperparameter. Session 9
- Loss Function
- Measures how wrong your model is. Different loss functions define "wrong" differently: MSE punishes big mistakes harshly, MAE treats all errors equally. Session 9
- Matrix
- A grid of numbers. Your dataset is a matrix: each row is a sample, each column is a feature. A 1000-sample dataset with 10 features is a 1000×10 matrix. Session 8
- Mean
- The arithmetic average: sum divided by count. Sensitive to outliers. Session 3
- Median
- The middle value when data is sorted. Robust to outliers. Session 3
- Mean Squared Error (MSE)
- Average of the squared differences between predictions and true values. Punishes large errors much more than small ones. Session 9
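Comparing MSE with MAE on the same illustrative errors shows how squaring amplifies the one big miss:

```python
# MSE vs. MAE on three made-up predictions; one is off by 10.
predictions = [10.0, 12.0, 20.0]
actuals     = [11.0, 12.0, 10.0]
errors = [p - a for p, a in zip(predictions, actuals)]  # [-1.0, 0.0, 10.0]

mse = sum(e ** 2 for e in errors) / len(errors)   # squares: the big miss dominates
mae = sum(abs(e) for e in errors) / len(errors)   # all errors weighted equally
print(round(mse, 2), round(mae, 2))  # 33.67 vs 3.67
```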
- Model
- A mathematical function that takes inputs (features) and produces predictions. Trained on data to find the best internal parameters. Session 2
- Model Card
- A short document describing what a model does, how it was trained, its performance, and its limitations. Like a nutrition label for ML. Session 14
- Normal Distribution
- The bell curve. Symmetric: most values near the center, fewer at extremes. Many ML algorithms assume features follow this pattern. Session 4
- Overfitting
- When a model memorizes training data so well it fails on new data. Like a student who memorizes practice test answers instead of learning the concepts. Session 9
- Parameter
- Something the model learns from data, like weights. Different from a hyperparameter, which you set before training. Session 9
- Precision
- Of everything the model flagged as positive, how many actually were positive? High precision means few false alarms. Session 10
- Probability
- A number between 0 and 1 representing how likely something is. 0.8 means about 80% likely. The mathematical language of uncertainty. Session 6
- Recall
- Of everything that actually was positive, how many did the model catch? High recall means few missed cases. Session 10
- Regression
- Predicting a number (like a price or temperature). The opposite of classification, which predicts a category. Session 9
- Sensitivity
- Also called recall or true positive rate. The probability a test correctly identifies someone who actually has the condition. Session 6
- Standard Deviation
- The square root of variance. Tells you the typical distance of values from the mean, in the same units as the data. Session 3
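Computing the (population) variance and standard deviation by hand for a small illustrative dataset:

```python
# Variance is the mean of squared deviations; std dev is its square root.
data = [2, 4, 4, 4, 6]
mean = sum(data) / len(data)                               # 4.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 1.6
std_dev = variance ** 0.5                                  # same units as the data
print(variance, round(std_dev, 2))  # 1.6 and 1.26
```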
- Threshold
- The cutoff point for a classifier. If the model outputs 72% probability and your threshold is 50%, you say "yes." Choosing the threshold is a business decision. Session 10
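Applying a 50% threshold to some made-up model scores:

```python
# Turning probabilities into yes/no decisions with a cutoff.
scores = [0.72, 0.31, 0.50, 0.91]   # illustrative model outputs
threshold = 0.5
decisions = [s >= threshold for s in scores]
print(decisions)  # [True, False, True, True]
```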
- Train-Test Split
- Splitting data into two parts: a training set (usually 80%) the model learns from, and a test set (the remaining 20%) held back for final evaluation. Session 15
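A minimal 80/20 split using only the standard library; in practice a library helper such as scikit-learn's train_test_split is typical:

```python
# Shuffle, then slice: 80% for training, 20% held back for testing.
import random

data = list(range(100))          # 100 illustrative samples
random.seed(42)                  # fixed seed so the split is reproducible
random.shuffle(data)             # shuffle first so the split is random, not ordered

split = int(len(data) * 0.8)
train, test = data[:split], data[split:]
print(len(train), len(test))     # 80 20
```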
- Variance
- The average of the squared differences from the mean. Measures how spread out values are. High variance means the data (or model performance) is unreliable. Session 3
- Vector
- A list of numbers. The features of one data point form a vector. [3 bedrooms, 150 sqm, 2 bathrooms] is a 3-dimensional vector. Session 8
- Weight
- A number that controls how much each input feature influences the prediction. Like turning a dial up or down. Learned during training. Session 9