
Glossary

Every term defined in this course, in plain English. Click a session link to see the term in context.

A
Absolute Value
The distance of a number from zero, ignoring the sign. |−3| = 3 and |3| = 3. Session 2
Accuracy
The percentage of predictions a model gets right: (correct predictions) / (total predictions). Can be dangerously misleading when classes are imbalanced. Session 10
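A minimal sketch of why accuracy misleads on imbalanced data, reusing the 98-legitimate-to-2-fraud split from the Class Imbalance entry (the labels here are made up for illustration):

```python
# Accuracy = correct predictions / total predictions.
# Hypothetical labels: 98 legitimate (0), 2 fraud (1).
y_true = [0] * 98 + [1] * 2

# A "model" that always predicts "not fraud" ...
y_pred = [0] * 100

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.98 -- looks great, yet it catches zero fraud
```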
Activation Function
A rule applied after a weighted sum in a neural network that decides whether a neuron fires. ReLU is the most common: positive values pass through, negative values become zero. Session 8
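The ReLU rule described above is one line of code:

```python
def relu(x):
    # ReLU: positive values pass through, negative values become zero.
    return max(0.0, x)

print(relu(2.5))   # 2.5
print(relu(-1.3))  # 0.0
```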
AUC (Area Under the Curve)
A ranking metric where 0.5 is a coin flip and 1.0 is perfect (scores below 0.5 mean the model ranks worse than random). An AUC of 0.91 means that in 91% of random pairs (one positive, one negative), the model gives the positive case a higher score. Session 15
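The pairwise interpretation can be computed directly: over every (positive, negative) pair, count how often the positive case gets the higher score. The scores below are hypothetical model outputs, not from any real model:

```python
# AUC as a pairwise ranking test.
pos_scores = [0.9, 0.8, 0.4]   # scores for actual positives
neg_scores = [0.7, 0.3, 0.2]   # scores for actual negatives

wins = ties = 0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1
        elif p == n:
            ties += 1          # ties count as half a win

auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
print(auc)  # 8 wins out of 9 pairs
```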
B
Base Rate
How common something is before you test for it. If 1% of the population has a disease, the base rate is 1%. This number determines how many false alarms a test will produce. Session 7
Baseline
The simplest possible model or existing system you compare your new model against. If your fancy model cannot beat the baseline, it is not worth deploying. Session 15
Bayes' Theorem
A formula for updating your beliefs when you get new evidence. P(hypothesis | evidence) = P(evidence | hypothesis) × P(hypothesis) / P(evidence). Session 7
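A worked example of the formula, using the 1% base rate from the Base Rate entry; the test's sensitivity and false-positive rate are hypothetical numbers chosen for illustration:

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
# Hypothetical setup: 1% base rate, a test with 90% sensitivity
# and a 5% false-positive rate.
p_disease = 0.01                 # P(H), the base rate
p_pos_given_disease = 0.90       # P(E | H), sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test, P(E):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.154: most positives are false alarms
```

Note how the low base rate dominates: even a 90%-sensitive test leaves the posterior at roughly 15%.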
Bias (Neural Network)
A constant number added to the weighted sum in a neuron, shifting the output up or down. Like the intercept b in y = mx + b. Different from statistical bias (systematic error). Session 8
C
Calibration
When a model's probability outputs match reality. A well-calibrated model that says "70% chance" is right about 70% of the time. Session 4
Classification
Predicting a category (like "spam" or "not fraud"). The counterpart of regression, which predicts a number. Session 10
Class Imbalance
When one category has far more examples than another. If 98 out of 100 transactions are legitimate, the data is heavily imbalanced toward "not fraud." Session 1
Coefficient
The numbers m and b in y = mx + b that control how the function behaves. In ML, finding the best coefficients is the goal of training. Session 2
Conditional Probability
The probability of one event given that another has already happened. P(spam | contains "free") is very different from P(spam). Session 6
Confidence Interval
A range of values that likely contains the true answer. "23% improvement (95% CI: 10%-35%)" means we are 95% confident the true improvement is between 10% and 35%. Session 15
Confusion Matrix
A 2-by-2 table (for a binary classifier) showing true positives, true negatives, false positives, and false negatives. The full picture of a classifier's performance. Session 10
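The four cells can be counted directly from labels and predictions; the labels below are made up for illustration (1 = positive class, 0 = negative class):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # caught
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # correctly ignored
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarm
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed
print(tp, fp, fn, tn)  # 3 1 1 3
```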
Cross-Validation
Splitting data into several parts, training on some and testing on the rest, then rotating. Gives a more reliable performance estimate than a single train-test split. Session 15
D
Data Leakage
When information from the test data accidentally sneaks into the training process, making the model look better than it really is. Session 3
Deep Neural Network
A neural network with many layers stacked on top of each other. "Deep" just means "has many layers." Session 12
Distribution
How values are spread across a range. A histogram of house prices shows the distribution: where values cluster, where they thin out, and whether there are extreme outliers. Session 4
Dot Product
The sum of multiplying corresponding elements of two vectors. [0.5, 0.3] · [4, 8] = 0.5×4 + 0.3×8 = 4.4. What one neuron computes. Session 8
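The example above, written out (weights and inputs as in the definition):

```python
# Dot product: multiply corresponding elements, then sum.
weights = [0.5, 0.3]
inputs = [4, 8]

dot = sum(w * x for w, x in zip(weights, inputs))
print(round(dot, 1))  # 0.5*4 + 0.3*8 = 4.4
```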
E
Epoch
One complete pass through all the training data. Training for 10 epochs means the model has seen the entire dataset 10 times. Session 1
Euclidean Distance
The straight-line distance between two points, like measuring with a ruler on a map. Used by algorithms like KNN and K-Means. Session 4
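A minimal sketch for two 2-D points, using the classic 3-4-5 triangle so the answer is exact:

```python
import math

# Straight-line distance between two points, like a ruler on a map.
a = (1.0, 2.0)
b = (4.0, 6.0)

dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(dist)  # sqrt(3^2 + 4^2) = 5.0
```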
F
F1 Score
The harmonic mean of precision and recall. A single number that balances both. F1 = 2 × (P × R) / (P + R). Session 10
False Negative
A real case the model misses. A fraud transaction that the model says is legitimate. The thief gets away with the money. Session 10
False Positive
A legitimate case the model wrongly flags. A real transaction the model calls fraud. The customer's card gets frozen for no reason. Session 10
Feature
A piece of information you feed into a model. For a house: number of bedrooms, location, square footage. Each feature is one column in your dataset. Session 8
Feature Engineering
Choosing and transforming the input variables your model uses. Converting a date into "day of week" or combining two measurements into a ratio. Session 12
H
Harmonic Mean
A special average that penalizes imbalance. The regular average of 90% and 10% is 50%. The harmonic mean is only ~18% — much more honest about the weak number. Session 10
Holdout Set
Data deliberately set aside and not used during training. Used only at the end to check how the model performs on data it has never seen. Session 13
Hyperparameter
A setting you choose before training begins, like learning rate. The prefix "hyper" means "above" — it is a setting above the model, not learned from data. Session 9
L
Learning Rate
Controls how big each step is during gradient descent. Too big and you overshoot. Too small and training takes forever. The most important hyperparameter. Session 9
Loss Function
Measures how wrong your model is. Different loss functions define "wrong" differently: MSE punishes big mistakes harshly, MAE treats all errors equally. Session 9
M
Matrix
A grid of numbers. Your dataset is a matrix: each row is a sample, each column is a feature. A 1000-sample dataset with 10 features is a 1000×10 matrix. Session 8
Mean
The arithmetic average: sum divided by count. Sensitive to outliers. Session 3
Median
The middle value when data is sorted. Robust to outliers. Session 3
Mean Squared Error (MSE)
Average of the squared differences between predictions and true values. Punishes large errors much more than small ones. Session 9
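A small comparison of MSE against MAE on made-up errors, showing how squaring lets one large error dominate:

```python
# One large error (off by 10) versus three small ones (off by 1 each).
errors = [1, 1, 1, 10]

mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors

print(mse)  # (1 + 1 + 1 + 100) / 4 = 25.75
print(mae)  # (1 + 1 + 1 + 10) / 4 = 3.25
```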
Model
A mathematical function that takes inputs (features) and produces predictions. Trained on data to find the best internal parameters. Session 2
Model Card
A short document describing what a model does, how it was trained, its performance, and its limitations. Like a nutrition label for ML. Session 14
N
Normal Distribution
The bell curve. Symmetric: most values near the center, fewer at extremes. Many ML algorithms assume features follow this pattern. Session 4
O
Overfitting
When a model memorizes training data so well it fails on new data. Like a student who memorizes practice test answers instead of learning the concepts. Session 9
P
Parameter
Something the model learns from data, like weights. Different from a hyperparameter, which you set before training. Session 9
Precision
Of everything the model flagged as positive, how many actually were positive? High precision means few false alarms. Session 10
Probability
A number between 0 and 1 representing how likely something is. 0.8 means about 80% likely. The mathematical language of uncertainty. Session 6
R
Recall
Of everything that actually was positive, how many did the model catch? High recall means few missed cases. Session 10
Regression
Predicting a number (like a price or temperature). The counterpart of classification, which predicts a category. Session 9
S
Sensitivity
Also called recall or true positive rate. The probability a test correctly identifies someone who actually has the condition. Session 6
Standard Deviation
The square root of variance. Tells you the typical distance of values from the mean, in the same units as the data. Session 3
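Variance and standard deviation computed by hand on a small made-up sample (population formulas, dividing by the count):

```python
import math

# Variance: average squared distance from the mean.
# Standard deviation: its square root, in the data's own units.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```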
T
Threshold
The cutoff point for a classifier. If the model outputs 72% probability and your threshold is 50%, you say "yes." Choosing the threshold is a business decision. Session 10
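Applying a threshold is a single comparison per prediction; the probabilities below are hypothetical model outputs:

```python
# Turn probabilities into yes/no decisions at a 50% threshold.
probs = [0.72, 0.45, 0.91, 0.50]
threshold = 0.5

decisions = [p >= threshold for p in probs]
print(decisions)  # [True, False, True, True]
```

Raising the threshold trades recall for precision: fewer flags overall, but each flag is more likely to be right.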
Train-Test Split
Splitting data into two parts: a training set (usually 80%) the model learns from, and a test set (20%) held back for final evaluation. Session 15
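A minimal 80/20 split sketch; the "samples" are just indices, and the seed is fixed only so the example is reproducible:

```python
import random

# Shuffle first so the split is random, then cut at 80%.
data = list(range(100))

random.seed(42)            # fixed seed for reproducibility
random.shuffle(data)

cut = int(len(data) * 0.8)
train, test = data[:cut], data[cut:]

print(len(train), len(test))  # 80 20
```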
V
Variance
The average of the squared differences from the mean. Measures how spread out values are. High variance means the data (or model performance) is unreliable. Session 3
Vector
A list of numbers. The features of one data point form a vector. [3 bedrooms, 150 sqm, 2 bathrooms] is a 3-dimensional vector. Session 8
W
Weight
A number that controls how much each input feature influences the prediction. Like turning a dial up or down. Learned during training. Session 9