Session 1
Numbers That Matter
Percentages, Ratios, and Rates of Change
Concept Lesson
Imagine you just joined the ML team at a Lagos fintech startup that builds credit scoring models. On your first day, your team lead drops a training log in front of you and says, “Our model hit 92% accuracy.” You nod, but what does 92% actually mean? A percentage is a fraction of 100. If your model processes 1,000 loan applications and correctly predicts whether each applicant will repay or default 920 times out of 1,000, that is 920 ÷ 1,000 × 100 = 92%. But here is where it gets tricky: if only 5% of applicants in your dataset actually default, a lazy model that predicts “everyone repays” would score 95% accuracy without learning anything useful. This is why in ML you can never look at a percentage in isolation—you need to ask what it is measuring, what the baseline is, and whether the number tells the full story of model performance.
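The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using the lesson's numbers (1,000 applications, 920 correct, a 5% default rate); the function name `accuracy` is just for this example, not from any real library.

```python
# Sketch: why a high accuracy percentage can hide a useless model.
# All numbers mirror the lesson's example; names are illustrative.

def accuracy(correct: int, total: int) -> float:
    """Percentage of correct predictions: a fraction of 100."""
    return correct / total * 100

model_acc = accuracy(920, 1000)   # the trained model: 920 correct out of 1,000
lazy_acc = accuracy(950, 1000)    # "everyone repays" when only 5% default

print(f"Model accuracy: {model_acc:.0f}%")   # 92%
print(f"Lazy baseline:  {lazy_acc:.0f}%")    # 95%, higher, yet learned nothing
```

The lazy baseline beating the real model is exactly why a single percentage needs context before you trust it.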
Ratios compare two quantities directly, and they show up constantly in data science work. A dataset with a 60/40 class split means that for every 60 samples of class A, there are 40 of class B. You can express this as 60:40, simplify it to 3:2, or say class A is 1.5 times as common as class B. Why does this matter? Because a model trained on a 60/40 split will behave very differently from one trained on a 99/1 split. Consider your fintech credit scoring scenario: if only 2% of historical applicants defaulted, your training data has a 98/2 ratio. The model might learn to ignore the minority class entirely because it is never penalized for doing so. Recognizing and quantifying these ratios is the first step toward fixing class imbalance, which means one category has far more examples than the other; for example, if 98 out of every 100 transactions are legitimate and only 2 are fraud, the dataset is heavily imbalanced. The good news is that there are specific techniques to fix this: oversampling (creating more examples of the rare class), undersampling (removing some examples of the common class), or adjusting class weights (telling the model to pay more attention to mistakes on the rare class). Those come later in your ML journey. The ratio tells you the problem; your job is to design the solution.
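To make the ratio arithmetic concrete, here is a small sketch that counts class labels and simplifies the ratio, using the 98/2 scenario from the paragraph above. The label strings are illustrative, not from a real dataset.

```python
from collections import Counter
from math import gcd

# Sketch: quantifying a class split before training.
# A toy 98/2 dataset matching the lesson's credit scoring scenario.
labels = ["repay"] * 98 + ["default"] * 2

counts = Counter(labels)
a, b = counts["repay"], counts["default"]

g = gcd(a, b)  # greatest common divisor simplifies the ratio
print(f"Ratio {a}:{b} simplifies to {a // g}:{b // g}")   # 98:2 -> 49:1
print(f"Class A is {a / b:.1f}x as common as class B")    # 49.0x
```

Running a count like this on your own data is the quickest way to spot an imbalance before it silently distorts training.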
Rates of change tell you how fast something is moving, and in ML the thing that is usually moving is your training loss (a number that measures how wrong the model's predictions are; we will explore this deeply in Week 5). If your loss started at 3.50 at the beginning of epoch 1 (one complete pass through all the training data) and dropped to 2.80 by the end, the absolute change is 2.80 − 3.50 = −0.70, and the percentage change is (−0.70 ÷ 3.50) × 100 = −20%. Now suppose in epoch 5 the loss only drops from 1.02 to 1.00: that is an absolute change of −0.02 but a percentage change of roughly −2%. The rate of change is slowing down dramatically, which tells you the model is converging, meaning it is settling into its best performance and further training produces smaller and smaller improvements; the model is squeezing out the last bits of progress, and additional epochs may not be worth the compute cost. A common mistake is confusing absolute and percentage improvement: going from 72% to 81% accuracy is a 9-percentage-point absolute gain, but the relative improvement is (81 − 72) ÷ 72 × 100 = 12.5%. Both numbers matter, but they answer different questions: absolute change tells you the raw difference, while percentage change tells you how significant that difference is relative to where you started.
Guided Exercises
Discussion Prompt
A company claims their new model is “50% better” than the previous version. Before you believe this claim, what specific questions would you ask? Consider: 50% better at what metric? Compared to what baseline? On what dataset? Over how many runs? Could “50% better” mean accuracy went from 40% to 60% (still poor), or from 90% to 95% (impressive)? Draft three precise questions you would email the team.
Key Takeaway
Numbers without context are meaningless. Always ask: percentage of what? Compared to what? Over what time period? A single number never tells the full story—the context around it determines whether it is a triumph or a trap.