
Week 1 of 8

Numbers & Proportions

Build numerical intuition from the ground up. Everything is framed through ML/data contexts so students see the immediate relevance of working with numbers. This week establishes the most fundamental quantitative skills that everything else builds on.

Session 1

Numbers That Matter

Percentages, Ratios, and Rates of Change

90 Minutes
Objective Interpret and calculate percentages, ratios, and rates of change in ML contexts.

Concept Lesson

Imagine you just joined the ML team at a Lagos fintech startup that builds credit scoring models. On your first day, your team lead drops a training log in front of you and says, “Our model hit 92% accuracy.” You nod, but what does 92% actually mean? A percentage is a fraction of 100. If your model processes 1,000 loan applications and correctly predicts whether each applicant will repay or default 920 times out of 1,000, that is 920 ÷ 1,000 × 100 = 92%. But here is where it gets tricky: if only 5% of applicants in your dataset actually default, a lazy model that predicts “everyone repays” would score 95% accuracy without learning anything useful. This is why in ML you can never look at a percentage in isolation—you need to ask what it is measuring, what the baseline is, and whether the number tells the full story of model performance.
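The accuracy arithmetic above can be checked in a few lines of Python. This is a minimal sketch using the lesson's hypothetical 1,000-application example, not real data:

```python
# Accuracy as a fraction of 100, using the lesson's example numbers.
correct = 920
total = 1_000
accuracy = correct / total * 100
print(f"Model accuracy: {accuracy:.0f}%")  # 92%

# The lazy baseline: if only 5% of applicants default, a model that
# always predicts "everyone repays" is right 95% of the time.
default_rate = 0.05
lazy_accuracy = (1 - default_rate) * 100
print(f"Lazy baseline: {lazy_accuracy:.0f}%")  # 95%
```

Notice that the lazy baseline beats the real model here, which is exactly why a percentage in isolation can mislead you.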

Ratios compare two quantities directly, and they show up constantly in data science work. A dataset with a 60/40 class split means that for every 60 samples of class A, there are 40 of class B. You can express this as 60:40, simplify it to 3:2, or say class A is 1.5 times as common. Why does this matter? Because a model trained on a 60/40 split will behave very differently from one trained on a 99/1 split. Consider the fintech credit scoring scenario: if only 2% of historical applicants defaulted, your training data has a 98/2 ratio, and the model might learn to ignore the minority class entirely because it is never penalized for doing so. Recognizing and quantifying these ratios is the first step toward fixing class imbalance, the situation where one category has far more examples than the other: if 98 out of every 100 transactions are legitimate and only 2 are fraud, the dataset is heavily imbalanced. The good news is that there are specific techniques to fix this: oversampling (creating more examples of the rare class), undersampling (removing some examples of the common class), or adjusting class weights (telling the model to pay more attention to mistakes on the rare class). Those come later in your ML journey; for now, the ratio tells you the problem, and your job is to design the solution.
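Simplifying a ratio is just dividing both sides by their greatest common divisor. A quick sketch, using the 60/40 and 98/2 splits from this paragraph:

```python
from math import gcd

def simplify_ratio(a: int, b: int) -> tuple[int, int]:
    """Reduce a class split to its simplest ratio, e.g. 60:40 -> 3:2."""
    g = gcd(a, b)
    return a // g, b // g

print(simplify_ratio(60, 40))  # (3, 2)
print(simplify_ratio(98, 2))   # (49, 1) — one default per 49 repayers
```

Seeing 98:2 collapse to 49:1 makes the imbalance vivid: the model meets forty-nine majority examples for every minority one.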

Rates of change tell you how fast something is moving, and in ML the thing that is usually moving is your training loss (a number that measures how wrong the model's predictions are; we will explore this deeply in Week 5). If your loss started at 3.50 at the beginning of epoch 1 (one complete pass through all the training data) and dropped to 2.80 by the end, the absolute change is 2.80 − 3.50 = −0.70, and the percentage change is (−0.70 ÷ 3.50) × 100 = −20%. Now suppose in epoch 5 the loss only drops from 1.02 to 1.00: that is an absolute change of −0.02 but a percentage change of roughly −2%. The rate of change is slowing down dramatically, which tells you the model is converging: it is settling into its best performance, squeezing out the last bits of improvement, and additional epochs may not be worth the compute cost. A common mistake is confusing absolute and percentage improvement: going from 72% to 81% accuracy is a 9-percentage-point absolute gain, but the relative improvement is (81 − 72) ÷ 72 × 100 = 12.5%. Both numbers matter, but they answer different questions: absolute change tells you the raw difference, while percentage change tells you how significant that difference is relative to where you started.
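The absolute-versus-percentage calculation above can be wrapped in a small helper. The loss values are the lesson's examples; the function name is just illustrative:

```python
def changes(old: float, new: float) -> tuple[float, float]:
    """Return (absolute change, percentage change) between two readings."""
    absolute = new - old
    percent = absolute / old * 100
    return round(absolute, 4), round(percent, 2)

print(changes(3.50, 2.80))  # (-0.7, -20.0): epoch 1's big drop
print(changes(1.02, 1.00))  # (-0.02, -1.96): epoch 5's tiny drop
```

The same helper answers the accuracy question too: `changes(72, 81)` gives a +9-point absolute gain and a +12.5% relative gain.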

Guided Exercises

Exercise 1: Your fraud detection model’s accuracy goes from 0.72 to 0.81 after retraining on a cleaned dataset. Step 1: Calculate the absolute improvement (subtract the old value from the new). Step 2: Calculate the percentage improvement relative to the original 0.72. Step 3: Are these two numbers the same? Explain why a startup founder might report the percentage improvement while an engineer might focus on the absolute gap. In what situations could the percentage improvement be misleading?
Exercise 2: Dataset A has 10,000 samples with a 70/30 class split (7,000 class A, 3,000 class B). Dataset B has 500 samples with a 50/50 split (250 each). Express each dataset’s class balance as a simplified ratio. Which dataset might cause more problems for training a classification model, and why? Consider both the imbalance ratio and the total sample size—a small but balanced dataset has its own dangers (overfitting, high variance) that a large but slightly imbalanced dataset might not.
Exercise 3: Your training loss over 5 epochs is: 3.2, 2.1, 1.5, 1.3, 1.25. Calculate the rate of change (absolute and percentage) between each consecutive pair of epochs. What pattern do you notice in the rates? At which epoch does the improvement slow down most dramatically? Based on this, would you recommend training for 10 more epochs or stopping early? Justify your answer using the numbers.

Discussion Prompt

A company claims their new model is “50% better” than the previous version. Before you believe this claim, what specific questions would you ask? Consider: 50% better at what metric? Compared to what baseline? On what dataset? Over how many runs? Could “50% better” mean accuracy went from 40% to 60% (still poor), or from 90% to 95% (impressive)? Draft three precise questions you would email the team.

Key Takeaway

Numbers without context are meaningless. Always ask: percentage of what? Compared to what? Over what time period? A single number never tells the full story—the context around it determines whether it is a triumph or a trap.

Quick Check

Your model's accuracy improves from 72% to 81%. The absolute improvement is 9 percentage points. The percentage improvement relative to the starting point is:

  • 9%
  • 12.5%
  • 81%

A dataset has 95% legitimate transactions and 5% fraud. A model that always predicts "not fraud" would achieve what accuracy?

  • 50%
  • 95%
  • 5%

Training loss goes from 3.2 to 2.1 in one epoch. The rate of change is:

  • −34.4%
  • −1.1
  • 34.4%

Key Terms


What is a percentage?

A fraction of 100. If your model gets 92 out of 100 predictions right, that is 92% accuracy.

What is a ratio?

A comparison of two quantities. A 60/40 class split means for every 60 of one class, there are 40 of the other.

What is a rate of change?

How fast something is changing. If loss goes from 2.4 to 1.8, the absolute change is −0.6 and the percentage change is −25%.

Session 2

Thinking in Functions

Functions, Inputs, and Outputs

90 Minutes
Objective Understand the concept of a function as a machine that maps inputs to outputs, and connect this directly to how ML models work.

Concept Lesson

Imagine you are building a house price predictor for the Nigerian real estate market. A potential buyer enters three details: the number of bedrooms, the square footage, and the neighbourhood. Your system takes those inputs, runs them through some rule, and spits out a price in Naira. That rule is a function. In mathematical notation, you might write f(bedrooms, sqft, neighbourhood) = price. A function is simply a machine that accepts inputs and produces exactly one output for each set of inputs. Give it the same inputs twice, and you get the same output every time. This is the core idea behind every ML model you will ever use: you feed in features (inputs), the model applies learned rules (the function), and you get a prediction (the output). The only difference between a simple function like f(x) = 2x + 3 and a trained neural network is complexity—the neural network is still a function, just one with millions of parameters tuned by data rather than written by hand.
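A minimal sketch of the "function as machine" idea, using the toy rule f(x) = 2x + 3 from the paragraph:

```python
def f(x: float) -> float:
    """A toy stand-in for a trained model: input goes in, one output comes out."""
    return 2 * x + 3

# Same inputs, same output, every time — functions are deterministic.
assert f(5) == f(5)
print(f(5))  # 13
```

A trained neural network obeys the same contract; it just has millions of learned parameters instead of the hand-written `2` and `3`.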

Linear functions follow the pattern y = mx + b, and they are the backbone of linear regression, one of the most widely used ML algorithms. The slope (m) tells you how much the output changes when you increase the input by one unit. The intercept (b) is the output when the input is zero. Consider a salary prediction model for tech workers in Nairobi: predicted_salary = 5,000 × years_of_experience + 300,000 (in KSh). Here, m = 5,000 means each additional year of experience adds KSh 5,000 to the predicted monthly salary, and b = 300,000 is the baseline salary for someone with zero years of experience. Linear functions are powerful because they are easy to interpret: you can explain to a non-technical stakeholder exactly what each coefficient means. In the equation y = mx + b, the numbers m and b are called coefficients — they are the specific values that control how the function behaves. In ML, finding the best coefficients is literally the goal of training. The danger is assuming everything is linear when it is not. If you fit a straight line to curved data, your predictions will be systematically wrong in predictable ways—overestimating in some regions and underestimating in others.
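The Nairobi salary model can be written directly as code. The slope and intercept are the illustrative values from the paragraph, not real salary data:

```python
def predicted_salary(years_experience: float,
                     m: float = 5_000,     # slope: KSh added per year
                     b: float = 300_000    # intercept: baseline at zero years
                     ) -> float:
    """Linear model y = mx + b from the lesson's salary example."""
    return m * years_experience + b

print(predicted_salary(0))  # 300000 — the intercept alone
print(predicted_salary(4))  # 320000 — four years adds 4 × 5,000
```

Training a linear regression is nothing more than letting the data choose `m` and `b` instead of writing them by hand.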

Non-linear functions curve, and most real-world ML relationships are non-linear. Consider y = x²: when x goes from 1 to 2, y goes from 1 to 4 (a change of 3), but when x goes from 5 to 6, y goes from 25 to 36 (a change of 11). The output grows faster and faster as the input increases. This is exactly what happens with model accuracy as you add training data. Suppose your model’s accuracy follows this pattern: 100 samples gives 60% accuracy, 1,000 gives 78%, 10,000 gives 85%, and 100,000 gives 88%. The relationship is clearly not linear—going from 100 to 1,000 samples (a 10x increase) buys you 18 percentage points, but going from 10,000 to 100,000 (another 10x increase) only buys you 3 points. This diminishing-returns curve is one of the most important patterns in machine learning, and recognizing it prevents you from wasting resources collecting data that will barely improve your model. The function is still mapping input (data size) to output (accuracy), but the relationship curves downward instead of rising in a straight line.
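The diminishing-returns pattern is easy to see if you compute the accuracy gain per 10x jump in data size, using the example numbers above:

```python
samples = [100, 1_000, 10_000, 100_000]
accuracy = [60, 78, 85, 88]  # percent, from the lesson's example

# Accuracy gained at each 10x increase in training data.
gains = [a1 - a0 for a0, a1 in zip(accuracy, accuracy[1:])]
for (s0, s1), g in zip(zip(samples, samples[1:]), gains):
    print(f"{s0:>7} -> {s1:>7}: +{g} points")
# Each 10x step buys less: +18, then +7, then +3 points.
```

Plot those gains against data size and you get the curve that flattens out: the function still maps data size to accuracy, but each order of magnitude is worth less than the last.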

Guided Exercises

Exercise 1: A simple salary prediction model for an Accra-based company is: predicted_salary = 5,000 × years_experience + 30,000 (in GHS). What does the 5,000 mean in plain English? What does the 30,000 represent? Predict the salary for someone with 0, 3, 7, and 15 years of experience. At what point does the model start to feel unrealistic, and why might a linear model break down for very senior roles?
Exercise 2: On paper, plot the function y = x² for x = −3 to 3. Now plot y = |x| (absolute value: the distance from zero, so |−3| = 3 and |3| = 3; it simply removes the negative sign) on the same axes. Describe the difference in words. Which function “punishes” large errors more harshly? This connects to loss functions: two common ways to measure error in ML are Mean Squared Error (MSE), which squares each error before averaging so big mistakes get punished much harder, and Mean Absolute Error (MAE), which averages the raw error magnitudes and treats every unit of error the same. If a model makes one prediction that is off by 10 units, how much worse does MSE penalize it compared to MAE?
Exercise 3: Your model’s accuracy as a function of training data size is: 100 samples → 60%, 1,000 → 78%, 10,000 → 85%, 100,000 → 88%. Plot these points. Is this relationship linear, logarithmic, or something else? Your manager wants to hit 95% accuracy. Based on the curve, roughly how many samples would you need? Is collecting that much data realistic? What alternatives might you propose instead?

Discussion Prompt

If an ML model is “just a function,” what makes it different from the functions you learned in secondary school? What is the same? Consider: Who writes the rules? How complex can the function be? How do you know if the function is “good”? And what happens when the function encounters inputs it has never seen before?

Key Takeaway

Every ML model is a function: features go in, predictions come out. Understanding simple linear and non-linear functions builds the intuition you need to reason about complex models. If you can read f(x) = mx + b, you can read a trained linear regression. If you understand that x² grows faster as x increases, you understand diminishing returns in data collection. The math is simpler than you think—the hard part is applying it well.

Quick Check

In the function y = 5000x + 30000 (salary prediction), the number 5000 represents:

  • The starting salary
  • The maximum salary
  • How much salary increases per year of experience

Which function grows faster as x increases: y = x² or y = |x|?

  • y = |x|
  • y = x²
  • They grow at the same rate

Key Terms


What is a function?

A machine that takes inputs and produces exactly one output for each set of inputs. Every ML model is a function.

What is a linear function?

A function of the form y = mx + b. The slope m controls how much output changes per unit input; the intercept b is the output when x = 0.

What are diminishing returns?

When adding more of something produces smaller and smaller improvements. Going from 100 to 1,000 training samples may add 18% accuracy, but going from 10,000 to 100,000 may only add 3%.