
Week 7 of 8

Communicating ML

Develop the ability to communicate ML concepts and results to diverse audiences, and write clear, reproducible technical descriptions. These are career-defining skills that separate practitioners from engineers.

Before this week
Reading ML papers
Logical reasoning

Session 13

Explaining to Humans

Technical Communication for Non-Technical Audiences

90 Minutes
Objective: Translate ML concepts and results into clear, accurate language that non-technical stakeholders can understand and act on.

Concept Lesson

Your fraud detection model is ready. It has been trained on 2.3 million transactions from a Lagos-based fintech, tested on a holdout set of 580,000 transactions (A holdout set — also called a test set — is a portion of data you deliberately set aside and do NOT use during training. You use it only at the end to check how well the model performs on data it has never seen before. This is like studying with practice tests but saving one final exam to see if you actually learned.), and it achieves 90% precision (Precision: of everything the model flagged, how many were actually positive? Recall: of all the actual positives, how many did the model catch? We covered these in detail in Week 5.). The bank's risk committee — the CFO, the head of operations, and the compliance officer, none of whom have written a line of code in their lives — have given you 10 minutes in their Thursday meeting to make your case. They want to know one thing: should they deploy this model, and what happens if they do? You have spent three months building this system, and now your entire project hinges on whether you can explain it clearly to people who think in terms of risk exposure, customer complaints, and regulatory fines — not F1 scores and confusion matrices.

The core skill here is finding the right abstraction level for each audience. Your CEO does not need to know about gradient descent, training configurations, or the difference between regularization techniques. But they absolutely need to know why the model sometimes flags legitimate transactions as fraudulent, how often that happens, and what the cost of those errors is in customer churn. The principle is simple: match your explanation to what the listener needs to make a decision. A useful structure for any stakeholder presentation is three questions: What does it do? How well does it work? What are its limitations? If you answer those three questions clearly and honestly, you have done your job. For instance, when you tell the risk committee that the model is "like an experienced fraud analyst who has reviewed 100,000 cases and can spot patterns a junior analyst would miss, but who also occasionally flags a normal purchase because it looks unusual," you have communicated capability, mechanism, and failure mode in one sentence without using any technical vocabulary.

Honesty about uncertainty is not a liability — it is what builds trust. Consider the difference: "Our AI will solve your fraud problem" versus "Our model catches 70 out of every 100 fraudulent transactions, and it incorrectly flags about 3 out of every 100 legitimate ones. That means for every 10,000 transactions, roughly 300 customers will have a payment temporarily blocked. We recommend a quick manual review queue so those customers are inconvenienced for minutes, not hours." The second statement is longer, harder to deliver, and less exciting. But it is the one that lets the CFO model the customer support cost, the compliance officer assess the regulatory exposure, and the head of operations plan the manual review workflow. When you hide or downplay limitations, you do not just erode trust — you make it impossible for decision-makers to plan around the real behavior of your system. The most effective ML communicators are the ones who present limitations as clearly as capabilities, because that clarity is what gives an organization the confidence to actually rely on your work.

A common mistake is to assume that non-technical audiences cannot handle nuance. They can. What they cannot handle is jargon, unexplained acronyms, or presentations where the speaker buries caveats in footnotes hoping nobody notices. Another mistake is over-hyping: telling a bank board that your model "uses advanced deep learning" when a logistic regression would have worked just as well. If the board later learns you overstated the technology, you have lost credibility permanently. The right approach is to be precise about what your system does, honest about what it does not do, and concrete about the numbers — always tying performance metrics back to business outcomes that your audience cares about. If your recall is 70%, do not just say the number. Say: "Out of every 100 fraudulent transactions, our system catches 70. The remaining 30 will need to be caught by your existing manual review process. Here is how we recommend supplementing the model."
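The arithmetic behind statements like these is worth making explicit. Here is a minimal sketch that turns model metrics into the counts stakeholders act on; the 1% fraud rate is an illustrative assumption, and the function name is hypothetical rather than from any library.

```python
# Minimal sketch: turn model metrics into the counts stakeholders act on.
# The 1% fraud rate here is an illustrative assumption, not a real figure.

def business_impact(n_transactions, fraud_rate, recall, false_positive_rate):
    """Expected counts of caught fraud, missed fraud, and wrongly blocked customers."""
    fraud = n_transactions * fraud_rate
    legit = n_transactions - fraud
    caught = fraud * recall                     # fraud the model stops
    missed = fraud * (1 - recall)               # fraud the manual queue must catch
    false_alarms = legit * false_positive_rate  # legitimate payments blocked
    return round(caught), round(missed), round(false_alarms)

# 10,000 transactions, 70% recall, 3% false-positive rate:
print(business_impact(10_000, fraud_rate=0.01, recall=0.70, false_positive_rate=0.03))
# → (70, 30, 297)
```

Notice that the output is the sentence you deliver to the committee: 70 frauds caught, 30 left for manual review, roughly 300 customers inconvenienced.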

Guided Exercises

Exercise 1: Explain the concept of overfitting (Overfitting is when a model memorizes the training data so perfectly that it fails on new data — like a student who memorizes practice test answers instead of learning the concepts.) to three different audiences: (a) a 10-year-old who loves football — think about a player who memorizes one team's playbook perfectly but cannot play against any other team; (b) a business executive at a telecoms company who needs to decide whether to trust a churn prediction model (Churn means customers leaving a service — a churn prediction model tries to identify which customers are likely to cancel their subscription or stop using the product.); and (c) a backend software engineer at your company who has never taken an ML course. Write out each explanation (3-4 sentences each). Notice how the core concept stays the same but the analogies, vocabulary, and what you emphasize change completely.
Exercise 2: Your fraud detection model has 90% precision and 70% recall. You are presenting to a risk committee at a Nigerian bank that processes 5 million transactions per month. Write a 3-sentence summary that explains: (1) what these numbers mean in plain language, (2) what the practical impact is on customers and fraud losses, and (3) one honest limitation. No jargon. No acronyms. The committee members need to leave the room knowing whether to approve deployment or not.
Exercise 3: A journalist from TechCabal emails you: "Your company just won a contract to deploy AI for loan approvals in Kenya. Critics say AI is biased. Is your AI biased?" Draft a 1-paragraph response (5-7 sentences) that is honest, clear, and avoids all ML-specific terminology. Address what bias means in this context, what steps you have taken to detect and reduce it, what limitations remain, and what safeguards are in place. Remember: the journalist will quote you, so every word matters.

Discussion Prompt

Think about the last three AI headlines you read. Were they over-hyped ("AI will revolutionize healthcare"), over-fearful ("AI will destroy all jobs"), or balanced? Why do you think public communication about AI tends to swing between extremes? How can someone with strong quantitative and verbal reasoning skills push back against both hype and fear with evidence and clear language?

Key Takeaway

Clear communication is not a soft skill — it is a technical skill. If you cannot explain your model's behavior, its error rates, and its limitations to the people making deployment decisions, your model might as well not exist. The engineer who can build a model and the engineer who can explain it to a non-technical stakeholder are two very different people — and the second one is far more valuable.

Quick Check

When explaining ML to non-technical stakeholders, the best approach is to:

  • Adapt your explanation to what the audience needs to make a decision
  • Use as much technical jargon as possible to show expertise
  • Only share good news and hide limitations

Honesty about uncertainty in ML communication:

  • Makes stakeholders lose confidence in the model
  • Builds trust and helps decision-makers plan around real behavior
  • Is unnecessary if accuracy is high enough

To explain overfitting to a football-loving child, the best analogy is:

  • A player who practices penalties every day
  • A coach who reads every book about football
  • A player who memorizes one team's playbook perfectly but can't play against any other team

Key Terms


What is audience adaptation?

Changing your explanation's vocabulary, detail level, and framing to match what your specific audience needs to understand and decide. The core concept stays the same.

Why be honest about limitations?

Because hiding limitations erodes trust and makes it impossible for decision-makers to plan. The CFO needs to model customer support costs; the compliance officer needs to assess regulatory risk.

What makes a good analogy?

A good analogy maps the core mechanism of a concept to something the audience already understands. It should preserve the key insight while dropping unnecessary technical detail.

Session 14

Writing About ML

Technical Writing — Experiment Descriptions and Reports

90 Minutes
Objective: Write clear, reproducible descriptions of ML experiments and results.

Concept Lesson

You ran three experiments this week. Your team lead sends you a Slack message at 4 PM on Friday: "Hey, can you give me a quick summary of how the models performed? I need to update the product manager before end of day." You stare at the blank text box. Where do you start? Do you list all the hyperparameters? Lead with the best number? Mention the one that failed? This is the moment where technical writing skill determines whether your team lead walks into that meeting armed with useful information or confused and guessing. Good technical writing follows a simple four-part framework: Context (what problem are you solving and why does it matter?), Method (what did you actually do, with enough detail to reproduce it?), Results (what happened, presented objectively?), and Interpretation (what does it mean, and what should we do next?). This structure works for a Slack message, a formal report, a model card, or a conference paper — the level of detail changes, but the skeleton stays the same.
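The four-part skeleton can even be sketched as a tiny helper. This is a hypothetical illustration, not part of any tool: the function and its checks simply enforce the Context-Method-Results-Interpretation order, and the example text is invented.

```python
# Hypothetical helper: assemble an experiment summary in the
# Context-Method-Results-Interpretation order, refusing blank sections.

def cmri_summary(context, method, results, interpretation):
    sections = {
        "Context": context,
        "Method": method,
        "Results": results,
        "Interpretation": interpretation,
    }
    # A section left blank usually means the thinking behind it is missing too.
    for name, text in sections.items():
        if not text.strip():
            raise ValueError(f"Missing section: {name}")
    return "\n\n".join(f"{name}: {text}" for name, text in sections.items())

print(cmri_summary(
    context="Support tickets are triaged manually; we want to auto-route them.",
    method="Random Forest on 15,000 labeled tickets, 5-fold stratified cross-validation.",
    results="Best model: F1 0.86 on the validation set.",
    interpretation="Good enough to pilot on one queue; monitor misroutes weekly.",
))
```

Whether you ship this as a Slack message or expand each section into a report chapter, the skeleton is the same.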

Reproducibility requires specificity, and specificity requires you to anticipate what a reader needs to replicate your work. Consider the difference between these two descriptions of the same experiment: "We trained a model on customer data and got good results" versus "We trained a Random Forest classifier (200 trees, max depth 12, min samples split 5) on 15,000 labeled customer support tickets from Q1-Q3 2025, using TF-IDF vectorization (TF-IDF is a method for converting text into numbers — it counts how often a word appears in a document and adjusts for how common that word is across all documents. The point here is the level of detail, not the technique itself.) with 5,000 features. We evaluated using 5-fold stratified cross-validation (Cross-validation means splitting your data into several parts, training on some and testing on the rest, then rotating through all the parts. '5-fold' means five parts. 'Stratified' means each part keeps the same ratio of classes as the full dataset.) and selected the model with the highest mean F1 score. The best model achieved F1 0.86 on the validation set." The second version lets a colleague three months from now — or even yourself, after you have forgotten the details — understand exactly what was done and reproduce the result. This level of detail is not pedantic. In production ML teams at companies like Flutterwave or Paystack, the inability to reproduce a previous experiment costs real engineering time. When you leave out details because they seem obvious to you right now, you create a gap that someone else will have to fill by repeating your entire process from scratch, or worse, making incorrect assumptions about what you did.
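One way to guarantee that level of detail is to record it as structured data at the moment you run the experiment. The sketch below mirrors the worked example above; the field names are hypothetical conventions, not any particular tracking tool's schema.

```python
# A minimal experiment log capturing the details a reproducible description
# needs. Values mirror the worked example in the text; the structure is the
# point, not any particular library.

experiment = {
    "model": "RandomForestClassifier",
    "hyperparameters": {"n_estimators": 200, "max_depth": 12, "min_samples_split": 5},
    "dataset": {"source": "customer support tickets, Q1-Q3 2025", "size": 15_000},
    "features": {"vectorizer": "TF-IDF", "max_features": 5_000},
    "evaluation": "5-fold stratified cross-validation, select highest mean F1",
    "result": {"metric": "F1", "value": 0.86, "split": "validation"},
}

def describe(exp):
    """Render the log as a one-line summary a colleague could act on."""
    hp = ", ".join(f"{k}={v}" for k, v in exp["hyperparameters"].items())
    return (f"{exp['model']} ({hp}) on {exp['dataset']['size']} samples; "
            f"{exp['evaluation']}; {exp['result']['metric']} "
            f"{exp['result']['value']} on {exp['result']['split']} set.")

print(describe(experiment))
```

If every run produces a record like this, "reproduce last quarter's experiment" becomes a lookup instead of an archaeology project.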

Common writing mistakes in ML reports are predictable and avoidable. The first is burying the key result: a colleague once wrote a 6-page report where the best model's F1 score appeared on page 5, after four pages of methodology. Your team lead reading a Slack summary does not have four pages of patience — lead with the number. The second mistake is not stating a baseline. Saying "our model achieved 82% accuracy" is meaningless without context. Is 82% good? For a balanced binary classification problem, maybe. For a dataset where 80% of samples belong to one class, a model that always predicts the majority class gets 80% accuracy — so your 82% is barely better than guessing. Always anchor your results against a baseline: majority class, random guessing, or the previous best model. A third mistake is passive voice that obscures responsibility. "The data was preprocessed" leaves the reader wondering who did it and how. "We removed 340 rows with missing values (2.1% of the dataset) and normalized numerical features using min-max scaling (rescaling numbers so they all fall between 0 and 1 — the smallest value becomes 0, the largest becomes 1, and everything else is proportionally in between)" is precise and actionable. Finally, many ML writers skip limitations entirely, either out of enthusiasm or fear. A single sentence — "We did not evaluate on out-of-distribution data (data that looks different from what the model was trained on — for example, a model trained on formal English reviews struggling with Pidgin English text), so production performance on new customer segments is unknown" — can save your team from deploying a model that quietly fails in the real world.
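Two of the points above can be made concrete in a few lines: the majority-class baseline and min-max scaling. Both are toy sketches using only the standard library, with an assumed 80/20 class split.

```python
# Toy sketches of a majority-class baseline and min-max scaling.

from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def min_max_scale(values):
    """Rescale values so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# 80% of labels are class 0, so the do-nothing baseline already scores 0.8,
# which is the context that makes an "82% accuracy" claim look very different.
labels = [0] * 80 + [1] * 20
print(majority_baseline_accuracy(labels))   # → 0.8

print(min_max_scale([10, 20, 40]))
```

Anchoring a reported number against `majority_baseline_accuracy` is the one-line habit that keeps "82% accuracy" honest.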

Guided Exercises

Exercise 1: Rewrite this vague description to be specific and reproducible: "We used AI to analyze customer data and found interesting patterns that could improve sales." Your rewrite should include: the type of model used, the dataset size and source, the features, the evaluation method, the key result with a number, and one limitation. A colleague reading your version should be able to understand exactly what was done without asking you follow-up questions.
Exercise 2: You trained three models to predict customer churn at a telecoms company with 50,000 subscribers: Logistic Regression (F1: 0.78), Random Forest (an algorithm that builds many decision trees and averages their predictions) (F1: 0.83), XGBoost (a more advanced algorithm that builds trees sequentially, with each new tree correcting the errors of the previous ones) (F1: 0.84). The baseline — predicting the majority class (no churn) (A majority class baseline is the simplest possible model: it always predicts the most common class. If 80% of customers don't churn, it predicts 'no churn' for everyone and gets 80% accuracy — but it never catches any churners.) — achieves F1: 0.0 because it never predicts churn. Write a results paragraph (4-6 sentences) that includes: the baseline comparison, a clear comparison across models, your recommendation, and one limitation. Do not just list numbers. Interpret them. What does the 1-point difference between Random Forest and XGBoost actually mean for a decision?
Exercise 3: Draft a model card (A model card is a short document — usually one page — that describes what a model does, how it was trained, how well it performs, and what its limitations are. Think of it as a nutrition label for a machine learning model.) (1 page) for a sentiment analysis (Sentiment analysis is the task of determining whether a piece of text expresses a positive, negative, or neutral opinion — for example, classifying customer reviews as happy, unhappy, or neutral.) classifier trained on 10,000 customer reviews from Jumia Nigeria. The model classifies reviews as positive, negative, or neutral. F1 score is 0.81. It performs worst on sarcastic reviews and reviews written in Pidgin English. Use the Context-Method-Results-Interpretation framework. Include sections for: intended use, training data description, performance metrics, known limitations, and ethical considerations.

Discussion Prompt

You have just finished a successful experiment. Consider three audiences: (a) your teammate on Slack who needs to integrate your model this week, (b) your engineering manager who needs to decide whether to allocate two more sprints to this project, and (c) an ML blog audience who wants to learn from your approach. How does your writing change for each audience? What stays the same across all three? Which audience is hardest to write for, and why?

Key Takeaway

Writing is thinking made visible. If your write-up of an experiment is unclear, vague, or missing key details, it is almost always because your thinking about that experiment was unclear too. Structured writing — Context, Method, Results, Interpretation — forces structured thinking. When you commit to writing a clear experiment report, you are not just documenting your work. You are stress-testing your own understanding.

Quick Check

The Context-Method-Results-Interpretation framework is useful because:

  • It ensures readers can understand what was done, what happened, and what it means
  • It makes reports longer and more impressive
  • It only works for academic papers

Specificity in experiment descriptions matters because:

  • It makes the report look more professional
  • It enables reproducibility — others can replicate your work without guessing
  • Specificity is not important

A model card is best described as:

  • A credit card for ML models
  • The training code for a model
  • A nutrition label — describing what the model does, how well, and what its limitations are

Key Terms


What is the CMRI framework?

Context, Method, Results, Interpretation. A four-part structure for any technical writing: what's the problem, what did you do, what happened, what does it mean.

What is reproducibility?

The ability for someone else to repeat your experiment and get the same results. Requires specific details: model type, hyperparameters, dataset size, evaluation method.

What is a model card?

A one-page document describing a model: intended use, training data, performance metrics, limitations, and ethical considerations. Like a nutrition label for ML.