
Week 6 of 8

Reading & Interpreting

Shift from quantitative to verbal reasoning. Learn to read technical ML writing critically and apply logical reasoning to evaluate claims. These skills are the bridge between understanding ML and using it responsibly.

Session 11

Reading Technical Writing

Decoding ML Papers and Documentation

90 Minutes
Objective Read and extract key claims, methods, and results from ML technical writing without getting lost in jargon.

Concept Lesson

Your lead engineer at a fintech startup in Lagos drops a PDF in the team Slack channel and says, "This new architecture (the structural design of a model — how many layers it has, what operations it uses, how data flows through it) gets state-of-the-art (the best result anyone has published on this task so far) results — we should consider it for our credit scoring pipeline (the end-to-end system: raw data goes in, predictions come out)." Before you get excited or allocate engineering time to implement it, you need to read the paper critically. Technical ML writing follows a predictable structure that you can learn to navigate efficiently, and once you see the pattern, you can extract the core value of any paper in under two minutes. A paper's abstract tells you exactly three things: what problem the authors are solving, how they propose to solve it, and what results they achieved. These three components map to the problem-solution-evidence framework that organizes nearly every research paper, blog post, or technical documentation page you will encounter in ML. The introduction then expands on the problem, the methods section describes the solution in detail, and the results section presents the evidence. If you can extract these three elements from the abstract alone, you have captured roughly 80% of the paper's actionable value. The remaining 20% — implementation details, ablation studies (tests where the authors remove one component of their system to see how much that component contributed to the result), related work — matters when you need to reproduce the results or compare approaches, but for an initial go/no-go decision, the abstract is your most efficient tool.

Jargon is the primary barrier to reading ML papers, but it is a manageable one if you develop the right filtering habit. When you encounter an unfamiliar term, ask yourself a simple question: is this a concept I need to understand to get the main point, or is it a technical detail I can skip for now? Most jargon that appears in an abstract is the second type — a specific technique name like "multi-head self-attention," a benchmark (a standardized dataset or test that everyone in the field uses to compare models — like a common exam that all students take) label like "GLUE score," or a model variant identifier like "DeBERTa-v3-large" that adds precision without changing the core argument. These terms matter if you need to reproduce the work, but they do not change whether the paper's central claim is valid. Focus on the verbs and claims instead, because they carry the weight of the argument. Verbs like "outperforms," "achieves," "demonstrates," and "reduces" tell you what the authors are asserting. Noun modifiers tell you the specific tool they used. A practical technique: cover the technical nouns with your thumb and read only the verbs and numbers. If the sentence still makes a clear claim, the jargon was decorative. If it collapses into nonsense, the term is essential and you should look it up.

Watch carefully for hedging language, because it signals the author's own confidence in their results and tells you how seriously to take the claim. Compare these three statements: "Our approach outperforms the baseline" is a strong, direct claim with no qualifications — the authors are asserting superiority. "Our approach tends to outperform the baseline" introduces frequency language, meaning it works better most of the time but not always, and you should wonder under what conditions it fails. "Our approach outperforms the baseline in some settings" limits the claim to specific conditions entirely, and you must read further to learn which settings those are — because the settings where it does not outperform might be exactly the ones relevant to your use case. Being sensitive to these differences is one of the most important verbal reasoning skills you can develop in ML, because the distance between a strong claim and a qualified one is often the difference between a technology you can trust in production and one that will break in edge cases you never tested.
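The thumb-over-the-nouns trick has a mechanical cousin: scanning a sentence for qualifier phrases. The sketch below is a minimal, illustrative hedge detector — the phrase list is my own non-exhaustive assumption, not a standard lexicon — but it shows how sharply the three statements above differ.

```python
import re

# An illustrative (not exhaustive) list of hedging phrases.
HEDGES = [
    r"\btends? to\b", r"\bin some settings\b", r"\bmay\b",
    r"\bpreliminary\b", r"\bsuggests?\b", r"\bpromising\b",
    r"\bin some cases\b", r"\bmodest\b",
]

def find_hedges(sentence):
    """Return the hedging phrases found in a sentence."""
    found = []
    for pattern in HEDGES:
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0).lower())
    return found

print(find_hedges("Our approach outperforms the baseline."))
# -> [] : a strong, unqualified claim
print(find_hedges("Our approach tends to outperform the baseline in some settings."))
# -> ['tends to', 'in some settings'] : two qualifiers limiting the claim
```

A zero-hedge sentence is a claim you can hold the authors to; each phrase the scan surfaces is a condition you must read further to understand.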

Finally, learn to read figures and tables before the surrounding text, because a well-designed figure communicates a result faster and more honestly than any paragraph. When you open a paper, scan for charts and tables first. Look at the axes: what is being measured, and what units are used? Look at the trend lines or bar heights: which direction represents improvement? Look at the bold or highlighted values in tables — these are the numbers the authors want you to notice. Before you read the author's interpretation, form your own conclusion about what the figure shows. This habit of independent evaluation before accepting the author's framing is what separates critical readers from passive ones. A paper might claim "significant improvement" while its own figure shows a 2% gain on a single benchmark with wide error bars (the small lines above and below a bar in a chart that show the range of uncertainty — if two models' error bars overlap, the difference between them might just be random noise) that overlap with the baseline. If you read only the text, you miss this. If you read the figure first, you catch it immediately.
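The error-bar check described above is a simple interval comparison you can do in your head, or in a few lines of code. The numbers below are hypothetical, chosen to mirror the "2% gain with wide error bars" scenario; the point is the overlap test itself.

```python
def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """True if the error bars [mean - err, mean + err] of two results overlap."""
    low_a, high_a = mean_a - err_a, mean_a + err_a
    low_b, high_b = mean_b - err_b, mean_b + err_b
    return low_a <= high_b and low_b <= high_a

# A claimed "2% improvement" with error bars of +/- 1.5 points each:
baseline = (74.0, 1.5)   # (accuracy %, half-width of error bar)
proposed = (76.0, 1.5)
print(intervals_overlap(*baseline, *proposed))
# -> True: the bars overlap, so the gain may be random noise
```

Overlapping bars do not prove the two models are equal, but they do mean the figure cannot support the word "significant" on its own.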

Guided Exercises

Exercise 1: You are given an abstract from a machine learning paper. Read it carefully and extract: (a) the problem the authors are addressing, (b) the proposed solution or method, (c) the key numerical result, and (d) any hedging language that limits or qualifies the claim. Then write a two-sentence summary of the entire abstract using only the information you extracted. Compare your summary with a partner's — did you both extract the same core claims?

Exercise 2: Here are five sentences drawn from real ML papers. Rank them from strongest to weakest claim: "We achieve state-of-the-art results on ImageNet." / "Our method shows promising results across several benchmarks." / "We demonstrate a 15% improvement in F1 score on the CoNLL-2003 NER task." / "Our approach may improve classification performance in some cases." / "Preliminary experiments suggest modest gains in low-resource settings." For each sentence, write one sentence identifying the confidence level and the specificity of the claim. Explain why your #1 ranked sentence is stronger than your #5 ranked sentence.

Exercise 3: Take a paragraph from a recent ML blog post (provided by the instructor). With a pen, circle every term you do not fully understand. For each circled term, classify it as essential (you cannot understand the main argument without knowing this) or skippable (it adds precision but you can follow the point without it). Look up only the essential terms and write a one-sentence plain-language definition for each. Count how many terms you actually needed to research versus how many you could safely skip. You will usually find that you needed far fewer than you expected.

Discussion Prompt

Why is the ability to read technical papers important even if you are not an academic researcher? Think about a time when someone recommended a tool, library, or approach to you at work or in a project. Did you read the source, or did you trust the recommendation? What happened? How might reading the original source have changed your decision?

Key Takeaway

You do not need to understand every equation to extract value from a paper. Focus on three things: what claim is being made, what evidence supports it, and what limitations the authors acknowledge. That is verbal reasoning applied to technical text.

Quick Check

A well-structured ML paper abstract tells you three things. They are:

  • Problem, method, and result
  • Introduction, bibliography, and conclusion
  • Code, data, and model architecture

"Our approach outperforms the baseline" versus "Our approach tends to outperform the baseline in some settings." Which statement is stronger?

  • The second — hedging shows intellectual honesty
  • The first — it makes a direct, unqualified claim
  • They are equally strong

Key Terms


What is a benchmark?

A standardized dataset or test everyone in the field uses to compare models. Like a common exam. Examples: ImageNet, GLUE, CoNLL.

What is hedging language?

Words that qualify or limit a claim: "tends to," "in some settings," "may," "preliminary results suggest." They signal the author's confidence level.

What is state-of-the-art (SOTA)?

The best result anyone has published on a specific benchmark. Claiming SOTA means your model beats all previously reported results.

Session 12

Logic and Arguments

If/Then, Necessary vs. Sufficient, and Fallacies

90 Minutes
Objective Apply formal logical reasoning to evaluate ML claims and avoid common reasoning errors.

Concept Lesson

A headline from a major tech publication reads: "Companies that adopted AI grew 40% faster than those that didn't. Therefore, AI drives revenue growth." Your CTO forwards this to the team and suggests accelerating the AI roadmap. But does the logic hold? This is the kind of reasoning failure that costs companies millions in misallocated resources, and it happens every day because people confuse correlation with causation and skip the logical steps that would expose the error. If/then reasoning is the backbone of all critical technical thinking, and it deserves careful attention. Consider this conditional statement: "If the data is imbalanced, then accuracy is a misleading metric." This is a valid claim — a specific condition (imbalance) leads to a specific consequence (accuracy fails). However, the converse (the logical reversal of a conditional — "if A then B" becomes "if B then A") — "If accuracy is a misleading metric, then the data is imbalanced" — is NOT necessarily true, because accuracy can be misleading for other reasons: noisy labels, a poorly chosen evaluation split, or a task where the output categories are not well-defined. Confusing a statement with its converse is one of the most common reasoning errors in ML discussions. A team might see poor accuracy, assume imbalance is the cause, spend weeks resampling (adjusting the dataset to balance the classes — for example, duplicating fraud cases so the model sees equal numbers of fraud and legitimate transactions) their data, only to discover the real problem was mislabeled test examples.
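The asymmetry between a conditional and its converse can be checked mechanically. The sketch below enumerates every truth assignment for A ("data is imbalanced") and B ("accuracy is misleading") using material implication; the one assignment that splits them is exactly the noisy-labels scenario described above.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a then b' is false only when a is true and b is false."""
    return (not a) or b

# A = "the data is imbalanced", B = "accuracy is misleading".
for a, b in product([True, False], repeat=2):
    forward = implies(a, b)    # if imbalanced, then misleading
    converse = implies(b, a)   # if misleading, then imbalanced
    print(f"A={a!s:5} B={b!s:5}  forward={forward!s:5}  converse={converse}")

# The case A=False, B=True (accuracy misleading for another reason,
# e.g. noisy labels) keeps the forward claim true but makes the
# converse false — the two statements are not equivalent.
```

One counterexample row is all it takes: the team that assumed "misleading accuracy, therefore imbalance" was silently asserting the converse.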

The distinction between necessary and sufficient conditions is another one that ML practitioners get wrong constantly, often with expensive consequences. "More data is necessary for good model performance" and "more data is sufficient for good model performance" sound similar but make fundamentally different claims. The first says you cannot achieve good performance without adequate data — data is a prerequisite, a gate you must pass through. The second says that data alone guarantees good performance — if you have enough data, nothing else matters. In reality, data is usually necessary but almost never sufficient. Consider a Lagos-based logistics company that collected 2 million delivery records and expected their route optimization model to work flawlessly. The data was necessary — without it, no model could learn delivery patterns. But the data alone was not sufficient: the records had inconsistent address formats, many GPS coordinates were off by hundreds of meters, and the team used a linear model (one that assumes the relationship between inputs and output is a straight line — double the input, double the output) when the delivery patterns were clearly nonlinear (where the relationship curves — for example, doubling your training data might improve accuracy by only 5%, not 100%). They had abundant data and a poor result, proving that data alone does not guarantee success. You also need clean data, appropriate algorithms, good feature engineering (the process of choosing and transforming the input variables your model uses — for example, converting a raw date into "day of week" or combining two measurements into a ratio), proper evaluation methodology, and careful hyperparameter tuning (adjusting the settings you choose before training — like learning rate, number of trees, or regularization strength — to find what works best). Understanding this distinction prevents the dangerous oversimplification that "just get more data" or "just get a bigger model" solves every ML problem.
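The logistics company's failure mode can be reproduced in miniature. The sketch below (a toy illustration, not their actual system) fits a straight line by ordinary least squares to data whose true pattern is quadratic: multiplying the dataset size by 1,000 leaves the model's error essentially unchanged, because the bottleneck is the wrong model class, not the amount of data.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c (closed-form solution)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c

def mse(xs, ys, m, c):
    """Mean squared error of the fitted line on the data."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
for n in (100, 100_000):                       # "just get more data"
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [x ** 2 for x in xs]                  # true pattern is nonlinear
    m, c = fit_line(xs, ys)
    print(f"n={n:>6}  mse={mse(xs, ys, m, c):.3f}")
# The error plateaus around the same value at both sizes:
# the data was necessary, but no amount of it is sufficient
# when the model class cannot represent the pattern.
```

Swapping in a model that can represent curvature (here, simply adding an x² feature) fixes what a thousandfold increase in data could not.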

Common fallacies pervade ML discussions at every level, from Twitter threads to boardroom presentations, and learning to spot them is a practical survival skill. Correlation versus causation is the most famous and the most frequently violated: your model discovers that customers who browse the pricing page for more than 3 minutes are 3x more likely to purchase. The correlation is real. But does lingering on the pricing page cause purchases, or do customers who have already decided to buy naturally spend more time comparing options before they commit? Acting on the wrong causal interpretation — for example, redesigning the pricing page to force longer browsing — could actually reduce conversions. Survivorship bias shapes the entire ML landscape in ways you rarely notice. You read papers about successful models and architectures that achieved breakthrough results. You do not read the hundreds of failed experiments that preceded them, the architectures that were tried and abandoned, the hyperparameter settings that produced garbage. This gives you a distorted view of how likely success is and how much experimentation real progress requires. Appeal to complexity tempts teams to choose a deep neural network (a neural network with many layers stacked on top of each other — "deep" just means "has many layers") when a logistic regression (a simple, well-understood algorithm that predicts a probability by fitting a straight line to the data — it is fast, interpretable, and often surprisingly effective) would solve the problem at 90% of the performance for 1% of the engineering cost, because the complex model "must" be better if it is more sophisticated. Cherry-picking metrics lets any model look good: if your fraud detection model has terrible recall but excellent precision, and you only report precision, you are cherry-picking — and someone relying on your report will make a dangerously uninformed decision.
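The cherry-picking example is easy to make concrete. The counts below are hypothetical, chosen to show how a fraud model can post an impressive precision while catching almost none of the fraud — the single reported metric hides the failure.

```python
def precision_recall(tp, fp, fn):
    """Precision: of the cases we flagged, how many were really fraud.
    Recall: of the actual fraud cases, how many we caught."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical fraud model: it flags only 10 transactions, 9 correctly,
# but 91 fraud cases slip through unflagged.
tp, fp, fn = 9, 1, 91
p, r = precision_recall(tp, fp, fn)
print(f"precision={p:.2f}  recall={r:.2f}")
# -> precision=0.90  recall=0.09
```

Report only the 0.90 and the model sounds production-ready; report both numbers and it is obviously not. Whenever a single metric headlines a claim, ask which metric was left out.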

Building logical discipline means developing the habit of pausing before you accept or make a claim and asking three questions: what are the premises being assumed, what is the conclusion being drawn, and does the conclusion actually follow from those premises? This pause is uncomfortable because it slows you down when everyone around you is moving fast and celebrating a result. But it prevents errors that are expensive to fix later. In a field where conclusions drawn from data can affect hiring decisions, medical diagnoses, loan approvals, and criminal justice outcomes, the cost of sloppy reasoning is not just intellectual — it is ethical, financial, and deeply human. Every time you catch a logical error before it reaches a decision-maker, you are doing the most important work a quantitative and verbal reasoner can do.

Guided Exercises

Exercise 1: Identify the logical error in this statement: "Our model achieved 95% accuracy on the test set. Therefore, it will perform well in production." (a) What hidden assumption connects test performance to production performance? (b) List at least three specific things that could cause the model to fail in production despite high test accuracy. (c) Rewrite the statement to make a logically valid claim about the relationship between test and production performance.

Exercise 2: For each claim below, classify the relationship described as necessary, sufficient, or neither, and justify your answer with a counterexample where applicable: (a) "You need GPU hardware to train deep learning models." (b) "Using a larger model guarantees better performance on any task." (c) "You must clean your data before training a model on it." (d) "Having labeled training data is required for supervised learning." Which of these claims do you think is most commonly believed in the industry, and why might that belief be dangerous?

Exercise 3: A McKinsey-style blog post claims: "In a survey of 500 companies, those that adopted AI tools saw an average 40% increase in revenue over three years, compared to 12% for non-adopters. Therefore, AI adoption drives revenue growth." (a) Identify the specific logical fallacy at work. (b) List at least four alternative explanations for the observed correlation that do not involve AI causing revenue growth. (c) Describe how you would design a better study to test whether AI adoption actually causes revenue increases — what would you need to control for, and what kind of evidence would be convincing?

Discussion Prompt

Think of a specific claim you have recently heard or read about AI — from a news article, social media post, a colleague, or a workplace announcement. Apply the logical tools from this session to evaluate it: identify the premises, the conclusion, and whether the conclusion actually follows. Does the claim hold up under scrutiny? What would you need to see to be convinced? Share your analysis with the group.

Key Takeaway

Sloppy reasoning leads to wrong conclusions even when the data is correct. Logic is the quality control system for your thinking — without it, you cannot distinguish a sound argument from a convincing-sounding one, and the consequences of that confusion are real.

Quick Check

"If the data is imbalanced, then accuracy is misleading." Which is NOT necessarily true?

  • The original statement
  • Accuracy can be misleading with imbalanced data
  • If accuracy is misleading, then the data must be imbalanced

Data is necessary for good model performance but not sufficient. This means:

  • Good data guarantees a good model
  • You need data, but you also need clean features, the right algorithm, and proper evaluation
  • Data doesn't matter at all

"Companies that adopted AI grew 40% faster. Therefore, AI drives revenue growth." The logical error is:

  • Confusing correlation with causation — growing companies may simply be more likely to adopt AI
  • The sample size is too small
  • AI doesn't actually work

Key Terms


What is a converse error?

Assuming "if A then B" means "if B then A." If imbalance causes misleading accuracy, it doesn't mean misleading accuracy always means imbalance.

What is necessary vs. sufficient?

Necessary = you must have it (no data, no model). Sufficient = it alone guarantees success (data alone doesn't guarantee a good model). Data is necessary but not sufficient.

What is correlation ≠ causation?

Two things happening together doesn't mean one caused the other. Companies using AI may have grown faster because fast-growing companies were more likely to adopt AI in the first place, not because AI caused the growth.