Week 06 · Days 26–30
Growth Frameworks & Experimentation
Learn how growth PMs design experiments and think about A/B testing.
Portfolio deliverable · An experiment design doc (hypothesis + A/B test plan)
Growth loops vs funnels
Lesson: Growth loops
A funnel is linear: input → output, end of story. A growth loop is circular: the output of one cycle becomes the input for the next, so gains compound over time. Common loop types: referral loops (existing users invite new users, e.g. Dropbox's extra storage for referrals), content loops (user-generated content gets indexed or shared, drawing in new users, e.g. Pinterest, TikTok), and paid loops (revenue from users funds ads that acquire more users; viable only when LTV, a user's lifetime value, exceeds CAC, the cost to acquire them, with margin to spare). The key questions for a growth PM: where does this loop leak, and what's the smallest change that raises the conversion rate of one step in the loop?
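To see why one step's conversion rate matters so much, here is a minimal sketch of a referral loop under simplifying assumptions: every new user sends the same number of invites, a fixed share of invitees convert, and those converts become the next cycle's new users. The function name and all numbers are hypothetical. The product of invites and conversion is the loop gain, often called the viral coefficient or k-factor.

```python
def simulate_referral_loop(seed_users: int, invites_per_user: float,
                           invite_conversion: float, cycles: int) -> list[int]:
    """Return the cumulative user count after each cycle of a referral loop."""
    # Loop gain: how many new users each new user brings in next cycle
    # (often called the viral coefficient or k-factor).
    k = invites_per_user * invite_conversion
    new_users = float(seed_users)
    total = float(seed_users)
    history = []
    for _ in range(cycles):
        new_users *= k  # this cycle's output becomes the next cycle's input
        total += new_users
        history.append(round(total))
    return history


if __name__ == "__main__":
    # Made-up numbers: compare the loop before and after improving one step.
    print(simulate_referral_loop(1000, 2, 0.25, cycles=5))  # k = 0.5
    print(simulate_referral_loop(1000, 2, 0.30, cycles=5))  # k = 0.6
```

With k below 1 the loop amplifies the seed but eventually stalls: total users approach seed / (1 − k), so 2,000 at k = 0.5. Raising invite conversion from 25% to 30% moves that ceiling to 2,500 without touching anything else, which is the "smallest change to one step" idea in action. Growth keeps compounding indefinitely only when k reaches 1 or more.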
Task: Diagram your practice app's growth loop
Draw (in Figma or on paper) the growth loop you believe drives your practice app's user acquisition.
Forming hypotheses
Lesson: Hypothesis-driven thinking
The hypothesis format 'If we [change], then [metric] will [increase/decrease], because [underlying reason], measured by [specific metric/method]' forces three things: a testable change (not a vague one), a causal mechanism (so you learn something even if you're wrong), and a way to measure success before you build. Weak hypothesis: 'Adding social proof will help.' Strong: 'If we show "12,000 people completed this today" on the signup screen, then signup completion rate will increase by at least 5% relative to control, because social proof reduces hesitation for new users, measured by completion rate in the signup funnel over 2 weeks.'
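One way to make the format's discipline concrete is to treat a hypothesis as a record with required fields, so a "weak" hypothesis with a missing slot simply can't be created. This is a hypothetical teaching sketch (the `Hypothesis` class and its field names are illustrative, not a tool the course requires).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A growth hypothesis in the 'If we ..., then ... because ..., measured by ...' format."""
    change: str       # the specific, testable change
    metric: str       # the primary metric you expect to move
    direction: str    # "increase" or "decrease"
    reason: str       # the causal mechanism you are betting on
    measurement: str  # how, and over what window, success is measured
    effect: str = ""  # optional expected size, e.g. "by at least 5%"

    def __post_init__(self) -> None:
        # A hypothesis with an empty slot is the 'weak' kind: reject it.
        for name in ("change", "metric", "direction", "reason", "measurement"):
            if not getattr(self, name).strip():
                raise ValueError(f"hypothesis is missing '{name}'")
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")

    def render(self) -> str:
        size = f" {self.effect}" if self.effect else ""
        return (f"If we {self.change}, then {self.metric} will {self.direction}{size}, "
                f"because {self.reason}, measured by {self.measurement}.")


if __name__ == "__main__":
    h = Hypothesis("show a daily completion count on the signup screen",
                   "signup completion rate", "increase",
                   "social proof reduces hesitation for new users",
                   "completion rate in the signup funnel over 2 weeks",
                   effect="by at least 5%")
    print(h.render())
```

Writing your three Day 27 hypotheses this way makes it obvious when one is missing its causal reason or its measurement plan.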
Task: Write 3 growth hypotheses
Using your growth loop from Day 26, write 3 hypotheses for changes that could strengthen the weakest part of the loop.
A/B testing fundamentals
Lesson: A/B testing basics
In an A/B test, users are randomly split between control (the current experience) and variant (your change). Statistical significance (commonly p < 0.05) tells you whether a difference is likely real rather than random noise: p < 0.05 means a difference at least this large would show up less than 5% of the time if the change had no real effect. But you need a large enough sample to detect the effect size you expect, so estimate it before launch; small effects on low-traffic features may take weeks to reach significance. Guardrail metrics are things that shouldn't get worse even if your primary metric improves (e.g. a change that boosts signups but tanks retention is a bad trade). The two most common mistakes: peeking at results early and stopping as soon as it 'looks significant' (this inflates false positives), and running too many variants at once, which splits your sample across more groups, slows everything down, and raises the chance that some variant looks like a winner by luck alone.
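For the "rough sample size" in the Day 28 task, a common starting point is the normal-approximation formula for comparing two conversion rates with a two-sided test. The sketch below assumes a 5% significance level and 80% power (conventional defaults, not requirements), and the function name and example rates are made up.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH group to detect a relative lift in a conversion rate.

    Normal-approximation formula for a two-sided test of two proportions.
    """
    p1 = baseline                        # control conversion rate
    p2 = baseline * (1 + relative_lift)  # variant rate if the hypothesis holds
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold (two-sided)
    z_beta = NormalDist().inv_cdf(power)            # chance of detecting a real effect
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Made-up example: 40% signup completion, hoping for a 5% relative lift.
    n = sample_size_per_group(baseline=0.40, relative_lift=0.05)
    print(n, "users per group")
```

With those inputs this comes to roughly 9,500 users per group; at 1,000 eligible signups a day split evenly, that is about 19 days. Because the required sample scales with 1 / (p2 − p1)², halving the expected lift roughly quadruples it, which is why small effects on low-traffic features take weeks.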
Task: Design an A/B test
Pick your top hypothesis from Day 27. Design the A/B test: control vs variant, primary metric, guardrail metrics, and rough sample size needed.
Reading experiment results
Lesson: Interpreting results & making ship decisions
Reading a results readout is rarely a clean 'it worked' or 'it didn't.' Common scenarios: (1) Primary metric significantly positive, guardrails flat → ship it. (2) Primary metric positive but a guardrail metric (e.g. retention, revenue per user) negative → this is the hard case: weigh the size of each effect and decide whether the guardrail regression is acceptable or a dealbreaker. (3) Not statistically significant → don't conclude 'it doesn't work'; conclude 'we don't have enough evidence yet.' If the test was underpowered, you may need a new run with more time or traffic, planned up front rather than extended because the result looked close; if it reached its planned sample size, the true effect is probably smaller than the one you designed it to detect and may be too small to matter. Document your reasoning, not just the decision; that's what builds trust with stakeholders over time.
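For the Day 29 mock readout, the sketch below computes a two-sided p-value with a pooled two-proportion z-test (one standard choice for conversion metrics) and maps the result onto the scenarios above. Both function names are hypothetical, and `guardrail_regressed` is a stand-in for running the same significance check on each guardrail metric; a real ship call in scenario 2 still needs human judgment.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for control (a) vs variant (b) conversion rates, pooled z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ship_decision(p_value: float, lift: float, guardrail_regressed: bool,
                  alpha: float = 0.05) -> str:
    """Toy mapping of a readout onto the lesson's scenarios."""
    if p_value >= alpha:
        return "inconclusive: not enough evidence yet"                # scenario 3
    if lift <= 0:
        return "no-ship: primary metric got worse"
    if guardrail_regressed:
        return "judgment call: weigh primary gain vs guardrail loss"  # scenario 2
    return "ship"                                                     # scenario 1


if __name__ == "__main__":
    # Made-up readout: 4,000/10,000 control vs 4,200/10,000 variant signups.
    p = two_proportion_p_value(4000, 10_000, 4200, 10_000)
    print(f"p = {p:.4f}", ship_decision(p, lift=0.42 - 0.40, guardrail_regressed=False))
```

For these made-up numbers p comes out around 0.004, so with flat guardrails the toy rule says ship. Your readout should still spell out the reasoning in prose, since the function only encodes the easy cases.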
Task: Write a mock results readout
Write a fictional results readout for your A/B test from Day 28 (made-up numbers), including a ship/no-ship recommendation with reasoning.
Synthesize: Experiment design doc
Deliverable: Experiment Design Doc
Combine hypothesis, growth loop diagram, A/B test design, and mock readout into a single experiment design doc.
Advanced Challenge: Design a prompt/model experiment
Classic A/B testing assumes each variant is a fixed, deterministic change: every user in the variant gets the same experience. For AI features, you're often testing prompts, model versions, or retrieval strategies, where outputs vary even within a single variant. Design an experiment for an AI feature (e.g. two different system prompts for a conversational assistant). Define three things: the fixed evaluation set you'd use to compare quality offline before any user sees the change; what you'd A/B test live (user-facing metrics, once the candidate has cleared an offline quality bar); and how guardrails differ (e.g. 'refusal rate doesn't increase' or 'response length stays within X tokens'). This two-stage approach (offline eval → online A/B) is a common way mature AI teams ship model and prompt changes safely.
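The offline stage can be sketched as a gate that a candidate prompt must pass before it earns a live A/B test. This minimal sketch assumes each eval case has already been run several times per variant (because outputs are non-deterministic) and graded elsewhere, by human raters, a rubric, or a model-based judge; `EvalResult`, `summarize`, and `clears_offline_bar` are hypothetical names, and using the maximum length as the length guardrail is one choice among several (a percentile also works).

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool        # did the grader judge this response acceptable?
    refused: bool       # did the model decline to answer?
    length_tokens: int  # response length in tokens


def summarize(results: list[EvalResult]) -> dict[str, float]:
    """Aggregate every sampled response for one variant on the fixed eval set."""
    n = len(results)
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "refusal_rate": sum(r.refused for r in results) / n,
        "max_length": max(r.length_tokens for r in results),
    }


def clears_offline_bar(baseline: list[EvalResult], candidate: list[EvalResult],
                       max_tokens: int, min_gain: float = 0.0) -> bool:
    """Gate for stage two: only candidates that pass here go to a live A/B test."""
    b, c = summarize(baseline), summarize(candidate)
    quality_ok = c["pass_rate"] >= b["pass_rate"] + min_gain  # quality bar
    refusal_ok = c["refusal_rate"] <= b["refusal_rate"]       # guardrail: no more refusals
    length_ok = c["max_length"] <= max_tokens                 # guardrail: length cap
    return quality_ok and refusal_ok and length_ok
```

In use, you would run both system prompts over the same eval set, pass the graded results in, and only schedule the online test (measured on user-facing metrics) if the gate returns True.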