Growth Experiments in 2026: A Framework That Prevents Randomness
Most growth work is busywork. A structured experimentation framework using ICE scoring: hypotheses, constraints, measurement, and decision loops that compound.
14 min · January 1, 2026 · Updated January 27, 2026
TL;DR
- Experiments need hypotheses and success metrics — without them, you can’t decide
- Use ICE scoring (Impact × Confidence × Ease) to prioritize objectively
- Run fewer experiments with higher quality — target 80% yielding statistically reliable learnings
- Log every decision so you don’t repeat mistakes — learning compounds
- The weekly decision loop: review results → decide (ship/iterate/kill) → record what you learned
- Establish baseline KPIs before launching any experiment
Why Most Growth Work Is Busywork
Random tactics without structure:
- Waste resources on low-impact tests
- Yield inconclusive results
- Get repeated because nobody remembers what was tried
- Don’t compound into understanding
The Experiment Paradox
| More Experiments | Better Experiments |
| --- | --- |
| Lots of activity | Focused learning |
| Many inconclusive | Mostly decisive |
| No pattern emerges | Mental model builds |
| Tactics over strategy | Strategy informs tactics |
Run fewer experiments with higher quality.
The Experiment Template
Every experiment needs these elements:
Required Components
| Component | Description | Example |
| --- | --- | --- |
| Hypothesis | If we change X, Y will improve because Z | "If we add social proof to the pricing page, conversion will increase because users need validation" |
| Metric | What specifically will change | Pricing page → checkout conversion |
| Segment | Who is being tested | New visitors from paid channels |
| Duration | How long to run | 2 weeks minimum |
| Success threshold | What counts as a win | +10% relative improvement |
| Sample size | Statistical requirements | 1,000 visitors per variant |
| Rollback plan | What if things go wrong | Revert to control immediately |
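If you track experiments in code rather than docs, the same components can live as a structured record. A minimal sketch in Python, assuming a simple in-memory representation; the field names mirror the table above and are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    """One growth experiment, mirroring the required components above."""
    hypothesis: str            # "If we change X, Y will improve because Z"
    metric: str                # the single primary metric
    segment: str               # who is exposed to the test
    duration_days: int         # pre-committed run time
    success_threshold: float   # e.g. 0.10 for a +10% relative lift
    sample_size_per_variant: int
    rollback_plan: str

social_proof = Experiment(
    hypothesis=("If we add social proof to the pricing page, conversion "
                "will increase because users need validation"),
    metric="pricing_to_checkout_conversion",
    segment="new visitors from paid channels",
    duration_days=14,
    success_threshold=0.10,
    sample_size_per_variant=1000,
    rollback_plan="revert to control immediately",
)
```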
Example Experiment Doc
## Experiment: Social Proof on Pricing Page

### Hypothesis
If we add customer logos and testimonials to the pricing page, checkout conversion will increase by 10%+ because enterprise visitors need social validation before purchasing.

### Design
- Control: Current pricing page
- Variant: Add logos above pricing table + 2 testimonials

### Success Metrics
- Primary: Pricing → Checkout conversion (+10%)
- Secondary: Time on page (monitor, no threshold)
- Guardrail: Bounce rate (must not increase >5%)

### Segment
New visitors from paid enterprise campaigns

### Duration
2 weeks (minimum 1,000 visitors per variant)

### Rollback Trigger
Conversion drops >15% after 3 days

### Owner
Growth Lead

### Start Date
2026-02-01
ICE Scoring for Prioritization
ICE is one of the most widely used frameworks for ranking growth experiments.
The Three Dimensions
| Dimension | Question | Scale |
| --- | --- | --- |
| Impact | How much will this move the key metric if successful? | 1-10 |
| Confidence | How sure are you it will work? (Data, research, precedent) | 1-10 |
| Ease | How simple is it to build and launch? | 1-10 |
Calculating ICE Score
Two common approaches:
- Average: (Impact + Confidence + Ease) / 3
- Multiply: Impact × Confidence × Ease
Multiplication gives more separation between ideas.
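A minimal sketch of both calculations; the idea list and scores are illustrative, not prescribed by the framework.

```python
ideas = [
    # (name, impact, confidence, ease) on a 1-10 scale
    ("Social proof on pricing", 7, 6, 9),
    ("New checkout flow",       9, 4, 3),
    ("Email sequence update",   5, 7, 8),
    ("Referral program",        8, 3, 4),
]

def ice_average(impact, confidence, ease):
    return (impact + confidence + ease) / 3

def ice_multiply(impact, confidence, ease):
    return impact * confidence * ease

# Rank by the multiplicative score, which spreads ideas out more.
ranked = sorted(ideas, key=lambda idea: ice_multiply(*idea[1:]), reverse=True)
for name, impact, confidence, ease in ranked:
    print(f"{name}: avg={ice_average(impact, confidence, ease):.1f}, "
          f"product={ice_multiply(impact, confidence, ease)}")
```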
ICE Scoring Example
| Experiment | Impact | Confidence | Ease | ICE Score |
| --- | --- | --- | --- | --- |
| Social proof on pricing | 7 | 6 | 9 | 378 |
| New checkout flow | 9 | 4 | 3 | 108 |
| Email sequence update | 5 | 7 | 8 | 280 |
| Referral program | 8 | 3 | 4 | 96 |
Rank by score and work top-down.
When to Adjust Weighting
| Context | Emphasize |
| --- | --- |
| New to testing | Ease (build confidence) |
| Executive scrutiny | Impact (need big wins) |
| Low traffic | Ease + statistical feasibility |
| Scaling phase | Impact (bigger bets) |
The Weekly Decision Loop
Structure prevents drift and enables compounding.
Weekly Cadence
| Day | Activity |
| --- | --- |
| Monday AM | Review last week’s experiment results |
| Monday PM | Decide: ship, iterate, or kill |
| Tuesday | Queue next experiments |
| Wed-Thu | Execute and monitor |
| Friday | Log learnings, prep for Monday |
Decision Framework
| Result | Decision | Action |
| --- | --- | --- |
| Clear win (> threshold) | Ship | Roll out to 100%, document learning |
| Marginal win (< threshold) | Iterate | Improve and re-test, or combine with other wins |
| No effect | Kill | Stop, document why it didn’t work |
| Negative | Kill immediately | Revert, document learning |
| Inconclusive | Extend or kill | Either needs more time or the sample was too small |
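A minimal sketch of that decision rule, assuming you already have the measured relative lift, a p-value from your testing tool, and the pre-committed threshold. It deliberately simplifies the table above; the function name and cutoffs are illustrative.

```python
def decide(lift, p_value, threshold, alpha=0.05, min_sample_reached=True):
    """Map an experiment result to ship / iterate / kill / extend."""
    if not min_sample_reached:
        return "extend or kill"      # inconclusive: sample too small
    if p_value >= alpha:
        return "kill"                # no detectable effect
    if lift < 0:
        return "kill immediately"    # negative: revert and document
    if lift >= threshold:
        return "ship"                # clear win at or above threshold
    return "iterate"                 # significant but marginal win

print(decide(lift=0.14, p_value=0.03, threshold=0.10))  # -> ship
```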
The Decision Log
Every experiment gets a closing entry:
## Experiment: Social Proof on Pricing Page
Status: SHIPPED

### Result
+14% conversion (significant at p<0.05)

### Learning
Enterprise visitors respond strongly to peer validation. Logo recognition matters more than testimonial length.

### Follow-up
Test adding case study links for even higher lift.

### Closed
2026-02-15 by Growth Lead
Before You Experiment: Baseline KPIs
You can’t measure improvement without knowing your starting point.
Essential Baselines
| KPI | Why | How Often |
| --- | --- | --- |
| Funnel conversion rates | Know each step’s current performance | Weekly |
| Activation rate | New user success baseline | Weekly |
| Retention curves | Cohort performance | Monthly |
| CAC by channel | Acquisition efficiency | Monthly |
| Revenue per visitor | Overall efficiency | Weekly |
Baseline Hygiene
| Practice | Why |
| --- | --- |
| Segment by source | Channels behave differently |
| Track trends, not just snapshots | Seasonality matters |
| Document methodology | Reproducible measurement |
| Flag anomalies | Know when something’s off |
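As one way to keep a segmented, trend-based baseline, here is a minimal sketch using pandas. The file name and columns (`timestamp`, `source`, `visitor_id`, `step`) are hypothetical; adapt them to whatever your analytics export actually contains.

```python
import pandas as pd

# Hypothetical export: one row per funnel event.
events = pd.read_csv("funnel_events.csv", parse_dates=["timestamp"])
events["week"] = events["timestamp"].dt.to_period("W")

def reached(step):
    """Unique visitors per week and source who hit a given funnel step."""
    hit = events[events["step"] == step]
    return hit.groupby(["week", "source"])["visitor_id"].nunique()

# Weekly pricing -> checkout conversion, segmented by acquisition source.
baseline = (reached("checkout") / reached("pricing")).rename("pricing_to_checkout")
print(baseline.unstack("source"))
```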
A/B Testing Best Practices
Statistical Requirements
| Element | Guideline |
| --- | --- |
| Sample size | Calculate before starting (power analysis) |
| Duration | Minimum 1-2 weeks, capture weekly cycles |
| Significance | p < 0.05 for most decisions |
| One primary metric | Multiple metrics = multiple comparison problem |
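A minimal sketch of that up-front power analysis, using the standard normal approximation for comparing two proportions (stdlib only); the baseline rate and target lift in the example are illustrative.

```python
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift in a conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int(round((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2))

# e.g. 5% baseline conversion, looking for a +10% relative improvement
print(sample_size_per_variant(0.05, 0.10))
```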
Common Mistakes
| Mistake | Problem | Fix |
| --- | --- | --- |
| Stopping early | False positives | Pre-commit to duration |
| Too many variants | Diluted sample | Max 2-3 variants |
| Changing mid-test | Invalidates results | Lock test after start |
| No guardrail metrics | Miss negative effects | Always monitor key metrics |
When A/B Testing Doesn’t Work
| Situation | Alternative |
| --- | --- |
| Low traffic | Sequential testing |
| Complex changes | Before/after with caution |
| UX redesigns | Qualitative + quantitative |
| Pricing | Survey + cohort analysis |
Prioritization Frameworks Beyond ICE
PIE Framework
| Factor | Question |
| --- | --- |
| Potential | How much improvement is possible? |
| Importance | How valuable is improving this page? |
| Ease | How difficult to implement? |
RICE Framework
| Factor | Description | Unit |
| --- | --- | --- |
| Reach | How many users affected | Number |
| Impact | Effect on metric | Scale |
| Confidence | Certainty of success | % |
| Effort | Resources required | Person-weeks |
Score = (Reach × Impact × Confidence) / Effort
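A minimal sketch of that calculation; the numbers are illustrative, with confidence expressed as a fraction and effort in person-weeks.

```python
def rice(reach, impact, confidence, effort):
    """RICE score: (Reach x Impact x Confidence) / Effort."""
    return (reach * impact * confidence) / effort

# e.g. 4,000 users reached, impact 2, 80% confidence, 3 person-weeks of effort
print(rice(reach=4000, impact=2, confidence=0.80, effort=3))  # ~2133
```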
When to Use Which
| Framework | Best For |
| --- | --- |
| ICE | Quick prioritization, early stage |
| PIE | Page-level optimization |
| RICE | Feature-level decisions |
Building an Experiment Backlog
Idea Sources
| Source | How to Capture |
| --- | --- |
| User interviews | Pain points → experiment ideas |
| Support tickets | Common issues → fixes |
| Analytics | Drop-off points → optimizations |
| Competitor analysis | What they do → what to test |
| Team brainstorms | Weekly idea collection |
Backlog Structure
## Growth Experiment Backlog

### Scored (Ready to Run)
1. Social proof on pricing (ICE: 378)
2. Email sequence update (ICE: 280)
3. Homepage hero test (ICE: 245)

### Needs Scoring
- Exit-intent popup
- Chatbot on docs
- Annual pricing nudge

### Parked
- Mobile app push notifications (needs mobile first)
- Enterprise landing page (needs content)
Backlog Hygiene
| Frequency | Action |
| --- | --- |
| Weekly | Add new ideas |
| Bi-weekly | Score unsorted ideas |
| Monthly | Prune stale ideas |
| Quarterly | Theme review |
Experiment Velocity
Target Metrics
| Metric | Target | Why |
| --- | --- | --- |
| Experiments/month | 4-8 | Sustained learning |
| Conclusive rate | 80%+ | Quality over quantity |
| Win rate | 30-40% | Some wins expected |
| Time to decision | 2-4 weeks | Avoid dragging |
Velocity vs. Quality
| Low Velocity | Right Balance | Too Fast |
| --- | --- | --- |
| 1 test/month | 1-2 tests/week | 1 test/day |
| No learning | Steady learning | Low quality |
| Missed opportunities | Compounding gains | Inconclusive results |
Documenting Learnings
Why Documentation Matters
| Without Docs | With Docs |
| --- | --- |
| Repeat same tests | Build on history |
| New hires start over | Onboard with context |
| Random tactics | Pattern recognition |
| No institutional memory | Compounding knowledge |
Learning Repository Structure
/experiments
  /2026-Q1
    /social-proof-pricing.md
    /email-sequence-v2.md
    /checkout-simplification.md
  /2026-Q2
    /...
  /meta
    /what-works.md (patterns that win)
    /what-fails.md (patterns that lose)
    /framework.md (how we experiment)
Pattern Recognition
After 20+ experiments, look for:
- What consistently works in your product?
- What never moves the needle?
- Which segments respond to which tactics?
- What do changes in your baselines indicate?
Implementation Checklist
Setup:
- Define baseline KPIs
- Choose testing tool
- Create experiment template
- Set up decision log
- Establish weekly cadence

For each experiment:
- Write hypothesis
- Define success threshold
- Calculate sample size
- Set duration
- Identify guardrail metrics
- Document rollback plan

Weekly:
- Review completed experiments
- Decide: ship/iterate/kill
- Document learnings
- Score new ideas
- Queue next experiments

Monthly:
- Review experiment velocity
- Analyze win rate
- Identify patterns
- Update baselines
FAQ
What’s the biggest experiment failure?
No clear success threshold. If you don’t define “win” before starting, you’ll rationalize any result. Commit to the threshold upfront.