
Designing Statistically‑Sound Experiments in Wisepops


TL;DR:
• Know your baseline metric (current conversion rate, CTR, AOV…)
• Pick the smallest uplift that would be worth the effort
• Each extra variant increases the total sample size – be ruthless about ideas that don’t move the needle
• Your eligible traffic & campaign targeting determine how long it takes
• Aim for ≤ 8 weeks; longer tests drift because of seasonality, promos, code changes, etc.

1. Why sample size matters

Running an A/B test with too little traffic is like flipping a coin only once—whatever result you see is pure chance.
Wisepops defaults to 95 % statistical significance (α = 0.05) and 80 % power (β = 0.20). Those guardrails protect you against false winners and missed winners, but they also dictate how many visitors you must expose to each version.
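You can see why α = 0.05 is a guardrail by simulating A/A tests: both groups share the exact same conversion rate, so any "winner" is a false positive. The sketch below (illustrative parameters, not Wisepops code) runs a standard two-sided z-test on each simulated A/A test and counts how often it falsely declares a difference:

```python
import random
from statistics import NormalDist

random.seed(42)
N = 1_000          # visitors per group (hypothetical)
TRUE_CVR = 0.05    # identical in both groups: any "winner" is a false positive
SIMS = 2_000       # number of simulated A/A tests

false_positives = 0
for _ in range(SIMS):
    # Draw conversions for two groups with the SAME true rate
    x1 = sum(random.random() < TRUE_CVR for _ in range(N))
    x2 = sum(random.random() < TRUE_CVR for _ in range(N))
    # Pooled two-proportion z-test, two-sided
    p_pool = (x1 + x2) / (2 * N)
    se = (p_pool * (1 - p_pool) * 2 / N) ** 0.5
    z = (x1 - x2) / N / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    false_positives += p_value < 0.05

print(false_positives / SIMS)  # ≈ 0.05: about 5 % of A/A tests look like winners
```

In other words, even with zero real difference, roughly 1 in 20 tests will look significant at the 95 % level; that is the false-winner rate the significance threshold caps.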


2. Inputs you need before you build a test

| Input | Where you find it | Typical range |
| --- | --- | --- |
| Baseline metric (p₁) | Wisepops reports | 0.5 % – 10 % CTR or CVR |
| Smallest uplift | Business judgement – what’s a material lift? | 5 % – 30 % |
| # of groups | k = control + variants | 2 – 6 |
| Eligible traffic / day | Site analytics × campaign targeting | 500 – 50 000+ |
| Maximum test length | — | 4 – 8 weeks recommended |


3. How many visitors do you need?

Worked example:
With a 2 % baseline CVR, detecting a 10 % relative lift (i.e. 2.0 % → 2.2 %) needs ≈ 80 000 visitors per variant. Lower baselines and smaller uplifts inflate the requirement fast.

Quick‑reference table (95 % / 80 %)

| Baseline CVR | +5 % uplift | +10 % uplift | +20 % uplift |
| --- | --- | --- | --- |
| 1 % | 540 k / variant | 140 k | 36 k |
| 2 % | 270 k | 80 k | 21 k |
| 5 % | 105 k | 32 k | 8 k |

Rule of thumb: If the test would need > 250 k visitors per variant, try either (a) aiming for a bolder change, (b) simplifying to a single strong variant, or (c) segmenting traffic so you test on the audience that really matters.
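Figures like those in the table come from the standard two-proportion sample-size formula. Here is a minimal Python sketch of it (the textbook normal approximation; Wisepops' own calculator may round or adjust slightly):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_uplift,
                            alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test.

    Standard normal-approximation formula, not Wisepops' exact internals.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for 95 % significance
    z_power = NormalDist().inv_cdf(power)          # ≈ 0.84 for 80 % power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variant(0.02, 0.10))  # ≈ 80 700, matching the "80 k" cell
```

Note how the detectable difference (p₂ − p₁) is squared in the denominator: halving the uplift you want to detect roughly quadruples the required sample.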


4. The cost of extra variants

Every new idea you add is another mouth to feed with traffic.
With a 10 % smallest detectable uplift at a 2 % baseline, moving from a classic A/B test (2 groups) to A/B/C/D/E/F (6 groups) triples the total sample size.

Best practice: Prioritise ruthlessly—run sequential waves of tests instead of a mega‑test with 5 mediocre ideas.
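The 3× figure is simple arithmetic: the per-variant sample stays the same, so total traffic scales with the number of groups. A sketch using the ≈ 80 k per-variant figure from the table above:

```python
# ≈ 80 k visitors per variant at a 2 % baseline and +10 % uplift (see table)
n_per_variant = 80_000

total_ab = 2 * n_per_variant      # A/B: control + 1 variant
total_abcdef = 6 * n_per_variant  # A/B/C/D/E/F: control + 5 variants

print(total_ab, total_abcdef, total_abcdef // total_ab)  # 160000 480000 3
```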


5. Turning sample size into calendar time

  1. Daily eligible traffic = total daily visitors × targeting fraction (e.g. exit‑intent popup shown to 30 % of sessions).

  2. Duration (days) = total sample size ÷ daily eligible traffic.

Example:
40 000 total visitors / day × 30 % targeting = 12 000 eligible / day.
Need 320 k visitors total → ≈ 27 days runtime.
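The two steps above reduce to one line of arithmetic; here is the worked example as a sketch (all figures are the hypothetical ones from the example):

```python
from math import ceil

daily_visitors = 40_000
targeting_fraction = 0.30     # e.g. exit-intent popup shown to 30 % of sessions
total_sample_needed = 320_000

daily_eligible = daily_visitors * targeting_fraction        # 12 000 eligible / day
duration_days = ceil(total_sample_needed / daily_eligible)

print(duration_days)  # 27
```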

Keep it < 8 weeks. Beyond that you’ll hit promo periods, product launches, cookie expiration, or just plain audience fatigue.
