A/B test significance calculator
Two-proportion z-test for LinkedIn outreach A/B tests. Enter the number of sends and conversions (accepts, replies, meetings booked - whatever you're measuring) for two variants. The calculator shows the p-value, the confidence interval, and whether the difference is statistically significant.
Variant A (control / champion)
Variant B (challenger)
How to read the result
Significance threshold: conventionally, p < 0.05 (the 95% confidence level) means a difference this large would show up less than 5% of the time if the two variants truly performed the same, so it is unlikely to be random noise. p < 0.10 (90% confidence) is acceptable for fast-iteration testing. Anything above p = 0.10 means "we don't know yet, run more sends".
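The lift number: shows the relative improvement of variant B over A, calculated as the difference between the two rates divided by A's rate. A 25% relative lift on a 25% baseline is 6.25 percentage points absolute (25% to 31.25%). Lift is the right metric to track because absolute differences mean less in isolation: a 2-point gain is a big change on a 5% baseline and a small one on a 50% baseline.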
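The lift arithmetic above can be sketched in a few lines. This is an illustration only; the function name `lift` is hypothetical and is not taken from the calculator itself.

```python
def lift(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Return (relative lift of B over A, absolute difference in percentage points).

    Rates are fractions, e.g. 0.25 for a 25% acceptance rate.
    """
    if rate_a <= 0:
        raise ValueError("baseline rate must be positive to compute relative lift")
    absolute_points = (rate_b - rate_a) * 100   # percentage points
    relative = (rate_b - rate_a) / rate_a       # fraction of the baseline
    return relative, absolute_points


relative, points = lift(0.25, 0.3125)
print(f"{relative:.0%} relative lift, {points:.2f} points absolute")
# prints: 25% relative lift, 6.25 points absolute
```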
Sample size sanity check (a rough guide; the exact numbers depend on your baseline rate):
- Under 100 sends per variant - probably noise, run more.
- 200-400 per variant - detects 5+ point absolute differences.
- 800-2,000 per variant - detects 1-3 point absolute differences.
- Over 5,000 per variant - you're past the point of diminishing returns; you have your answer.
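For a number tailored to your own baseline, a standard power calculation gives the sends needed per variant. The sketch below assumes a two-sided test at alpha = 0.05 and 80% power. Those defaults, and the function name `sends_per_variant`, are assumptions made for illustration, not settings taken from this calculator.

```python
import math
from statistics import NormalDist


def sends_per_variant(baseline, absolute_diff, alpha=0.05, power=0.80):
    """Approximate sends per variant needed to detect `absolute_diff`.

    baseline:      control conversion rate as a fraction (0.10 = 10%)
    absolute_diff: smallest absolute difference worth detecting (0.05 = 5 points)
    """
    p1 = baseline
    p2 = baseline + absolute_diff
    if absolute_diff == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must be strictly between 0 and 1 and must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / absolute_diff ** 2
    return math.ceil(n)


# Same 5-point difference, different baselines
for base in (0.05, 0.15, 0.25):
    print(f"baseline {base:.0%}: {sends_per_variant(base, 0.05)} sends per variant")
```

The same absolute difference needs more sends as the baseline moves toward 50%, because the variance of a proportion is largest there.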
FAQ
What is a two-proportion z-test? The standard frequentist test for comparing the rate of a binary outcome (converted vs not) between two groups. Used widely in marketing A/B testing and clinical trials. We're using the pooled-variance version, which is the most common and slightly conservative.
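A minimal sketch of the pooled-variance test, using only the Python standard library. The function name is hypothetical. Building the confidence interval from the unpooled standard error is a common convention, but it is an assumption here; the calculator's own interval may be computed differently.

```python
import math


def two_proportion_z_test(sends_a, conv_a, sends_b, conv_b, z_crit=1.96):
    """Return (z, two-sided p-value, confidence interval for rate B minus rate A).

    z_crit=1.96 corresponds to a 95% confidence interval.
    """
    rate_a = conv_a / sends_a
    rate_b = conv_b / sends_b
    diff = rate_b - rate_a

    # Pooled standard error: assumes both variants share one true rate (the null)
    pooled = (conv_a + conv_b) / (sends_a + sends_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    if se_pooled == 0:
        # No conversions at all (or all converted): nothing to distinguish
        return 0.0, 1.0, (diff, diff)

    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

    # Unpooled standard error for the interval (assumed convention)
    se_unpooled = math.sqrt(
        rate_a * (1 - rate_a) / sends_a + rate_b * (1 - rate_b) / sends_b
    )
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


z, p, ci = two_proportion_z_test(sends_a=500, conv_a=100, sends_b=500, conv_b=130)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:+.3f}, {ci[1]:+.3f})")
```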
Can I use this for revenue or dollar-value tests? No - this calculator is only for binary outcome rate comparisons (acceptance rate, reply rate, meeting-booked rate). For dollar-value tests (deals closed × ACV), use a t-test on the mean instead.
Why does the result oscillate as I add data? This is normal for A/B tests: small samples have high variance. Don't make decisions until you've crossed both your sample size threshold AND your significance threshold. Stopping early the moment you "see significance" is called peeking, and it inflates the false-positive rate.
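The cost of peeking can be seen in a small simulation. The sketch below runs A/A tests, where both variants share the same true rate, so every "significant" result is a false positive. It compares checking after every 100 sends against checking once at the end. All parameters (500 runs, 1,000 sends per variant, a 20% rate) are arbitrary assumptions chosen for illustration.

```python
import math
import random


def p_value(sends_a, conv_a, sends_b, conv_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (sends_a + sends_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    if se == 0:
        return 1.0
    z = (conv_b / sends_b - conv_a / sends_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(runs=500, total_sends=1000, check_every=100,
             rate=0.20, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the end only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        for sent in range(1, total_sends + 1):
            # Both variants convert at the same true rate: no real difference
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if sent % check_every == 0 and p_value(sent, conv_a, sent, conv_b) < alpha:
                flagged_at_any_look = True
        peeking_hits += flagged_at_any_look
        final_hits += p_value(total_sends, conv_a, total_sends, conv_b) < alpha
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives checking once at the end: {final_rate:.1%}")
```

Checking once at the end should land near the nominal 5%, while stopping at the first significant look flags noticeably more tests even though no real difference exists.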
Running A/B tests inside SocialScalr? Sequences support 4 variants per step with built-in significance tracking.