How many sends do I need for an A/B test to be significant?

Rule of thumb: at least 200-400 sends per variant for typical LinkedIn outreach baselines (28-32% acceptance, 10-18% reply). Large differences (5+ percentage points) can reach significance in that range, most readily at the lower reply-rate baselines; small differences (1-2 points) need at least 800-2,000 sends per variant, often several times that. Use the calculator on this page with your actual numbers: it gives you the exact p-value.
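The rule of thumb can be checked against the standard sample-size formula for a two-sided two-proportion z-test. This is a minimal sketch, not the calculator's own code: the function name `sends_per_variant` is hypothetical, and the 80% power default is an assumption (the page does not state a power target). Power is the chance of detecting a real difference of the given size, which is a stricter requirement than an observed p-value dipping below 0.05 once.

```python
import math
from statistics import NormalDist


def sends_per_variant(baseline: float, target: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Sends needed in EACH variant to detect a true move from `baseline`
    to `target` with a two-sided two-proportion z-test (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (baseline + target) / 2
    # Null-hypothesis spread uses the average rate; the alternative uses both rates.
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(baseline * (1 - baseline)
                                      + target * (1 - target)))
    return math.ceil(numerator ** 2 / (target - baseline) ** 2)


# Illustrative acceptance-rate and reply-rate scenarios.
for base, tgt in [(0.30, 0.35), (0.12, 0.17), (0.12, 0.14)]:
    print(f"{base:.0%} -> {tgt:.0%}: {sends_per_variant(base, tgt)} sends per variant")
```

With these defaults, a 30% to 35% acceptance test works out to roughly 1,400 sends per variant and a 12% to 17% reply test to just under 800, so the baseline matters as much as the size of the gap.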

What significance level should I use for LinkedIn A/B tests?

95% confidence (p<0.05) is the standard for clean decisions. 90% confidence (p<0.10) is acceptable for fast-iteration testing where the cost of being wrong is low (you just keep testing). Don't promote a challenger to champion below 90% confidence: below that, random variance will hand you false winners too often.


A/B test significance calculator

Two-proportion z-test for LinkedIn outreach A/B tests. Enter the number of sends and conversions (accepts, replies, meetings booked, or whatever you're measuring) for each of two variants. The calculator returns the p-value, the confidence interval, and whether the difference is statistically significant.

Variant A (control / champion): sends and conversions (accepts, replies, etc.)

Variant B (challenger): sends and conversions
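The test the calculator runs can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the page's implementation: the function name is hypothetical, the p-value uses the pooled-variance z-test the FAQ describes, and the confidence interval uses the unpooled (Wald) standard error, which is one common choice the page does not specify.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(sends_a: int, conv_a: int, sends_b: int, conv_b: int,
                          confidence: float = 0.95) -> dict:
    """Two-sided pooled z-test for B vs A, plus a Wald interval on B - A."""
    rate_a = conv_a / sends_a
    rate_b = conv_b / sends_b
    diff = rate_b - rate_a

    # Under the null hypothesis both variants share one rate, so pool the data.
    pooled = (conv_a + conv_b) / (sends_a + sends_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    if se_pooled == 0:
        raise ValueError("0% or 100% conversion in both variants: test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The interval uses each variant's own rate (unpooled standard error).
    se_diff = math.sqrt(rate_a * (1 - rate_a) / sends_a
                        + rate_b * (1 - rate_b) / sends_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {"rate_a": rate_a, "rate_b": rate_b, "diff": diff, "z": z,
            "p_value": p_value, "ci": ci, "significant": p_value < 1 - confidence}


# Hypothetical test: 400 sends each, 120 vs 150 accepts (30% vs 37.5%).
result = two_proportion_z_test(400, 120, 400, 150)
print(f"p = {result['p_value']:.3f}, 95% CI for B - A: "
      f"{result['ci'][0]:+.1%} to {result['ci'][1]:+.1%}")
```

An interval that excludes zero agrees with a significant p-value in most cases, but because the two use different standard errors they can occasionally disagree right at the threshold.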

How to read the result

Significance threshold: conventionally, p < 0.05 (95% confidence) means the difference is unlikely to be random: if A and B truly converted at the same rate, a gap this large would show up less than 5% of the time. p < 0.10 (90% confidence) is acceptable for fast-iteration testing. Anything above p = 0.10 means "we don't know yet, run more sends".
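The thresholds above amount to a three-way reading rule. A minimal sketch; the function name `verdict` and the wording of its labels are illustrative, not part of the calculator.

```python
def verdict(p_value: float) -> str:
    """Translate a p-value into the three-way reading rule from the text."""
    if p_value < 0.05:
        return "significant (95% confidence)"
    if p_value < 0.10:
        # Acceptable only when being wrong is cheap and you keep testing.
        return "significant at 90% only: fine for fast iteration, not a clean decision"
    return "we don't know yet: run more sends"


for p in (0.01, 0.07, 0.40):
    print(p, "->", verdict(p))
```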

The lift number shows the relative improvement of variant B over A. A 25% relative lift on a 25% baseline equals 6.25 percentage points absolute (25% × 25%). Lift is the right metric to track because an absolute difference means little without its baseline.
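The relative-versus-absolute arithmetic is easy to mix up, so here it is as a small sketch reproducing the worked example in the text. The function name `lift` is hypothetical.

```python
from typing import Tuple


def lift(rate_a: float, rate_b: float) -> Tuple[float, float]:
    """Return (relative lift of B over A, absolute difference in percentage points)."""
    if rate_a == 0:
        raise ValueError("relative lift is undefined for a 0% baseline")
    relative = (rate_b - rate_a) / rate_a
    absolute_points = (rate_b - rate_a) * 100
    return relative, absolute_points


# Worked example from the text: a 25% baseline with a 25% relative lift.
relative, points = lift(0.25, 0.25 * 1.25)
print(f"relative lift {relative:.0%}, absolute difference {points:.2f} points")
# prints: relative lift 25%, absolute difference 6.25 points
```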

Sample size sanity check:

Under 100 sends per variant - probably noise, run more.
200-400 per variant - detects 5+ point absolute differences.
800-2,000 per variant - detects roughly 2-3 point absolute differences at the upper end; a 1-point gap needs several times more.
Over 5,000 per variant - you're past the point of diminishing returns; you have your answer.
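The same formula can be run in reverse to sanity-check a planned volume: given sends per variant and a baseline, roughly how small a true gap can the test reliably detect? A minimal sketch; the function name is hypothetical, and the 80% power default and the simplification that both variants share the baseline's variance are assumptions.

```python
import math
from statistics import NormalDist


def min_detectable_diff(sends_per_variant: int, baseline: float,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest absolute difference (0.05 = 5 points) that a
    two-sided test with this many sends per variant detects at the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Simplification: both variants are assumed to have the baseline's variance.
    se = math.sqrt(2 * baseline * (1 - baseline) / sends_per_variant)
    return (z_alpha + z_beta) * se


for sends in (100, 400, 2000, 5000):
    row = ", ".join(f"{b:.0%} baseline: {min_detectable_diff(sends, b) * 100:.1f} pts"
                    for b in (0.12, 0.30))
    print(f"{sends:>5} sends/variant -> {row}")
```

Because it asks for 80% power rather than a single p-value below 0.05, it returns larger gaps than the list above; at a 30% baseline and 400 sends per variant it gives roughly 9 points.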

FAQ

What is a two-proportion z-test? The standard frequentist test for comparing two binary outcomes (converted vs. not) across two groups, widely used in marketing A/B testing and clinical trials. We use the pooled-variance version, which is the most common and, when both variants get the same number of sends, slightly conservative.

Can I use this for revenue or dollar-value tests? No. This calculator only compares binary outcome rates (acceptance rate, reply rate, meeting-booked rate). For dollar-value tests (deals closed × ACV), use a t-test on the means instead.
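For the dollar-value case, one common choice is Welch's t-test, which does not assume the two variants have equal variance. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library; turning them into a p-value needs a t-distribution, for example `scipy.stats.ttest_ind(b, a, equal_var=False)` from SciPy. The function name and the per-send revenue figures are made up for illustration.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch's t statistic for mean(b) - mean(a) and its degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of each sample mean
    se2_b = variance(b) / len(b)
    t = (mean(b) - mean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1)
                                 + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical revenue per send (deal value, 0 when no deal closed).
revenue_a = [0, 0, 0, 0, 9000, 0, 0, 0, 12000, 0]
revenue_b = [0, 0, 15000, 0, 0, 9000, 0, 0, 12000, 0]
t, df = welch_t(revenue_a, revenue_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```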

Why does the result oscillate as I add data? This is normal for A/B tests: small samples have high variance. Don't make decisions until you've crossed both your sample size threshold AND your significance threshold. Stopping early the moment you "see significance" is called peeking, and it inflates the false-positive rate.
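The cost of peeking can be shown with a simulation of A/A tests, where both variants share the same true rate and every "significant" result is a false positive. A minimal sketch under stated assumptions: the 30% true rate, 1,000 sends per variant, a check every 100 sends, and the function names are all arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist


def p_value(n_a: int, c_a: int, n_b: int, c_b: int) -> float:
    """Two-sided pooled two-proportion z-test p-value."""
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (c_b / n_b - c_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(true_rate=0.30, sends=1000, check_every=100,
                         sims=1000, alpha=0.05, seed=1):
    """Fraction of A/A tests declared significant by two stopping rules:
    peeking at every check vs. a single look at the planned sample size."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(sims):
        c_a = c_b = 0
        stopped_early = False
        for n in range(1, sends + 1):
            c_a += rng.random() < true_rate  # both variants: same true rate
            c_b += rng.random() < true_rate
            if n % check_every == 0 and p_value(n, c_a, n, c_b) < alpha:
                stopped_early = True  # a peeker would declare a winner here
        peek_hits += stopped_early
        final_hits += p_value(sends, c_a, sends, c_b) < alpha
    return peek_hits / sims, final_hits / sims


peek, final = false_positive_rates()
print(f"peeking: {peek:.1%} false positives, single final look: {final:.1%}")
```

With ten looks, the peeking rule flags a "winner" several times as often as the single final check, which stays near the nominal 5%.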

Running A/B tests inside SocialScalr? Sequences support 4 variants per step with built-in significance tracking.