Testing has become one of the most frequently cited best practices in digital advertising. Run A/B tests. Test your creative. Let the data decide. The advice is everywhere, and it is not wrong. The problem is what most brands actually do when they follow it.
Most brands are running tests constantly. They are also accumulating very little genuine learning from those tests. The two things coexist because testing and learning are not the same activity, and conflating them produces a lot of data that does not actually inform better decisions.
The Testing Without Hypothesis Problem
A test without a hypothesis is not a test. It is a comparison.
Running two ad creatives against each other and seeing which gets the better click-through rate tells you which got the better click-through rate. It does not tell you why. It does not tell you which element made the difference. It does not tell you what to do differently next time or what principle you can apply to a different creative challenge.
Most testing programmes are comparison programmes dressed up as learning programmes. The volume of variants being tested is high. The clarity of what each test is designed to answer is low. The result is a pile of historical data from which it is difficult to draw transferable conclusions.
The discipline that separates useful testing from activity is the hypothesis: a specific, falsifiable statement about what you expect to happen and why. “Version B will outperform Version A because leading with the problem rather than the solution will create stronger relevance for a cold audience” is a hypothesis. “Let’s see which performs better” is not.
What Good Testing Actually Teaches
When testing is designed to answer specific questions, it generates principles rather than just results.
A test designed to understand whether social proof in the headline outperforms benefit statements produces a finding that can inform creative decisions beyond the specific campaign. A test designed to understand whether video or static performs better for a particular product at a particular funnel stage produces a finding that shapes format strategy. A test designed to understand whether a specific audience segment responds differently to a specific message type produces a finding that sharpens audience and creative strategy simultaneously.
These kinds of tests are harder to design than simple variant comparisons. They require thinking before you build the creative, not after. They require holding variables constant so you are genuinely isolating the element you are testing. And they require enough traffic to reach statistical significance on the specific metric you have defined as the measure of success.
The investment is worth it. A well-designed testing programme that runs twelve focused tests in a year will usually generate more actionable learning than a poorly designed programme that runs a hundred comparisons in the same period.
The Statistical Significance Problem
Many tests are called before they have collected enough data. A campaign that has served its ads to 800 people and generated 12 conversions does not have enough data to conclude that one variant outperforms another with any reliability. At those counts, a gap of a few conversions between variants is well within what chance alone produces.
This is uncomfortable because it means slowing down the testing cadence and accepting that some tests will take longer to reach meaningful conclusions. The alternative — calling tests early based on thin data — produces false confidence in findings that do not replicate.
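To make the 800-person example concrete, here is a minimal sketch of a significance check. The even 400/400 split and the 4 versus 8 conversion counts are assumptions for illustration (the example above only gives the totals), and Fisher's exact test is one reasonable choice for counts this small, not the only one.

```python
from math import comb


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher exact test p-value for conversions in two variants."""
    total_conv = conv_a + conv_b
    denom = comb(n_a + n_b, total_conv)

    def prob(k):
        # Chance that variant A gets exactly k of the total conversions
        # if both variants truly convert at the same rate.
        return comb(n_a, k) * comb(n_b, total_conv - k) / denom

    lowest = max(0, total_conv - n_b)
    highest = min(n_a, total_conv)
    probs = [prob(k) for k in range(lowest, highest + 1)]
    observed = prob(conv_a)
    # Sum every outcome that is at least as unlikely as the observed one.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Assumed split: 400 people per variant, 4 vs 8 conversions (12 in total).
p = fisher_exact_two_sided(4, 400, 8, 400)
print(f"p-value: {p:.2f}")
```

Under these assumptions, variant B has twice the conversions of variant A, yet the p-value comes out well above the conventional 0.05 threshold. A doubling that looks decisive in the ads manager is not distinguishable from noise at this volume.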
The practical implication is that it is better to run fewer, properly powered tests than many underpowered ones. Smaller budgets and lower traffic volumes constrain how many simultaneous tests can be run meaningfully. Accepting this constraint and prioritising the tests with the highest potential learning value is more productive than treating every campaign as a test regardless of the evidence available.
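A rough way to judge whether a test is properly powered is to estimate the required sample size before launching it. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 1.5% baseline (12 conversions in 800, borrowed from the earlier example), the 20% relative lift, the 5% significance level, and the 80% power are all illustrative assumptions, not recommendations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate people needed per variant to detect a given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Assumed inputs: 1.5% baseline conversion rate, hoping to detect a 20% lift.
needed = sample_size_per_variant(0.015, 0.20)
print(f"People needed per variant: {needed}")
```

With these inputs the requirement runs to tens of thousands of people per variant, far beyond the 800 in the earlier example. Running the calculation for each candidate test is one way to decide which tests a given budget can actually support.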
The Optimisation Trap
Constant testing can become a substitute for strategy rather than a tool of it. When the default response to underperformance is “let’s test something new,” the underlying strategic question of whether the approach is fundamentally right can go unexamined for a long time.
A brand that has been testing creative for six months without improving performance is usually not dealing with a creative testing problem. It is dealing with an offer problem, an audience problem, or a funnel problem that creative iteration will not solve. But because testing is always available as an action to take, it keeps being taken.
The question worth asking periodically is: what would need to be true for this campaign to work? If the honest answer involves changes to things outside the creative, that is where attention should go — not another round of headline tests.
Building a Testing Programme That Actually Compounds
The testing programmes that generate compounding value over time share a few characteristics.
They are documented. Each test has a clearly stated hypothesis, a defined primary metric, and a written record of the result and the interpretation. This sounds basic, but most testing programmes have no systematic record of what was tested and what was learned. Without documentation, the same questions get re-tested repeatedly and the same lessons have to be relearned.
They build on each other. Each test informs the next hypothesis. The programme has a direction — a set of questions being progressively answered — rather than a random walk through creative variants.
They feed strategy. The findings from creative testing should be informing brief writing, audience strategy, and channel decisions. If the insights from testing are staying inside the ads manager and not reaching the strategic conversation about the brand and the campaign, the loop is not closed.
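The documentation habit described above does not need special tooling. As a minimal sketch, a test log could be a list of records like the one below. The field names, the `builds_on` link, and the sample entries are hypothetical, chosen only to mirror the hypothesis, primary metric, result, and interpretation mentioned earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentRecord:
    test_id: str
    hypothesis: str                       # specific, falsifiable, with a "because"
    primary_metric: str                   # defined before the test runs
    result: Optional[str] = None          # filled in when the test is called
    interpretation: Optional[str] = None  # what the result means for strategy
    builds_on: Optional[str] = None       # test_id of the test that prompted this one

    def is_complete(self):
        """A record only counts as learning once all four parts are written down."""
        return all([self.hypothesis, self.primary_metric, self.result, self.interpretation])


def lineage(log, test_id):
    """Return the chain of test ids that led to the given test, oldest first."""
    by_id = {record.test_id: record for record in log}
    chain, seen = [], set()
    current = by_id.get(test_id)
    while current is not None and current.test_id not in seen:
        seen.add(current.test_id)
        chain.append(current.test_id)
        current = by_id.get(current.builds_on) if current.builds_on else None
    return chain[::-1]


# Hypothetical entries, for illustration only.
log = [
    ExperimentRecord(
        test_id="T1",
        hypothesis="Problem-led headline beats solution-led for cold audiences "
                   "because it creates stronger relevance",
        primary_metric="click-through rate",
        result="No reliable difference at the planned sample size",
        interpretation="Headline framing alone may not be the constraint",
    ),
    ExperimentRecord(
        test_id="T2",
        hypothesis="Social proof in the headline beats a benefit statement",
        primary_metric="click-through rate",
        builds_on="T1",
    ),
]

print(lineage(log, "T2"))
print([record.test_id for record in log if not record.is_complete()])
```

The `builds_on` link is what turns a list of results into a programme with a direction: each new hypothesis can be traced back to the finding that prompted it, and incomplete records are easy to spot before the lesson is lost.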
Testing is a tool for learning. Learning is a tool for improving strategy. The chain is only as strong as the thinking at each link.