Those in marketing analytics, research, and academia tend to view experiments as a gold standard. Careful test vs. control reads are hard work too.
I am going to show you how testing can overestimate lift and underestimate advertising ROI. The reasons for this misestimation will also give you avenues to take corrective action and prevent this from happening to your research and testing.
First, the only pure experiment is a randomized controlled test, but hardly anyone can pull that off other than Google and Meta, and only on the ad dollars given to them to execute. Post hoc experiments (e.g., people get exposed and you then construct a matching control cell) are almost always what is conducted in practice…but they require all kinds of weighting and modeling to make the unexposed cell properly match those who saw the ad.
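To make that weighting step concrete, here is a minimal sketch of one common approach, inverse propensity weighting, for pulling the unexposed cell toward the exposed cell before taking a read. The file and column names (panel.csv, exposed, converted, the covariate list) are hypothetical stand-ins, not a specific protocol.

```python
# Minimal inverse-propensity-weighting sketch (hypothetical file/column names).
# Goal: reweight the unexposed cell so it resembles the exposed cell on
# observed covariates before reading a test vs. control difference.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("panel.csv")          # one row per consumer (hypothetical file)
covariates = ["age", "income", "hours_online", "prior_purchase_rate"]

# Model the probability of exposure given observed covariates.
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(df[covariates], df["exposed"])
df["p_exposed"] = ps_model.predict_proba(df[covariates])[:, 1]

# Weight unexposed consumers toward the exposed population (ATT-style weights:
# treated keep weight 1, controls get p/(1-p)).
df["weight"] = 1.0
unexposed = df["exposed"] == 0
df.loc[unexposed, "weight"] = (
    df.loc[unexposed, "p_exposed"] / (1 - df.loc[unexposed, "p_exposed"])
)

# Weighted test vs. control read on conversion.
exposed_rate = df.loc[df["exposed"] == 1, "converted"].mean()
control_rate = (
    (df.loc[unexposed, "converted"] * df.loc[unexposed, "weight"]).sum()
    / df.loc[unexposed, "weight"].sum()
)
print(f"Estimated lift: {exposed_rate - control_rate:.4f}")
```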
Why incrementality due to ad exposure might get misestimated
Not matching on brand propensity
In particular, analysts often fail to match on prior brand propensity, and this is fatal to clean measurement. Exposed consumers are usually already more inclined toward the brand (targeting sees to that), so in my experience, not matching on prior brand propensity leads to overstatement of lift. Matching on demos and media consumption patterns is not enough to get to the right answer.
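A minimal sketch of what this can look like in practice, assuming hypothetical column names: fold a pre-period brand propensity measure into the propensity model alongside demos and media consumption, match on the resulting score, and check balance on that variable before reading lift.

```python
# Sketch: nearest-neighbor matching on a propensity score that includes prior
# brand propensity alongside demos and media consumption. Column names are
# hypothetical; "prior_brand_purchase_rate" stands in for whatever pre-period
# brand propensity measure is available.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("panel.csv")
covariates = [
    "age", "income",                   # demos
    "tv_hours", "digital_hours",       # media consumption
    "prior_brand_purchase_rate",       # prior brand propensity -- the key addition
]

ps = LogisticRegression(max_iter=1000).fit(df[covariates], df["exposed"])
df["pscore"] = ps.predict_proba(df[covariates])[:, 1]

exposed = df[df["exposed"] == 1]
unexposed = df[df["exposed"] == 0]

# Match each exposed consumer to the closest unexposed consumer on the score.
nn = NearestNeighbors(n_neighbors=1).fit(unexposed[["pscore"]])
_, idx = nn.kneighbors(exposed[["pscore"]])
matched = unexposed.iloc[idx.ravel()]

# Balance check: exposed consumers typically start out more brand-inclined than
# raw controls; if matching has not closed that gap, lift will be inflated.
for label, cell in [("exposed", exposed), ("raw control", unexposed), ("matched control", matched)]:
    print(label, cell["prior_brand_purchase_rate"].mean())

print("Matched lift:", exposed["converted"].mean() - matched["converted"].mean())
```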
Not accounting for media covariance patterns
Your test vs. control read is likely to be contaminated by exposure to other tactics that are correlated with the one you are trying to isolate. Consider this scenario…you want to know the lift due to online video. You have identified consumers who were exposed vs. not exposed to the tactic, so after matching/twinning/weighting, you can do a straight read on the difference in sales or conversion rates, right?
Wrong! Especially if the marketer’s DSP directs both online video and programmatic/direct-buy display, you are guaranteed to find a strong correlation between consumers who saw online video and those who saw display advertising. That means most of those who were exposed to video also saw display. So you really are testing the combined effect of multiple tactics, not one. There is a counterfactual modeling method I have used that can clean this up nicely.
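That counterfactual method is not spelled out here, but a quick way to see the contamination, and to get a first-cut adjustment, is to cross-tab the exposure overlap and then fit a model that includes both exposure indicators, so video is not silently credited with display's effect. The sketch below uses hypothetical column names and is a diagnostic, not the full counterfactual protocol.

```python
# Sketch: diagnose cross-tactic contamination and adjust for it jointly.
# Hypothetical columns: saw_video, saw_display, converted.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("exposure_log.csv")

# 1. How entangled are the tactics? A DSP steering both will show heavy overlap.
print(pd.crosstab(df["saw_video"], df["saw_display"], normalize="index"))

# 2. Naive single-tactic read: credits video with everything correlated with it.
naive = smf.logit("converted ~ saw_video", data=df).fit(disp=False)

# 3. Joint read: include display so video's coefficient is net of display exposure.
#    Note: if overlap is near-total, even the joint model is weakly identified,
#    which is why heavier counterfactual modeling is needed in practice.
joint = smf.logit("converted ~ saw_video + saw_display", data=df).fit(disp=False)

print("Naive video read:", naive.params["saw_video"])
print("Joint video read:", joint.params["saw_video"])
```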
Crazy media weight levels implied by your test
When you conduct an exposed/unexposed study to measure lift due to a given tactic, you get results with no clear relationship to investment. Consider this…you have created a difference between two alternative marketing scenarios…100% reach and 0% reach for the tactic being tested. In the real world, you cannot achieve 100% reach, and trying to get there would cost far more than a marketer would ever spend. So, in real life, you might spend $5MM behind CTV and consider going to $10MM if it demonstrates substantial lift. However, your test might actually reflect a difference of $0 vs. $15MM in spending over, say, a 2-month campaign.
Now you have a bowl of spaghetti to disentangle. On one hand, the absolute lift is higher than you would ever see in-market (because you would never execute a $15MM increase in CTV), but on the other hand, the return on investment is lower because of diminishing returns.
So your test that should have been simple to interpret became a thorny analytic problem…does the marketer increase CTV spending? It is unclear which effect dominates, so we need to untangle them.
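As a minimal illustration of the kind of normalization involved, assuming a simple saturating response curve and made-up numbers: treat the test's 0%-to-100% reach contrast as the response at the implied $15MM spend, then use the curve to read off incremental sales and ROI at the $5MM-to-$10MM decision actually on the table. The curve shape and its parameter are assumptions, not a fitted model.

```python
# Sketch: translate an exposed/unexposed test read into ROI at realistic spend,
# assuming a concave (diminishing-returns) response curve -- here a simple
# saturating exponential. All numbers and the curvature parameter are illustrative.
import numpy as np

# What the test implies: going from $0 to ~$15MM produced this incremental sales.
implied_spend = 15.0          # $MM implied by 100% reach over the campaign
test_incremental_sales = 9.0  # $MM incremental sales measured by the test

# Saturating response: sales(s) = S_max * (1 - exp(-k * s)).
# Calibrate S_max for a chosen curvature k so the curve passes through the test point.
k = 0.15
s_max = test_incremental_sales / (1 - np.exp(-k * implied_spend))

def incremental_sales(spend):
    return s_max * (1 - np.exp(-k * spend))

for spend in (5.0, 10.0, 15.0):
    sales = incremental_sales(spend)
    print(f"${spend:4.1f}MM spend -> {sales:4.2f}MM incr. sales, ROI {sales / spend:4.2f}")

# Marginal ROI of moving from $5MM to $10MM -- the decision actually on the table.
delta = incremental_sales(10.0) - incremental_sales(5.0)
print(f"$5MM -> $10MM: {delta:4.2f}MM incremental sales, marginal ROI {delta / 5.0:4.2f}")
```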
I have worked on a whole set of modeling and normalization protocols for dealing with the issues I am mentioning. If I can help, please let me know.