The Multi-Armed Bandit: How to Leverage Machine Learning for More Efficient A/B-Testing

When running any campaign, advertisers need to gauge early on which creative variations outperform the others, so that they can shift budget towards the better-performing variations and increase ROI.

Standard A/B-testing often reaches its limits in the fast-paced world of mobile advertising, where advertisers must react quickly and identify what works before budget is wasted. The multi-armed bandit is superior to standard A/B-testing in this setting because the testing is embedded in the campaign itself: traffic is reallocated while the campaign runs rather than after a separate test phase. Where standard A/B-testing requires a data-gathering period of about a week before a decision is made, the multi-armed bandit works much faster and automatically, updating itself every 10 minutes.

How Exactly Does the Multi-Armed Bandit Work?

In the initial exploration phase, the bandit tries every arm, with one arm per creative variation. As it gathers data from each arm, it can infer which one performs best.

The strategy for this is reinforcement learning via the Thompson Sampling method. Initially all arms receive roughly the same amount of traffic, because nothing is known yet about which arm performs better. As more data comes in, the algorithm updates its estimated performance distribution for every arm. At every step it asks itself: “What is the probability that arm X is the best?” It computes this probability for every arm, that is, the probability that arm X is better than all other arms, and it favors the arm with the highest probability of being the best.
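To make the idea concrete, here is a minimal sketch of Thompson Sampling in Python. It rests on assumptions that are not stated in this article: every impression has a yes/no outcome (for example, a conversion), each arm starts from a uniform Beta(1, 1) prior, and the names choose_arm and run_bandit as well as the conversion rates are purely illustrative. It is not the production setup, only one common way to implement the method.

```python
import random


def choose_arm(successes, failures, rng):
    """One Thompson Sampling step.

    Draw a plausible conversion rate for every arm from its Beta posterior
    (assumed uniform Beta(1, 1) prior) and play the arm with the highest draw.
    """
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate a campaign; true_rates are hypothetical and unknown to the bandit."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = choose_arm(successes, failures, rng)
        # Simulated user feedback: did this impression convert?
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Three hypothetical creative variations with made-up conversion rates.
successes, failures = run_bandit([0.02, 0.03, 0.05], rounds=20000)
for i, (s, f) in enumerate(zip(successes, failures)):
    print(f"arm {i}: {s + f} impressions, {s} conversions")
```

Because every arm starts with the same prior, the first impressions are spread roughly evenly. As results accumulate, the arms with better observed rates win the random draw more often and therefore receive more of the traffic.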

Deciding which arm to play in each round requires a strategy, which is usually based on the data available at that time. The dimensions to optimize on must be determined up front; for a Playable ad, these can be, for example, the game play time or the scene settings. With Playable ads, the strategy needs to be closely aligned with the gameplay, so that it defines a feedback mechanism for the machine learning algorithm. This strategy should eventually lead to the point where the best arm is recommended. The final decision is made when the probability that one arm is the best reaches 95%.
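The 95% decision rule can be sketched as follows. The snippet assumes yes/no outcomes per impression and a uniform Beta(1, 1) prior per arm, and it estimates the probability of being best by simulation; the function names, the number of draws, and the example counts are illustrative assumptions rather than details from this article.

```python
import random


def prob_each_arm_is_best(successes, failures, n_draws=10000, seed=0):
    """Monte Carlo estimate of P(arm is best) for every arm.

    Repeatedly draw one conversion rate per arm from its Beta posterior
    and count how often each arm has the highest draw.
    """
    rng = random.Random(seed)
    n_arms = len(successes)
    wins = [0] * n_arms
    for _ in range(n_draws):
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        wins[max(range(n_arms), key=draws.__getitem__)] += 1
    return [w / n_draws for w in wins]


def pick_winner(successes, failures, threshold=0.95):
    """Return the index of the winning arm, or None if no arm reaches the threshold yet."""
    probs = prob_each_arm_is_best(successes, failures)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


# Hypothetical counts: conversions and non-conversions per creative variation.
print(pick_winner(successes=[5, 6], failures=[95, 94]))
print(pick_winner(successes=[50, 120], failures=[950, 880]))
```

With only a few observations and similar results, no arm reaches the threshold and the campaign keeps exploring. Once one arm is clearly ahead on enough data, its probability of being the best passes 95% and it is recommended.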

Benefits of Multi-Armed Bandit Creative Optimization:

Speed up the process: The automated multi-armed bandit creative optimization approach saves the time that standard A/B-testing spends on a separate data-gathering period.

Stay cost-conscious: Standard A/B-testing can cause advertisers to lose money on underperforming creative variations while the test runs. The multi-armed bandit approach shifts more budget towards the better-performing arms, which limits this loss.

Control optimization dimensions: With the multi-armed bandit approach, advertisers can control which dimensions they wish to optimize on and learn why a certain creative variation performs better than the others.