Optimal Header Bidding
Using Thompson Sampling
Ad tech
- Automation of digital media sales
- Creation of a spot market (real-time bidding)
- ~$300bn digital ad spending
- Soon (2020?) 50% of all media spending
- Dominated by Google and Facebook
- But constantly evolving
The idealized SSP
In practice
- Lots of market imperfections
- Integration costs
- Business partnerships
- Cross subsidies
- Each SSP has some exclusive demand or supply
- Some publishers don't want to rely on one tech stack
- Hence the idea of Header Bidding
SSP competition
The header bidding problem
- For each impression $i$
- The SSP closes its own auction at price $p_i$
- The advertiser pays $p_i$ if it wins
- The SSP returns a bid $q_i$ to the publisher
- $q_i$ competes with the other SSPs' bids (first price)
- The SSP wins if $q_i > x_i$, the largest of the other SSPs' responses
- If it wins, the SSP pays $q_i$ to the publisher and charges $p_i$ to the advertiser
- What is the best $q_i$?
The payoff with full information
- What is the best $q_i$?
- Without per-impression competition, it could simply be
$$q_i = p_i\cdot(1-\text{fees})$$
- If $x_i$ is known the SSP tries to maximize
$$R_i(q_i) = \mathbf{1}_{q_i \geq x_i}(p_i - q_i)$$
- $q_i$ should be as small as possible
- But if $q_i$ is too small, the impression is lost and the revenue is zero
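The full-information payoff can be sketched in a few lines of Python (the function name is illustrative, not from the talk):

```python
def realized_payoff(p, q, x):
    """Full-information payoff R_i(q) = 1{q >= x} * (p - q):
    the SSP earns the margin p - q only if its bid q clears
    the best competing bid x; otherwise it earns nothing."""
    return (p - q) if q >= x else 0.0
```

Lowering $q$ raises the margin when the SSP wins, but drops the payoff to zero as soon as $q$ falls below $x$.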
The stochastic setting
- $x_i$ drawn from an unknown distribution $f_\theta$
- $f_\theta$ parametrized by $\theta$
- What is the best $q_i$?
- Given $\theta$, the SSP tries to maximize
$$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
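A minimal sketch of this maximization, assuming for illustration that $f_\theta$ is Gaussian (the distribution choice and function names are assumptions, not from the talk):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) -- a stand-in for F_theta."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def best_bid(p, mu, sigma, steps=1000):
    """Grid-search the bid q in [0, p] maximizing the expected
    revenue (p - q) * F_theta(q)."""
    best_q, best_rev = 0.0, 0.0
    for k in range(steps + 1):
        q = p * k / steps
        rev = (p - q) * normal_cdf(q, mu, sigma)
        if rev > best_rev:
            best_q, best_rev = q, rev
    return best_q
```

The optimal bid balances the margin $p_i - q_i$ against the win probability $F_\theta(q_i)$: it shifts upward when the competing bids $x_i$ are expected to be higher.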
Learning $\theta$
- Given $\theta$, the SSP tries to maximize
$$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
- When $\theta$ is unknown, we have a bandit problem
- The SSP must find trade-off between exploration and exploitation
The bandit problem
- Arms are possible $q_i$s
- Everything is conditional on some context $c$
- >10k events/sec
- But many contexts (levels of $p_i$)
- Learning quickly is critical
Trying UCB and Exp3
- Each arm's payoff is assumed independent
- Reducing the number of $q_i$ buckets helps
- It captures some of the local correlations and accelerates learning
Introducing the auction structure
- Idea
- Learning $f_\theta$ from a parametric family
- Using Thompson sampling to solve explore vs. exploit
- Information is really scarce (won or not), so $\theta$ should be low-dimensional
- Storing the sequence of Bayesian updates is impractical
$$\pi_{c,t}(\theta) \propto \pi_{c,0}(\theta) \prod_{i \in \mathcal{D}_{t,c}} \big[F_{\theta}(q_i) \mathbf{1}_{x_i \leq q_i} + (1 - F_{\theta}(q_i)) \mathbf{1}_{x_i > q_i}\big]$$
Particle filter
- The posterior is stored as a weighted sample (particles) approximating the true distribution
$$\hat{w}_{c,k,t} = \hat{w}_{c,k,t-1} \times \big[F_{\theta_{c,k,t}}(q_t) \mathbf{1}_{x_t \leq q_t} + (1 - F_{\theta_{c,k,t}}(q_t)) \mathbf{1}_{x_t > q_t}\big]$$
- Weights updated and normalized at each step
$$w_{c,k,t} = \frac{\hat{w}_{c,k,t}}{\sum_{k'=1}^K \hat{w}_{c,k',t}}$$
- $\theta$s are perturbed slightly to account for non-stationarities
- To avoid degeneracy, measured by the effective sample size $\big(\sum_{k=1}^K w_{c,k,t}^2\big)^{-1}$, we resample the $\theta$s
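The weight update and resampling step above can be sketched as follows (a minimal single-context version; the `cdf` signature and the ESS threshold of $K/2$ are assumptions for illustration):

```python
import random

def update_particles(particles, weights, q, won, cdf):
    """One Bayesian update of a particle filter posterior over theta.

    Each particle's weight is multiplied by the likelihood of the
    observed binary outcome won = 1{x <= q}: F_theta(q) on a win,
    1 - F_theta(q) on a loss.  Weights are then renormalized."""
    new_w = []
    for theta, w in zip(particles, weights):
        lik = cdf(q, theta) if won else 1.0 - cdf(q, theta)
        new_w.append(w * lik)
    total = sum(new_w)
    new_w = [w / total for w in new_w]
    # Effective sample size 1 / sum(w^2); resample when it collapses
    ess = 1.0 / sum(w * w for w in new_w)
    if ess < len(particles) / 2:
        particles = random.choices(particles, weights=new_w,
                                   k=len(particles))
        new_w = [1.0 / len(particles)] * len(particles)
    return particles, new_w
```

Each observation costs one multiplication per particle, so the whole posterior fits in $K$ `(theta, weight)` pairs per context instead of the full history of updates.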
Thompson Sampling
- Sample a value $\theta$ from the posterior distribution $\pi_{c_i,t_{i-1}}$
- Compute the bid $q_i$ that would maximize the SSP's expected revenue if $x_i \sim f_{\theta}$ (see below)
- Observe the auction outcome $\mathbf{1}_{q_i \geq x_i}$ and update the posterior $\pi_{c_i,t_{i}}$
- As the particle filter provides a discrete approximation of the posterior distribution, the sampling step is straightforward
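Putting the sampling and maximization steps together, one Thompson round might look like this (the grid search and `cdf` signature are illustrative assumptions):

```python
import random

def thompson_bid(p, particles, weights, cdf, steps=200):
    """One Thompson sampling step: draw a single theta from the
    discrete particle posterior, then bid the q in [0, p] maximizing
    the expected revenue (p - q) * F_theta(q) on a grid."""
    theta = random.choices(particles, weights=weights, k=1)[0]
    return max((p * k / steps for k in range(steps + 1)),
               key=lambda q: (p - q) * cdf(q, theta))
```

Sampling one particle in proportion to its weight is all the posterior sampling required, which is why the particle filter makes this step straightforward; the auction outcome then feeds back into the weight update.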