Optimal Header Bidding
Using Thompson Sampling
Ad tech
- Automation of digital media sales
- Creation of a spot market (real-time bidding)
- ~$300bn digital ad spending
- Soon (2020?) 50% of all media spending
- Dominated by Google and Facebook
- But constantly evolving
The idealized SSP
In practice
- Lots of market imperfections
- Integration costs
- Business partnerships
- Cross subsidies
- Each SSP has some exclusive demand or supply
- Some publishers don't want to rely on one tech stack
- Hence the idea of Header Bidding
SSP competition
The header bidding problem
- For each impression $i$
- The SSP closes its own auction at price $p_i$
- The advertiser pays $p_i$ if it wins
- The SSP returns a bid $q_i$ to the publisher
- $q_i$ competes with the other SSPs' bids (first price)
- The SSP wins if $q_i > x_i$, the largest of the other SSPs' responses
- If it wins, the SSP pays $q_i$ to the publisher and charges $p_i$ to the advertiser
- What is the best $q_i$?
The payoff with full information
- What is the best $q_i$?
- Without per-impression competition, it could simply be
$$q_i = p_i\cdot(1-\text{fees})$$
- If $x_i$ is known the SSP tries to maximize
$$R_i(q_i) = \mathbf{1}_{q_i \geq x_i}(p_i - q_i)$$
- $q_i$ should be as small as possible
- But if $q_i$ is too small, the impression is lost and the revenue is zero
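The full-information payoff can be sketched in a few lines of Python (the function name is illustrative, not from the talk):

```python
def realized_payoff(p, q, x):
    """Full-information payoff R_i(q) = 1{q >= x} * (p - q):
    the SSP earns the margin p - q only if its bid q clears
    the best competing bid x; otherwise it earns nothing."""
    return (p - q) if q >= x else 0.0
```

Lowering $q$ raises the margin when the SSP wins, but drops the payoff to zero as soon as $q$ falls below $x$.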
The stochastic setting
- $x_i$ drawn from an unknown distribution $f_\theta$
- $f_\theta$ parametrized by $\theta$
- What is the best $q_i$?
- Given $\theta$, the SSP tries to maximize
$$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
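A minimal sketch of this maximization, assuming for illustration that $f_\theta$ is Gaussian (the distribution choice and function names are assumptions, not from the talk):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) -- a stand-in for F_theta."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def best_bid(p, mu, sigma, steps=1000):
    """Grid-search the bid q in [0, p] maximizing the expected
    revenue (p - q) * F_theta(q)."""
    best_q, best_rev = 0.0, 0.0
    for k in range(steps + 1):
        q = p * k / steps
        rev = (p - q) * normal_cdf(q, mu, sigma)
        if rev > best_rev:
            best_q, best_rev = q, rev
    return best_q
```

The optimal bid balances the margin $p_i - q_i$ against the win probability $F_\theta(q_i)$: it shifts upward when the competing bids $x_i$ are expected to be higher.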
Learning $\theta$
- Given $\theta$, the SSP tries to maximize
$$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
- When $\theta$ is unknown, we have a bandit problem
- The SSP must find trade-off between exploration and exploitation
The bandit problem
- Arms are possible $q_i$s
- Everything is conditional on some context $c$
- >10k events/sec
- But many contexts (levels of $p_i$)
- Learning quickly is critical
Trying UCB and Exp3
- Each arm's payoff is assumed independent
- Reducing the number of $q_i$ buckets helps
- It captures some of the local correlations and accelerates learning
Introducing the auction structure
- Idea
- Learning $f_\theta$ from a parametric family
- Using Thompson sampling to solve explore vs. exploit
- Information is really scarce (won or not), so $\theta$ should be low-dimensional
- Storing the sequence of Bayesian updates is impractical
$$\pi_{c,t}(\theta) \propto \pi_{c,0}(\theta) \prod_{i \in \mathcal{D}_{t,c}} \big[F_{\theta}(q_i) \mathbf{1}_{x_i \leq q_i} + (1 - F_{\theta}(q_i)) \mathbf{1}_{x_i > q_i}\big]$$
Particle filter
- The posterior is stored as a weighted sample (particles) approximating the true distribution
$$\hat{w}_{c,k,t} = \hat{w}_{c,k,t-1} \times \big[F_{\theta_{c,k,t}}(q_t) \mathbf{1}_{x_t \leq q_t} + (1 - F_{\theta_{c,k,t}}(q_t)) \mathbf{1}_{x_t > q_t}\big]$$
- Weights updated and normalized at each step
$$w_{c,k,t} = \frac{\hat{w}_{c,k,t}}{\sum_{k'=1}^K \hat{w}_{c,k',t}}$$
- $\theta$s are perturbed slightly to account for non-stationarities
- To avoid degeneracy, measured by the effective sample size $\big(\sum_{k=1}^K w_{c,k,t}^2\big)^{-1}$, we resample the $\theta$s
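The weight update and resampling step above can be sketched as follows (a minimal single-context version; the `cdf` signature and the ESS threshold of $K/2$ are assumptions for illustration):

```python
import random

def update_particles(particles, weights, q, won, cdf):
    """One Bayesian update of a particle filter posterior over theta.

    Each particle's weight is multiplied by the likelihood of the
    observed binary outcome won = 1{x <= q}: F_theta(q) on a win,
    1 - F_theta(q) on a loss.  Weights are then renormalized."""
    new_w = []
    for theta, w in zip(particles, weights):
        lik = cdf(q, theta) if won else 1.0 - cdf(q, theta)
        new_w.append(w * lik)
    total = sum(new_w)
    new_w = [w / total for w in new_w]
    # Effective sample size 1 / sum(w^2); resample when it collapses
    ess = 1.0 / sum(w * w for w in new_w)
    if ess < len(particles) / 2:
        particles = random.choices(particles, weights=new_w,
                                   k=len(particles))
        new_w = [1.0 / len(particles)] * len(particles)
    return particles, new_w
```

Each observation costs one multiplication per particle, so the whole posterior fits in $K$ `(theta, weight)` pairs per context instead of the full history of updates.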
Thompson Sampling
- Sample a value $\theta$ from the posterior distribution $\pi_{c_i,t_{i-1}}$
- Compute the bid $q_i$ that would maximize the SSP's expected revenue if $x_i \sim f_{\theta}$ (see below)
- Observe the auction outcome $\mathbf{1}_{q_i \geq x_i}$ and update the posterior $\pi_{c_i,t_{i}}$
- As the particle filter provides a discrete approximation of the posterior distribution, the sampling step is straightforward
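Putting the sampling and maximization steps together, one Thompson round might look like this (the grid search and `cdf` signature are illustrative assumptions):

```python
import random

def thompson_bid(p, particles, weights, cdf, steps=200):
    """One Thompson sampling step: draw a single theta from the
    discrete particle posterior, then bid the q in [0, p] maximizing
    the expected revenue (p - q) * F_theta(q) on a grid."""
    theta = random.choices(particles, weights=weights, k=1)[0]
    return max((p * k / steps for k in range(steps + 1)),
               key=lambda q: (p - q) * cdf(q, theta))
```

Sampling one particle in proportion to its weight is all the posterior sampling required, which is why the particle filter makes this step straightforward; the auction outcome then feeds back into the weight update.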