Optimal Header Bidding

Using Thompson Sampling

Proceedings of KDD 2018

http://www.kdd.org/kdd2018/

Ad tech

  • Automation of digital media sales
  • Creation of a spot market (real-time bidding)
  • ~$300bn digital ad spending
  • Soon (2020?) 50% of all media spending
  • Dominated by Google and Facebook
  • But constantly evolving

The idealized SSP

In practice

  • Lots of market imperfections
    • Integration costs
    • Business partnerships
    • Cross subsidies
  • Each SSP has some exclusive demand or supply
  • Some publishers don't want to rely on one tech stack
  • Hence the idea of Header Bidding

SSP competition

The header bidding problem

  • For each impression $i$
  • The SSP closes its own auction at price $p_i$
  • The advertiser pays $p_i$ if it wins
  • The SSP returns a bid $q_i$ to the publisher
  • $q_i$ competes with the other SSPs' bids (first-price)
  • The SSP wins if $q_i > x_i$, the largest of the other SSPs' responses
  • If it wins, the SSP pays $q_i$ and charges $p_i$
  • What is the best $q_i$?

The payoff with full information

  • What is the best $q_i$?
  • Without per-impression competition it would simply be
  • $$q_i = p_i\cdot(1-\text{fees})$$
  • If $x_i$ is known, the SSP tries to maximize
  • $$R_i(q_i) = \mathbf{1}_{q_i \geq x_i}(p_i - q_i)$$
  • $q_i$ should be as small as possible (just above $x_i$)
  • But if $q_i$ is too small, the auction is lost and the payoff drops to zero (see the sketch below)
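A minimal sketch of this full-information payoff (plain Python; the function name and the numbers are illustrative only):

```python
def payoff(p, q, x):
    """Full-information payoff R_i(q): the SSP keeps the margin p - q
    only if its bid q clears the best competing bid x."""
    return (p - q) if q >= x else 0.0

# Bidding just above x keeps the largest margin...
print(payoff(p=1.0, q=0.55, x=0.5))  # 0.45
# ...but bidding below x forfeits everything.
print(payoff(p=1.0, q=0.45, x=0.5))  # 0.0
```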

Payoff

  [Figure: payoff $R_i(q_i)$ as a function of the bid $q_i$, for a given value $p_i$ and best competing bid $x_i$]

The stochastic setting

  • $x_i$ drawn from an unknown distribution $f_\theta$
  • $f_\theta$ parametrized by $\theta$
  • What is the best $q_i$?
  • Given $\theta$, the SSP tries to maximize
  • $$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
  • where $F_\theta$ is the CDF of $f_\theta$, i.e. the probability of winning with bid $q_i$ (numerical sketch below)
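A numerical sketch of this maximization. The objective $(p_i - q_i)F_\theta(q_i)$ is from the slide; the log-normal family for $x_i$ with $\theta = (\mu, \sigma)$ and the grid search are illustrative assumptions:

```python
import numpy as np
from scipy.stats import lognorm  # assumed family: log(x) ~ N(mu, sigma^2)

def expected_payoff(q, p, mu, sigma):
    """(p - q) * F_theta(q), with F_theta the CDF of the competing bid x."""
    return (p - q) * lognorm.cdf(q, s=sigma, scale=np.exp(mu))

def best_bid(p, mu, sigma, grid_size=1000):
    """Grid search over [0, p] for the bid maximizing the expected payoff."""
    grid = np.linspace(0.0, p, grid_size)
    return grid[np.argmax(expected_payoff(grid, p, mu, sigma))]

# Example: value p = 1, competing bids centered around exp(mu) = 0.5.
print(best_bid(p=1.0, mu=np.log(0.5), sigma=0.5))
```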

Expected payoff

  [Figure: expected payoff $(p_i - q_i)\,F_\theta(q_i)$ as a function of the bid $q_i$, for a given value $p_i$ and parameters $\mu$, $\sigma$]

Learning $\theta$

  • Given $\theta$, the SSP tries to maximize
  • $$\mathbb{E}\left[R_i(q_i)\right] = (p_i - q_i) F_{\theta}(q_i)$$
  • When $\theta$ is unknown, we have a bandit problem
  • The SSP must find a trade-off between exploration and exploitation

The bandit problem

  • Arms are possible $q_i$s
  • Everything conditional on some context $c$
  • >10k events/sec
  • But many contexts (levels of $p_i$)
  • Learning quickly is critical

Trying UCB and Exp3

  • Each arm's payoff is assumed independent
  • Reducing the number of $q_i$ buckets helps
  • It captures some of the local correlations and accelerates learning

Introducing the auction structure

  • Idea
    • Learning $f_\theta$ from a parametric family
    • Using Thompson sampling to solve the explore vs. exploit trade-off
  • Information is really scarce (won or not), so $\theta$ should be low-dimensional
  • Storing the sequence of Bayesian updates is impractical
  • $$\pi_{c,t}(\theta) \propto \pi_{c,0}(\theta) \prod_{i \in \mathcal{D}_{t,c}} \big[F_{\theta}(q_i) \mathbf{1}_{x_i \leq q_i} + (1 - F_{\theta}(q_i)) \mathbf{1}_{x_i > q_i}\big]$$
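A sketch of the per-auction likelihood behind this posterior (only the binary outcome $\mathbf{1}_{x_i \leq q_i}$ is observed; the log-normal family is the same illustrative assumption as above):

```python
import numpy as np
from scipy.stats import lognorm

def likelihood(theta, q, won):
    """Likelihood of one auction outcome under theta = (mu, sigma):
    F_theta(q) if the bid q won (x <= q), 1 - F_theta(q) otherwise."""
    mu, sigma = theta
    f = lognorm.cdf(q, s=sigma, scale=np.exp(mu))
    return f if won else 1.0 - f

def unnormalized_posterior(theta, prior, history):
    """pi_{c,t}(theta) up to a constant: the prior times the product of
    per-auction likelihoods over the context's history [(q_i, won_i), ...]."""
    value = prior(theta)
    for q, won in history:
        value *= likelihood(theta, q, won)
    return value
```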

Particle filter

  • The posterior is stored as a weighted sample (particles) of the true distribution
  • $$\hat{w}_{c,k,t} = \hat{w}_{c,k,t-1} \times \big[F_{\theta_{c,k,t}}(q_t) \mathbf{1}_{x_t \leq q_t} + (1 - F_{\theta_{c,k,t}}(q_t)) \mathbf{1}_{x_t > q_t}\big]$$
  • Weights updated and normalized at each step
  • $$w_{c,k,t} = \frac{\hat{w}_{c,k,t}}{\sum_{k'=1}^K \hat{w}_{c,k',t}}$$
  • The $\theta$s are slightly perturbed to account for non-stationarities
  • When the effective sample size $\big(\sum_{k=1}^K w_{c,k,t}^2\big)^{-1}$ becomes too low, we resample the $\theta$s to avoid degeneracy (see the sketch below)
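A minimal sketch of one particle-filter step under the same illustrative assumptions (log-normal family, Gaussian jitter on the particles; the ESS criterion $\big(\sum_k w_k^2\big)^{-1}$ is from the slide, the threshold value is an assumption):

```python
import numpy as np
from scipy.stats import lognorm

def update_particles(thetas, weights, q, won, jitter=0.01, ess_fraction=0.5):
    """One particle-filter step: reweight by the auction outcome,
    normalize, perturb the particles, and resample if the ESS is low.
    thetas: (K, 2) array of (mu, sigma) particles; weights: (K,) array."""
    mus, sigmas = thetas[:, 0], thetas[:, 1]
    f = lognorm.cdf(q, s=sigmas, scale=np.exp(mus))
    weights = weights * (f if won else 1.0 - f)
    weights = weights / weights.sum()

    # Slight perturbation of the thetas to track non-stationarity.
    thetas = thetas + jitter * np.random.randn(*thetas.shape)
    thetas[:, 1] = np.abs(thetas[:, 1])  # keep sigma positive

    # Resample when the effective sample size (sum_k w_k^2)^-1 drops too low.
    k = len(weights)
    if 1.0 / np.sum(weights ** 2) < ess_fraction * k:
        idx = np.random.choice(k, size=k, p=weights)
        thetas, weights = thetas[idx], np.full(k, 1.0 / k)
    return thetas, weights
```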

Thompson Sampling

  • Sample a value $\theta$ from the posterior distribution $\pi_{c_i,t_{i-1}}$
  • Compute the bid $q_i$ that would maximize the SSP's expected revenue if $x_i \sim f_{\theta}$ (see below)
  • Observe the auction outcome $\mathbf{1}_{q_i \geq x_i}$ and update the posterior $\pi_{c_i,t_{i}}$
  • As the particle filter provides a discrete approximation of the posterior distribution, the sampling step is straightforward
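Putting the pieces together, a sketch of one Thompson-sampling round on top of the particle posterior, reusing `best_bid` and `update_particles` from the sketches above (all names, including the `observe_outcome` callback, are illustrative):

```python
import numpy as np

def thompson_step(thetas, weights, p, observe_outcome):
    """One round: sample theta from the particle posterior, bid the
    maximizer of (p - q) * F_theta(q), then update with the outcome."""
    # 1. Sampling from the discrete particle approximation is a weighted draw.
    k = np.random.choice(len(weights), p=weights)
    mu, sigma = thetas[k]

    # 2. Bid as if x ~ f_theta for the sampled theta.
    q = best_bid(p, mu, sigma)

    # 3. Observe the win/loss indicator 1{q >= x} and update the posterior.
    won = observe_outcome(q)
    thetas, weights = update_particles(thetas, weights, q, won)
    return thetas, weights, q
```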

Particle Representation

  [Animation: particle representation of the posterior in the $(\mu, \sigma)$ plane, showing the noise and resampling steps]

Performance

  [Animation: performance of the strategy, with panels for the $(\mu, \sigma)$ particles, the noise/resampling steps, the objective $J$, and a discount parameter]

Thank you

https://alephd.github.io/assets/header_bidding/slides