Global bounds on the Jensen functional

Given a convex function $\varphi : \mathbb{R} \rightarrow \mathbb{R}$ and a random variable $X$ on $\mathbb{R}$, the Jensen functional of $\varphi$ and $X$ is defined as

$$\mathcal{J}(\varphi, X) := \mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X]). \tag{1}$$
The well-known Jensen inequality states that $\mathcal{J}(\varphi, X) \geq 0$. For instance, if $\varphi(x) = x^2$, then $\mathcal{J}(\varphi, X) = \text{Var}(X) \geq 0$. If $\mu$ and $\nu$ are two probability measures with $\mu$ absolutely continuous with respect to $\nu$, $X \sim \nu$, and $\varphi$ is convex with $\varphi(1) = 0$, then $\mathcal{J}(\varphi, \tfrac{d\mu}{d\nu}(X)) =: D_\varphi(\mu, \nu)$ is a so-called $f$-divergence between probability measures; examples include the total variation distance, the Kullback-Leibler divergence and the $\chi^2$ divergence.
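
As a quick sanity check, here is a minimal Python sketch of the nonnegativity of the Jensen functional on simulated data (the uniform distribution and the choices of $\varphi$ are arbitrary, not part of the argument); with $\varphi(x) = x^2$ it reduces to the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)   # samples of X ~ Uniform(0, 1)

def jensen_functional(phi, x):
    """Monte Carlo estimate of J(phi, X) = E[phi(X)] - phi(E[X])."""
    return phi(x).mean() - phi(x.mean())

print(jensen_functional(np.square, x))    # ~ Var(X) = 1/12, nonnegative
print(jensen_functional(np.exp, x))       # nonnegative for any convex phi
```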

If $X$ is bounded, then a converse to the Jensen inequality can easily be obtained as follows: let $m$ and $M$ be the infimum and supremum of $X$, write $\mu := \mathbb{E}[X]$, and write $X = \alpha m + (1-\alpha)M$ for some random variable $\alpha$ taking values in $[0,1]$. Then $\mathbb{E}[\alpha] = (M - \mu)/(M-m)$, and applying the convexity of $\varphi$ inside the expectation $\mathbb{E}[\varphi(X)]$ yields

$$\mathcal{J}(\varphi, X) \leq \frac{M - \mu}{M-m}\,\varphi(m) + \frac{\mu - m}{M-m}\,\varphi(M) - \varphi(\mu). \tag{2}$$
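
Here is a small simulation sketch of $(2)$ (the choices of $X$ uniform on $[0,2]$ and $\varphi = \exp$ are mine and arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, M = 0.0, 2.0
phi = np.exp                                  # any convex function
x = rng.uniform(m, M, size=100_000)           # any X supported on [m, M]
mu = x.mean()

lhs = phi(x).mean() - phi(mu)                 # J(phi, X)
rhs = (M - mu)/(M - m)*phi(m) + (mu - m)/(M - m)*phi(M) - phi(mu)
print(lhs <= rhs)                             # True: the bound (2) holds
```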
When $\mu$ is unknown in practice, maximizing the right-hand side of $(2)$ over all possible values of $\mu \in [m, M]$, that is over $\mu = pm + (1-p)M$ with $p \in [0,1]$, gives the bound

$$\mathcal{J}(\varphi, X) \leq \sup_{p \in [0,1]} \left\{ p\,\varphi(m) + (1-p)\,\varphi(M) - \varphi(pm + (1-p)M) \right\}, \tag{3}$$
which is Theorem C in Simic (2011).
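
The supremum in $(3)$ rarely needs to be computed by hand; for numerical work a grid search is enough. A sketch, again with the arbitrary choices $\varphi = \exp$ and $[m, M] = [0, 2]$:

```python
import numpy as np

m, M = 0.0, 2.0
phi = np.exp
p = np.linspace(0.0, 1.0, 10_001)

# Maximization step of (3), evaluated on a grid of p values.
bound = np.max(p*phi(m) + (1 - p)*phi(M) - phi(p*m + (1 - p)*M))
print(bound)   # upper bound on J(phi, X) for any X with values in [m, M]
```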

Some examples

Variance bound. Consider for example the case where $\varphi(x) = x^2$, so that $\mathcal{J}(\varphi, X) = \text{Var}(X)$. Then for $X$ taking values in, say, $[0,1]$, the above bounds read as

$$\text{Var}(X) \leq \mu(1-\mu) \leq \frac{1}{4},$$
which is a well-known elementary result.
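
For completeness, here is the short computation behind this specialization, obtained by plugging $m = 0$, $M = 1$ and $\varphi(x) = x^2$ into $(2)$ and $(3)$:

$$\frac{M-\mu}{M-m}\,\varphi(m) + \frac{\mu-m}{M-m}\,\varphi(M) - \varphi(\mu) = (1-\mu)\cdot 0 + \mu\cdot 1 - \mu^2 = \mu(1-\mu),$$

$$\sup_{p \in [0,1]}\left\{p\cdot 0 + (1-p)\cdot 1 - (1-p)^2\right\} = \sup_{q \in [0,1]}\left\{q - q^2\right\} = \frac{1}{4}.$$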

$f$-divergence bounds. In (Binette, 2019), I show how we can use similar ideas to get best-possible reverse Pinsker inequalities: upper bounds on $f$-divergences in terms of the total variation distance and likelihood ratio extrema. In particular, with $D(\mu|\nu) = \int \log\left(\frac{d\mu}{d\nu}\right) d\mu$ the Kullback-Leibler divergence between the probability measures $\mu$ and $\nu$, we find that if $a = \inf \frac{d\nu}{d\mu}$ and $b = \sup \frac{d\nu}{d\mu}$, then

Applying again the Jensen functional bound to $\sup_A \lvert \mu(A)-\nu(A) \rvert = \frac{1}{2}\int\left \lvert \frac{d\mu}{d\nu} - 1\right \rvert d\nu$, we obtain

and this implies the range of values theorem

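
The quantities entering these bounds are easy to compute in a discrete toy example. The following sketch is my own illustration (with an arbitrary pair of distributions); it only evaluates the Kullback-Leibler divergence, the total variation distance and the likelihood ratio extrema $a$ and $b$ discussed above, while the precise inequalities relating them are stated in the paper.

```python
import numpy as np

# Two probability vectors on the same finite support (arbitrary toy example).
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.4, 0.2])

kl = np.sum(mu * np.log(mu / nu))               # D(mu | nu)
tv = 0.5 * np.sum(np.abs(mu / nu - 1) * nu)     # sup_A |mu(A) - nu(A)|
a, b = (nu / mu).min(), (nu / mu).max()         # likelihood ratio extrema
print(kl, tv, a, b)
```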
Variations

In cases where $\mu$ is unknown and the maximization in $(3)$ cannot easily be carried out explicitly, we can use the following trick.

Let $f(x) = x\varphi(m) + (1-x)\varphi(M) - \varphi(x m +(1-x)M)$ be the term involved in the maximization step of $(3)$. Then $f$ is concave with $f(0) = f(1) = 0$, so that $f \geq 0$ on $[0,1]$ and, by concavity, $f(p) \geq (1-p)f(x)$ whenever $x \leq p$ and $f(p) \geq p f(x)$ whenever $x \geq p$. Hence for any $p \in (0,1)$ we have that

$$\mathcal{J}(\varphi, X) \leq \sup_{x \in [0,1]} f(x) \leq \frac{f(p)}{\min\{p, 1-p\}}.$$
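
A quick grid check of this inequality (a sketch under the same arbitrary choices of $\varphi$ and $[m, M]$ as above):

```python
import numpy as np

m, M = 0.0, 2.0
phi = np.exp

def f(x):
    # Term inside the maximization step of (3).
    return x*phi(m) + (1 - x)*phi(M) - phi(x*m + (1 - x)*M)

x = np.linspace(0.0, 1.0, 10_001)
for p in (0.3, 0.5, 0.7):
    print(f(x).max() <= f(p)/min(p, 1 - p))   # True for each p
```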
In particular, taking $p = 1/2$ in the inequality above, we obtain the result of Simic (2008) stating that

$$\mathcal{J}(\varphi, X) \leq \varphi(m) + \varphi(M) - 2\,\varphi\!\left(\frac{m+M}{2}\right).$$
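
Again a small simulation sketch (same arbitrary choice of $\varphi$ and of the support as before):

```python
import numpy as np

rng = np.random.default_rng(2)
m, M = 0.0, 2.0
phi = np.exp
x = rng.uniform(m, M, size=100_000)

lhs = phi(x).mean() - phi(x.mean())           # J(phi, X)
rhs = phi(m) + phi(M) - 2*phi((m + M)/2)      # Simic (2008) bound
print(lhs <= rhs)                             # True
```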
When $\varphi$ is differentiable (this assumption is not strictly necessary, but it simplifies the statement), we can use the concavity of $f$ together with the fact that $f(0) = f(1) = 0$ (for instance by bounding $f$ above by its tangent lines at $0$ and $1$) to very easily obtain

$$\mathcal{J}(\varphi, X) \leq \frac{1}{4}(M - m)\left(\varphi'(M) - \varphi'(m)\right),$$

which is an inequality attributed to S.S. Dragomir (1999), although I haven't managed to find the original paper yet (I sent a request to my university library; I will post the original paper online if they find it).
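
And a last simulation sketch of this bound (same arbitrary setup; $\varphi = \exp$, so that $\varphi' = \exp$ as well):

```python
import numpy as np

rng = np.random.default_rng(3)
m, M = 0.0, 2.0
phi, dphi = np.exp, np.exp                   # convex phi and its derivative
x = rng.uniform(m, M, size=100_000)

lhs = phi(x).mean() - phi(x.mean())          # J(phi, X)
rhs = 0.25*(M - m)*(dphi(M) - dphi(m))       # Dragomir-type bound
print(lhs <= rhs)                            # True
```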