I was programming a Gibbs sampler the other day and all hell broke loose: small errors were hard to trace back to the source of the problem and debugging was a pain. The bugs could have been caught much more early if I had properly validated the input arguments of my various helper functions. So I decided it was time for me to learn how to do this properly.

Validating function input arguments in R

The easiest way is to manually incorporate checks.

mySum<-function(a,b){if(!is.numeric(a)|!is.numeric(b)){stop("Arguments should be numeric.")}if(length(a)!=length(b)){stop("Arguments should be of the same length.")}return(a+b)}

This works well enough, but it takes up a lot of space and you have to manually write up the description of the errors.

This is neater, but the error messages are not very descriptive.

>mySum(1,"1")Error:bisnotanumericorintegervector

What is b here? What arguments in the function call caused the error? It’s a bit hard to tell, especially if the call to this function is hidden in some large Gibbs sampler.

The assert function

My solution is the assert function. You can find on my Github Gist and which I also included at the end of this post.

Consider the statistical learning framework where we have data $X\sim Q$ for some unknown distribution $Q$, a model $\Theta$ and a loss function $\ell_\theta(X)$ measuring a cost associated with fitting the data $X$ using a particular $\theta\in\Theta$. Our goal is to use the data to learn about parameters which minimize the risk $R(\theta) = \mathbb{E}[\ell_\theta(X)]$. Here are two standard examples.

Density estimation. Suppose we observe independent random variables $X_1, X_2, \dots, X_n$. Here the model $\Theta$ parametrizes a set $\mathcal{M} = {p_\theta : \theta \in \Theta }$ of probability density functions (with respect to some dominating measure on the sample space), and our loss for $X = (X_1, \dots, X_n)$ is defined as
If, for instance, the variables $X_i$ are independent with common distribution with density function $p_{\theta_0}$ for some $\theta_0 \in \mathbb{\Theta}$, then it follows from the positivity of the Kullback-Leibler divergence that $\theta_0 \in \arg\min _ \theta \mathbb{E}[\ell _ \theta(X)]$. That is, under identifiability conditions, our learning target is the true data-generating distribution.

If the model is misspecified, roughly meaning that there is no $\theta_0\in \Theta$ such that $p_{\theta_0}$ is a density of $X_i$, then our framework sets up the learning problem to be about the parameter $\theta_0$ which is such that $p_{\theta_0}$ mininizes the Kullback-Leibler divergence between $p_{\theta_0}$ and the true marginal distribution of the $X_i$’s.

Regression. Here our observations take the form $(Y_i, X_i)$, the model $\Theta$ parameterizes regression functions $f_\theta$ and we can consider a sum of squared errors loss

Gibbs posterior distributions

Gibbs Learning approaches this problem from a pseudo Bayesian point of view. While typically a Bayesian approach would require the specification of a full data-generating model, here we replace the likelihood function by the pseudo-likelihood function $\theta \mapsto e^{-\ell_\theta(X)}$. Given a prior $\pi$ on $\Theta$, the Gibbs posterior distribution is then given by
and satisfies
whenever these expressions are well defined.

In the context of integrable pseudo-likelihoods, the above can be re-interpreted as a regular posterior distributions built from density functions $f _ \theta(x) \propto e^{-\ell _ \theta(x)}$ and with a prior $\tilde \pi$ satisfying
However, the reason we cannot apply standard asymptotic theory to the analysis of Gibbs posterior is that the quantity $c(\theta)$ will typically be sample-size dependent. That is, if $X=X^n=(X_1, X_2, \dots, X_n)$ for i.i.d. random variables $X_i$ and if the loss $\ell_\theta$ separates as the sum
then $c(\theta) = \left(\int e^{-l_\theta(x_1)} \, dx_1\right)^n$. This data-dependent prior, tilting $\pi$ by the function $c(\theta)^n$, is what allows Gibbs learning to target general risk-minimizing parameters rather than likelihood Kullback-Leibler minimizers.

Some of my ongoing research, presented as a poster at the O’Bayes conference in Warwick last summer, focused on understand the theoretical behaviour of Gibbs posterior distributions. I studied the posterior convergence and finite sample concentration properties of Gibbs posterior distributions under the large sample regime with additive losses $\ell_\theta^{(n)}(X_1, \dots, X_n) = \sum_{i=1}^n\ell_\theta(X_i)$. I’ve attached the poster (joint work with Yu Luo) below and you can find the additional references here.

Note that this is very preliminary work. We’re still in the process of exploring interesting directions (and I have very limited time this semester with the beginning of my PhD at Duke).

I went to the “climate strike” in Raleigh this afternoon, an event which is part of this week’s protest of political inaction on climate change (or of denial of climate change). It was pretty small in Raleigh, but still interesting to hear from people with all kind of backgrounds and points of views. I also had the opportunity to write to North Carolina’s governor Roy Cooper about this pressing issue. It’s a rather personal letter, written quickly and without a deep knowledge of the topic. But I thought I’d share it here as it, I believe, carries common sentiments on this issue.

The label “95% confidence interval” evokes the idea that we should invest the interval with 95/5 (19:1) betting odds that the observed interval contains the true value (which would make the confidence interval a 95% bayesian posterior interval$^{11}$). This view may be harmless in a perfect randomized experiment with no background information to inform the bet (the original setting for the “confidence” concept); more often, however […]

It’s not true that “this view may is harmless in perfect randomized experiments”, and I’m not sure where this “original setting of the confidence concept” is coming from. In fact, even in the simplest possible cases, the posterior probability of a $95\%$ confidence interval can be pretty much anything.

Imagine a “perfect randomized experiment”, where we use a test of the hypothesis $H_0: \mu = 0$ for which, for some reason, has zero power. If $p < 0.05$, meaning that the associated confidence interval excludes $0$, then we are certain that $H_0$ holds and the posterior probability of the confidence interval is zero.

Let this sink in. For some (albeit trivial) statistical tests, observing $p < 0.05$ brings evidence in favor of the null.

The power of the test carries information, and the posterior probability of a confidence interval (or of an hypothesis), depends on this power among other things, even in perfect randomized experiments.