mininf#

mininf is a minimal library to infer the parameters of probabilistic programs and make predictions.

“Hello World” of Inference: The Biased Coin#

Consider the classic inference example of estimating the bias of a coin \(\theta\) given \(n = 10\) binary observations \(x\). Formally, the model is specified as

\[\begin{split}\theta &\sim \mathsf{Beta}\left(2, 2\right)\\ x &\sim \mathsf{Bernoulli}\left(\theta\right).\end{split}\tag{1}\]

We have used a weak beta prior centered at \(\theta = 0.5\) to encode our loosely held prior belief that the coin is fair. We encode heads as \(x = 1\) and tails as \(x = 0\). Using mininf, we declare the model as a probabilistic program.

import mininf
import torch
from torch.distributions import Bernoulli, Beta


def model():
    n = 10
    theta = mininf.sample("theta", Beta(2, 2))
    x = mininf.sample("x", Bernoulli(theta), sample_shape=[n])
    return theta, x

Each sample() statement corresponds to a \(\sim\) in (1), and the sample_shape argument specifies the number of independent samples to draw. Let us draw a sample from the prior predictive distribution by executing the probabilistic program.

torch.manual_seed(0)  # For reproducibility of this example.
theta, x = model()
print(f"bias: {theta:.3f}; proportion of heads: {x.mean():.3f}")
bias: 0.776; proportion of heads: 0.900

For this simple example, the posterior distribution is available in closed form: \(\theta\mid x \sim \mathsf{Beta}\left(2 + k, 2 + n - k\right)\), where \(k = \sum_{i = 1}^n x_i\) is the observed number of heads. But we want to learn about arbitrary probabilistic programs, and we’ll use black-box variational inference (BBVI) to do so. In short, BBVI learns an approximate posterior in two steps: First, we declare a parametric form for the posterior, e.g., a beta distribution for the bias of the coin or a normal distribution for the intercept of a linear regression model. Second, we optimize the evidence lower bound (ELBO) of the approximation given the model and data using stochastic gradient descent.
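As a quick sanity check for the variational result below, the conjugate posterior can be written down directly from the simulated data. This is just the closed-form expression above restated in code, reusing the x drawn earlier.

# Closed-form conjugate posterior, restating the expression above (sanity check only).
k = x.sum()       # observed number of heads
n = x.numel()     # number of observations, here 10
# With the draw above, k = 9 of n = 10 tosses came up heads, so the posterior is Beta(11, 3).
exact_posterior = Beta(2 + k, 2 + n - k)
print(f"exact posterior mean: {exact_posterior.mean:.3f}")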

# Step 1: Declare the parametric form of the approximate posterior initialized to the prior.
approximation = mininf.nn.ParameterizedDistribution(Beta, concentration0=2, concentration1=2)

# Step 2: Condition the model on data and optimize the ELBO.
conditioned = mininf.condition(model, x=x)
optimizer = torch.optim.Adam(approximation.parameters(), lr=0.02)
loss = mininf.nn.EvidenceLowerBoundLoss()

for _ in range(3 if mininf.util.IN_CI else 1000):  # Only a few steps when the docs are built in CI.
    optimizer.zero_grad()
    loss(conditioned, {"theta": approximation()}).backward()
    optimizer.step()

So what’s going on here? ParameterizedDistribution is a module with learnable parameters that, upon execution, returns a distribution of the desired type. condition() conditions the model on data such that any evaluation of the model’s joint distribution incorporates the data. EvidenceLowerBoundLoss() is a module that evaluates a differentiable, unbiased estimate of the ELBO that can be optimized with stochastic gradient descent.
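For reference, the quantity underlying this loss is the standard evidence lower bound on the log marginal likelihood (a textbook identity, not specific to mininf),

\[\mathrm{ELBO}\left(q\right) = \mathbb{E}_{q\left(\theta\right)}\left[\log p\left(x, \theta\right) - \log q\left(\theta\right)\right] \le \log p\left(x\right).\]

mininf estimates the expectation by drawing \(\theta\) from the approximation \(q\); since the result is used as a loss to be minimized, it is presumably the negative of this estimate.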

Let us compare the distributions after optimization.

from matplotlib import pyplot as plt


distributions = {
    "prior": Beta(2, 2),
    "posterior": Beta(2 + x.sum(), 2 + (1 - x).sum()),
    "approximation": approximation(),  # We must execute the module to get a distribution.
}

fig, ax = plt.subplots()

lin = torch.linspace(0, 1, 100)
for label, distribution in distributions.items():
    ax.plot(lin, distribution.log_prob(lin).detach().exp(), label=label)

ax.legend()
ax.set_xlabel(r"coin bias $\theta$")
fig.tight_layout()
[Figure: densities of the prior, exact posterior, and variational approximation over the coin bias \(\theta\).]

Optimizing the ELBO yields a good approximation of the true posterior. This is expected because the approximation has the same parametric form as the true posterior. While specifying models and estimating the ELBO is straightforward using mininf, the crux is often the optimization procedure: Which optimizer and learning rate work best, do we need learning rate decay, and when do we stop optimizing?
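There is no universal answer, but a common pragmatic pattern is to monitor the loss, decay the learning rate when it plateaus, and stop once progress stalls. The sketch below reuses the conditioned model, approximation, and loss from above together with standard PyTorch tooling; the scheduler settings and stopping rule are illustrative assumptions, not mininf defaults.

# Illustrative sketch: decay the learning rate on plateaus and stop when progress stalls.
optimizer = torch.optim.Adam(approximation.parameters(), lr=0.02)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=50)

best = float("inf")
stale = 0
for step in range(10_000):
    optimizer.zero_grad()
    value = loss(conditioned, {"theta": approximation()})
    value.backward()
    optimizer.step()
    scheduler.step(value.item())

    # Track the best loss seen so far and stop after 200 steps without improvement.
    if value.item() < best - 1e-3:
        best = value.item()
        stale = 0
    else:
        stale += 1
        if stale > 200:
            break

Because the ELBO estimate is stochastic, the raw loss is noisy; smoothing it with a running average before checking for improvement usually makes the stopping rule more reliable.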

You can find out more about mininf’s features and use cases in the Examples.

Installation#

mininf is available on PyPI and can be installed by executing pip install mininf from the command line.