Approximate distributions

When constructing a PosteriorEstimator, one must choose an approximate distribution $q(\boldsymbol{\theta}; \boldsymbol{\kappa})$. These distributions are implemented as subtypes of the abstract supertype ApproximateDistribution.
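
For concreteness, the following sketch pairs an approximate distribution (here a NormalisingFlow, documented below) with a summary network to form a PosteriorEstimator. The constructor form PosteriorEstimator(q, network), the use of a DeepSet summary network, and the chosen dimensions are illustrative assumptions; adapt them to the model and data at hand.

using NeuralEstimators, Flux

d = 2          # dimension of the parameter vector θ
dstar = 2d     # number of summary statistics (an illustrative choice)

# Approximate distribution q(θ; κ)
q = NormalisingFlow(d, dstar)

# Summary network T(Z) mapping the sample space to ℝ^dstar; here a DeepSet for
# replicated univariate data (any suitable architecture may be used instead)
ψ = Chain(Dense(1 => 64, relu), Dense(64 => 64, relu))
ϕ = Chain(Dense(64 => 64, relu), Dense(64 => dstar))
network = DeepSet(ψ, ϕ)

# Amortised posterior estimator (constructor form assumed for illustration)
estimator = PosteriorEstimator(q, network)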

Distributions

NeuralEstimators.ApproximateDistributionType
ApproximateDistribution

An abstract supertype for approximate posterior distributions used in conjunction with a PosteriorEstimator.

Subtypes A <: ApproximateDistribution must implement the following methods:

  • logdensity(q::A, θ::AbstractMatrix, tz::AbstractMatrix)
    • Used during training and therefore must support automatic differentiation.
    • θ is a d × K matrix of parameter vectors.
    • tz is a dstar × K matrix of summary statistics obtained by applying the neural network in the PosteriorEstimator to a collection of K data sets.
    • Should return a 1 × K matrix whose k-th entry is the log density of the approximate posterior for the k-th data set evaluated at the k-th parameter vector, log q(θ[:, k] | tz[:, k]).
  • sampleposterior(q::A, tz::AbstractMatrix, N::Integer)
    • Used during inference and therefore does not need to be differentiable.
    • Should return a Vector of length K, where each element is a d × N matrix containing N samples from the approximate posterior q(θ | tₖ) for the k-th data set.
source
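
To illustrate this interface, below is a minimal sketch of a custom subtype (not part of the package): an isotropic Gaussian whose mean is given by the first d summary statistics and whose standard deviation is fixed. It assumes that logdensity and sampleposterior are functions owned by NeuralEstimators, as the docstring above implies, and so can be extended by qualifying them with the module name.

using NeuralEstimators

struct FixedVarianceGaussian <: ApproximateDistribution
    d::Int       # dimension of θ
    σ::Float32   # fixed standard deviation (illustrative simplification)
end

# Training-stage log density: θ is d × K, tz is dstar × K (dstar ≥ d assumed here)
function NeuralEstimators.logdensity(q::FixedVarianceGaussian, θ::AbstractMatrix, tz::AbstractMatrix)
    μ = tz[1:q.d, :]   # use the first d summary statistics as the posterior mean
    -0.5f0 .* sum(((θ .- μ) ./ q.σ) .^ 2; dims = 1) .-
        q.d * (log(q.σ) + 0.5f0 * log(2f0 * Float32(π)))   # 1 × K matrix
end

# Inference-stage sampling: a K-element vector of d × N matrices
function NeuralEstimators.sampleposterior(q::FixedVarianceGaussian, tz::AbstractMatrix, N::Integer)
    [tz[1:q.d, k] .+ q.σ .* randn(Float32, q.d, N) for k in 1:size(tz, 2)]
end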
NeuralEstimators.GaussianMixtureType
GaussianMixture <: ApproximateDistribution
GaussianMixture(d::Integer, dstar::Integer; num_components::Integer = 10, kwargs...)

A mixture of Gaussian distributions for amortised posterior inference, where d is the dimension of the parameter vector and dstar is the dimension of the summary statistics for the data.

The density of the distribution is:

\[q(\boldsymbol{\theta}; \boldsymbol{\kappa}) = \sum_{j=1}^{J} \pi_j \cdot \mathcal{N}(\boldsymbol{\theta}; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j), \]

where the parameters $\boldsymbol{\kappa}$ comprise the mixture weights $\pi_j \in [0, 1]$ subject to $\sum_{j=1}^{J} \pi_j = 1$, the mean vector $\boldsymbol{\mu}_j$ of each component, and the variance parameters of the diagonal covariance matrix $\boldsymbol{\Sigma}_j$.

When using a GaussianMixture as the approximate distribution of a PosteriorEstimator, the neural network should be a mapping from the sample space to $\mathbb{R}^{d^*}$, where $d^*$ is an appropriate number of summary statistics for the parameter vector $\boldsymbol{\theta}$. The summary statistics are then mapped to the mixture parameters using a conventional multilayer perceptron (MLP) with appropriately chosen output activation functions (e.g., softmax for the mixture weights, softplus for the variance parameters).

Keyword arguments

  • num_components::Integer = 10: number of components in the mixture.
  • kwargs: additional keyword arguments passed to MLP.
source
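
A brief usage sketch (the keyword follows the signature above; the summary network and the PosteriorEstimator construction are as in the example at the top of this page):

d, dstar = 3, 6
q = GaussianMixture(d, dstar; num_components = 5)   # 5-component mixture over θ ∈ ℝ³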
NeuralEstimators.NormalisingFlowType
NormalisingFlow <: ApproximateDistribution
NormalisingFlow(d::Integer, dstar::Integer; num_coupling_layers::Integer = 6, kwargs...)

A normalising flow for amortised posterior inference (e.g., Ardizzone et al., 2019; Radev et al., 2022), where d is the dimension of the parameter vector and dstar is the dimension of the summary statistics for the data.

Normalising flows are diffeomorphisms (i.e., invertible, differentiable transformations with differentiable inverses) that map a simple base distribution (e.g., a standard Gaussian) to a more complex target distribution (e.g., the posterior). They achieve this by applying a sequence of learned transformations, the forms of which are chosen to be invertible and to allow tractable density computation via the change-of-variables formula. This allows for efficient density evaluation during the training stage and efficient sampling during the inference stage. For further details, see the reviews by Kobyzev et al. (2020) and Papamakarios et al. (2021).

NormalisingFlow uses affine coupling blocks (see AffineCouplingBlock), with activation normalisation (Kingma and Dhariwal, 2018) and permutations used between each block. The base distribution is taken to be a standard multivariate Gaussian distribution.

When using a NormalisingFlow as the approximate distribution of a PosteriorEstimator, the neural network should be a mapping from the sample space to $\mathbb{R}^{d^*}$, where $d^*$ is an appropriate number of summary statistics for the given parameter vector (e.g., $d^* = d$). The summary statistics are then mapped to the parameters of the affine coupling blocks using conventional multilayer perceptrons (see AffineCouplingBlock).

Keyword arguments

  • num_coupling_layers::Integer = 6: number of coupling layers.
  • kwargs: additional keyword arguments passed to AffineCouplingBlock.
source
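
As a sketch of the two stages described above, one can evaluate densities and draw samples from an (untrained) flow directly; if logdensity and sampleposterior are not exported, qualify them as NeuralEstimators.logdensity and NeuralEstimators.sampleposterior. The summary statistics below are random stand-ins for T(Z), used purely for illustration.

using NeuralEstimators

d, dstar, K = 2, 4, 100
q = NormalisingFlow(d, dstar; num_coupling_layers = 6)

θ = randn(Float32, d, K)         # K parameter vectors
tz = randn(Float32, dstar, K)    # stand-in summary statistics T(Z)

logdensity(q, θ, tz)             # 1 × K matrix of log densities (training stage)
sampleposterior(q, tz, 1000)     # K-element vector of d × 1000 posterior draws (inference stage)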

Methods

NeuralEstimators.numdistributionalparamsFunction
numdistributionalparams(q::ApproximateDistribution)
numdistributionalparams(estimator::PosteriorEstimator)

The number of distributional parameters (i.e., the dimension of the space $\mathcal{K}$ of approximate-distribution parameters $\boldsymbol{\kappa}$).

source
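
For example (a brief sketch; the exact count depends on the chosen architecture):

q = GaussianMixture(2, 4; num_components = 10)
numdistributionalparams(q)   # dimension of κ: weights, means, and variances of the 10 components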

Building blocks

NeuralEstimators.AffineCouplingBlockType
AffineCouplingBlock(κ₁::MLP, κ₂::MLP)
AffineCouplingBlock(d₁::Integer, dstar::Integer, d₂; kwargs...)

An affine coupling block used in a NormalisingFlow.

An affine coupling block splits its input $\boldsymbol{\theta}$ into two disjoint components, $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, with dimensions $d_1$ and $d_2$, respectively. The block then applies the following transformation:

\[\begin{aligned} \tilde{\boldsymbol{\theta}}_1 &= \boldsymbol{\theta}_1,\\ \tilde{\boldsymbol{\theta}}_2 &= \boldsymbol{\theta}_2 \odot \exp\{\boldsymbol{\kappa}_{\boldsymbol{\gamma},1}(\tilde{\boldsymbol{\theta}}_1, \boldsymbol{T}(\boldsymbol{Z}))\} + \boldsymbol{\kappa}_{\boldsymbol{\gamma},2}(\tilde{\boldsymbol{\theta}}_1, \boldsymbol{T}(\boldsymbol{Z})), \end{aligned}\]

where $\boldsymbol{\kappa}_{\boldsymbol{\gamma},1}(\cdot)$ and $\boldsymbol{\kappa}_{\boldsymbol{\gamma},2}(\cdot)$ are generic, non-invertible multilayer perceptrons (MLPs) that are functions of both the (transformed) first input component $\tilde{\boldsymbol{\theta}}_1$ and the learned $d^*$-dimensional summary statistics $\boldsymbol{T}(\boldsymbol{Z})$ (see PosteriorEstimator).

To prevent numerical overflows and stabilise the training of the model, the scaling factors $\boldsymbol{\kappa}_{\boldsymbol{\gamma},1}(\cdot)$ are clamped using the function

\[f(\boldsymbol{s}) = \frac{2c}{\pi}\tan^{-1}\!\left(\frac{\boldsymbol{s}}{c}\right),\]

where $c = 1.9$ is a fixed clamping threshold. This transformation ensures that the scaling factors do not grow excessively large.

Additional keyword arguments kwargs are passed to the MLP constructor when creating κ₁ and κ₂.

source
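
The clamping function is straightforward to write down; the following is an illustrative sketch (not the package's internal code) showing that its output is bounded in $(-c, c)$:

softclamp(s; c = 1.9f0) = (2f0 * c / Float32(π)) .* atan.(s ./ c)

softclamp([-10f0, 0f0, 10f0])   # ≈ [-1.67, 0.0, 1.67]; approaches ±1.9 as |s| → ∞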