Approximate distributions
When constructing a PosteriorEstimator, one must specify a parametric family of probability distributions used to approximate the posterior distribution. These families of distributions are implemented as subtypes of AbstractApproximateDistribution.
Distributions
NeuralEstimators.AbstractApproximateDistribution Type
AbstractApproximateDistribution
An abstract supertype for approximate distributions used in conjunction with a PosteriorEstimator.
Subtypes A <: AbstractApproximateDistribution must implement the following methods:
_logdensity(q::A, θ::AbstractMatrix, t::AbstractMatrix)
Used during training and therefore must support automatic differentiation.
θ is a d × K matrix of parameter vectors.
t is a dstar × K matrix of learned summary statistics obtained by applying the neural network in the PosteriorEstimator to a collection of K data sets.
Should return a 1 × K matrix, where each entry is the log density log q(θₖ | tₖ) for the k-th data set evaluated at the k-th parameter vector θ[:, k].
sampleposterior(q::A, t::AbstractMatrix, N::Integer)
Used during inference and therefore does not need to be differentiable.
Should return a Vector of length K, where each element is a d × N matrix containing N samples from the approximate posterior q(θ | tₖ) for the k-th data set.
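The shape contract above can be illustrated with a small, package-free sketch. Everything in it is a stand-in: ToyApproximateDistribution, ToyGaussian, toy_logdensity, and toy_sampleposterior are hypothetical names that only mirror the documented input and output shapes. A real subtype would extend the package's own _logdensity and sampleposterior instead.

```julia
using Random

# Stand-in for the abstract supertype, so that this sketch runs without the package.
abstract type ToyApproximateDistribution end

# Hypothetical family: independent Gaussians whose means are the first d summary
# statistics and whose standard deviation σ is fixed.
struct ToyGaussian <: ToyApproximateDistribution
    d::Int
    σ::Float64
end

# Analogue of _logdensity: θ is d × K, t is dstar × K (with dstar ≥ d); returns 1 × K.
function toy_logdensity(q::ToyGaussian, θ::AbstractMatrix, t::AbstractMatrix)
    μ = t[1:q.d, :]
    z = (θ .- μ) ./ q.σ
    sum(-0.5 .* z .^ 2 .- log(q.σ) .- 0.5 * log(2π); dims = 1)
end

# Analogue of sampleposterior: returns a Vector of length K of d × N matrices.
function toy_sampleposterior(q::ToyGaussian, t::AbstractMatrix, N::Integer)
    [t[1:q.d, k] .+ q.σ .* randn(q.d, N) for k in axes(t, 2)]
end

Random.seed!(1)
q = ToyGaussian(2, 0.5)
t = randn(3, 4)    # dstar = 3 summary statistics for each of K = 4 data sets
θ = randn(2, 4)    # one d = 2 parameter vector per data set
ℓ = toy_logdensity(q, θ, t)
samples = toy_sampleposterior(q, t, 100)
```

The log-density function uses only broadcasting and reductions, which automatic differentiation tools generally handle well. The sampler is free to use non-differentiable operations.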
NeuralEstimators.Gaussian Type
Gaussian <: AbstractApproximateDistribution
Gaussian(d::Integer, num_summaries::Integer; kwargs...)
A Gaussian distribution for amortised inference with a PosteriorEstimator, where d is the dimension of the parameter vector.
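The density of the distribution is:
$$
q(\theta; \mu, \Sigma) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2} (\theta - \mu)^\top \Sigma^{-1} (\theta - \mu)\right\},
$$
where the parameters are the d-dimensional mean vector μ and the d × d covariance matrix Σ.
When using a Gaussian distribution as the approximate distribution of a PosteriorEstimator, the (learned) summary statistics are mapped to the distribution parameters using a multilayer perceptron (MLP) with appropriately chosen output activation functions (e.g., identity for the mean parameters).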
Keyword arguments
kwargs: additional keyword arguments passed to MLP.
NeuralEstimators.GaussianMixture Type
GaussianMixture <: AbstractApproximateDistribution
GaussianMixture(d::Integer, num_summaries::Integer; num_components::Integer = 10, kwargs...)
A mixture of Gaussian distributions for amortised inference with a PosteriorEstimator, where d is the dimension of the parameter vector.
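The density of the distribution is:
$$
q(\theta; \kappa) = \sum_{j=1}^{J} \pi_j \, \mathcal{N}(\theta; \mu_j, \Sigma_j),
$$
where the parameters κ comprise the mixture weights π_j ∈ [0, 1], which sum to one, the mean vector μ_j of each component, and the variance parameters that define the covariance matrix Σ_j of each component, for j = 1, …, J, with J the number of components (num_components).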
When using a GaussianMixture as the approximate distribution of a PosteriorEstimator, the (learned) summary statistics are mapped to the mixture parameters using a multilayer perceptron (MLP) with appropriately chosen output activation functions (e.g., softmax for the mixture weights, softplus for the variance parameters).
Keyword arguments
num_components::Integer = 10: number of components in the mixture.
kwargs: additional keyword arguments passed to MLP.
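As a rough illustration of how unconstrained network outputs can become valid mixture parameters, the following sketch applies softmax to the weights and softplus to the variances, then evaluates the mixture log density with a log-sum-exp. It is package-free and makes several assumptions: the covariances are taken to be diagonal, the vector raw is a stand-in for MLP outputs, and the helper names (logsumexp, softmax, softplus, mixture_logdensity) are defined locally rather than taken from the library.

```julia
# Numerically stable log-sum-exp over a vector.
logsumexp(x) = (m = maximum(x); m + log(sum(exp.(x .- m))))

# Output activations: softmax gives weights on the simplex, softplus gives positive values.
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))
softplus(x) = log1p(exp(x))

# Log density of a J-component Gaussian mixture with diagonal covariances (an assumption).
# w: J weights; μ: d × J means; σ²: d × J variances; θ: d-vector.
function mixture_logdensity(θ, w, μ, σ²)
    J = length(w)
    logcomp = map(1:J) do j
        log(w[j]) - 0.5 * sum(log.(2π .* σ²[:, j]) .+ (θ .- μ[:, j]) .^ 2 ./ σ²[:, j])
    end
    logsumexp(logcomp)
end

d, J = 2, 3
raw = collect(range(-1.0, 1.0; length = J + 2 * d * J))   # stand-in for unconstrained MLP outputs
w  = softmax(raw[1:J])                                    # mixture weights
μ  = reshape(raw[J+1:J+d*J], d, J)                        # component means (identity activation)
σ² = reshape(softplus.(raw[J+d*J+1:end]), d, J)           # component variances
ℓ  = mixture_logdensity([0.1, -0.2], w, μ, σ²)
```

Working on the log scale avoids underflow when a parameter vector lies far from every component.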
NeuralEstimators.NormalisingFlow Type
NormalisingFlow <: AbstractApproximateDistribution
NormalisingFlow(d::Integer, num_summaries::Integer; num_coupling_layers = 6, kwargs...)
A normalising flow for amortised inference with a PosteriorEstimator, where d is the dimension of the parameter vector and num_summaries is the dimension of the summary statistics for the data.
Normalising flows are diffeomorphisms (i.e., invertible, differentiable transformations with differentiable inverses) that map a simple base distribution (e.g., standard Gaussian) to a more complex target distribution (e.g., the posterior). They achieve this by applying a sequence of learned transformations, the forms of which are chosen to be invertible and to allow for tractable density computation via the change of variables formula. This allows for efficient density evaluation during the training stage, and efficient sampling during the inference stage. For further details, see the reviews by Kobyzev et al. (2020) and Papamakarios et al. (2021).
NormalisingFlow uses affine coupling blocks (see AffineCouplingBlock), with optional activation normalisation (ActNorm; Kingma and Dhariwal, 2018) and permutations applied between each block via CouplingLayer. The base distribution is taken to be a standard multivariate Gaussian distribution.
When using a NormalisingFlow as the approximate distribution of a PosteriorEstimator, the (learned) summary statistics are used to condition the affine coupling blocks at each layer.
Note
To use NormalisingFlow with Enzyme.jl, set adtype = AutoEnzyme(mode = set_runtime_activity(Enzyme.Reverse)) in train.
Keyword arguments
num_coupling_layers::Integer = 6: number of coupling layers.
kwargs: additional keyword arguments passed to CouplingLayer and AffineCouplingBlock.
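The change of variables formula behind a normalising flow can be seen in one dimension. The sketch below is not the library's implementation: it uses a single hypothetical affine map θ = exp(s) z + b with fixed s and b, and the function names (base_logpdf, forward, inverse, flow_logpdf) are chosen here for illustration. The inverse map is used for density evaluation and the forward map for sampling.

```julia
using Random, Statistics

# Log density of the standard Gaussian base distribution.
base_logpdf(z) = -0.5 * (z^2 + log(2π))

# Hypothetical one-dimensional flow: θ = exp(s) * z + b, so that z = (θ - b) * exp(-s).
forward(z, s, b) = exp(s) * z + b        # base → target (used for sampling)
inverse(θ, s, b) = (θ - b) * exp(-s)     # target → base (used for density evaluation)

# Change of variables: log q(θ) = log p(z) + log|dz/dθ| = log p(z) - s.
flow_logpdf(θ, s, b) = base_logpdf(inverse(θ, s, b)) - s

s, b = log(2.0), 1.0                     # with these values the flow represents N(1, 2²)
Random.seed!(1)
draws = forward.(randn(100_000), s, b)   # sampling: push base draws through the flow
```

In a conditional flow, s and b would be outputs of a network that takes the summary statistics as input, and several such invertible maps would be composed, with their log-determinants added.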
NeuralEstimators.SpikeAndSlab Type
SpikeAndSlab <: AbstractApproximateDistribution
SpikeAndSlab(num_parameters::Integer, num_summaries::Integer; slab = GaussianMixture, kwargs...)
A spike-and-slab distribution for amortised inference with a PosteriorEstimator.
Note
SpikeAndSlab currently supports only univariate parameters (i.e., num_parameters == 1).
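The density of the distribution is a two-component mixture of a point mass (the "spike") at a fixed value θ₀ and a continuous distribution (the "slab"):
$$
q(\theta \mid t) = \pi(t) \, \delta_{\theta_0}(\theta) + \{1 - \pi(t)\} \, q_{\text{slab}}(\theta \mid t),
$$
where θ₀ is the location of the spike (see the keyword argument spike), π(t) ∈ [0, 1] is the spike probability, δ_{θ₀} denotes a point mass at θ₀, and q_slab is the density of the slab (any AbstractApproximateDistribution, e.g., a GaussianMixture).
When using a SpikeAndSlab distribution as the approximate distribution of a PosteriorEstimator, the (learned) summary statistics are mapped to the spike probability by an MLP (the classifier), and to the parameters of the slab as described in the documentation of the chosen slab distribution.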
The slab models the parameter on the real line. The functions transform and invtransform map between the parameter space and the real line: transform is applied to parameters before evaluating the slab density, and invtransform is applied to slab samples to map them back to the parameter space. Both default to identity.
Keyword arguments
slab = GaussianMixture: the slab distribution. May be either a subtype of AbstractApproximateDistribution (constructed internally with num_parameters, num_summaries, and any additional kwargs) or a pre-constructed AbstractApproximateDistribution.
spike::Real = 0: the location of the spike (point mass) in the parameter space.
transform = identity: a function mapping parameters to the real line.
invtransform = identity: a function mapping the real line to the parameter space.
classifier_kwargs = (;): keyword arguments passed to the MLP used for the classifier.
kwargs: additional keyword arguments passed to the constructor of the slab distribution.
The posterior probability of the spike (i.e., the posterior probability that the parameter equals the spike value) can be computed with spikeprobability.
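Sampling from a spike-and-slab distribution can be sketched without the package as follows. The function sample_spike_and_slab and its arguments are hypothetical: p_spike plays the role of the spike probability, slab_sampler stands in for the slab distribution on the real line, and invtransform maps slab draws back to the parameter space, as described above.

```julia
using Random

# Draw N samples from a univariate spike-and-slab distribution.
# p_spike: spike probability; spike: spike location; slab_sampler(n) returns n draws
# on the real line; invtransform maps those draws back to the parameter space.
function sample_spike_and_slab(p_spike, N; spike = 0.0, slab_sampler = randn, invtransform = identity)
    is_spike = rand(N) .< p_spike              # which draws come from the point mass
    slab = invtransform.(slab_sampler(N))      # slab draws in the parameter space
    ifelse.(is_spike, spike, slab)
end

Random.seed!(1)
draws = sample_spike_and_slab(0.7, 10_000)
prop_spike = count(==(0.0), draws) / length(draws)                    # fraction of exact zeros
draws_pos = sample_spike_and_slab(0.5, 1_000; invtransform = exp)     # slab on (0, ∞)
```

The fraction of draws exactly equal to the spike value is a Monte Carlo estimate of the spike probability, which is why posterior samples contain a mix of exact zeros and slab draws.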
Examples
using NeuralEstimators, Flux

# Simple linear regression Z = (x, y) with y = βx + ε, ε ~ N(0, 0.1²) and covariate x ~ U(0, 1).
# Spike-and-slab prior on the slope β: β = 0 with probability 1/2, else β ~ U(-1, 1).
d = 1    # number of parameters (the slope β)
m = 30   # number of (x, y) pairs in each data set

function sampler(K)
    spike = rand(K) .< 0.5
    β = ifelse.(spike, 0f0, 2f0 .* rand(Float32, K) .- 1f0)
    NamedMatrix(β = β)
end

function simulator(θ::AbstractMatrix, m::Integer)
    map(eachcol(θ)) do θₖ
        x = rand(Float32, 1, m)
        y = θₖ["β"] .* x .+ 0.1f0 .* randn(Float32, 1, m)
        vcat(x, y)   # 2×m: each column is one (x, y) pair
    end
end

# Summary network: a DeepSet over the exchangeable (x, y) pairs
num_summaries = 32
ψ = Chain(Dense(2, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, num_summaries))
summary_network = DeepSet(ψ, ϕ)

# Spike-and-slab posterior estimator (spike at β = 0)
estimator = PosteriorEstimator(summary_network, d; num_summaries = num_summaries, q = SpikeAndSlab)
estimator = train(estimator, sampler, simulator, simulator_args = m, K = 5000, epochs = 20)

# Inference for data simulated with β = 0 (the spike)
Z = simulator(NamedMatrix(β = [0f0]), m)
spikeprobability(estimator, Z)   # posterior probability that β = 0
sampleposterior(estimator, Z)    # posterior draws (a mix of exact zeros and slab draws)
Methods
NeuralEstimators.numdistributionalparams Function
numdistributionalparams(q::AbstractApproximateDistribution)
numdistributionalparams(estimator::PosteriorEstimator)
The number of distributional parameters (i.e., the dimension of the parameter space of the approximate distribution).
Building blocks
NeuralEstimators.CouplingLayer Type
CouplingLayer(d, num_summaries; use_act_norm = true, use_permutation = true, kwargs...)
A coupling layer used in a NormalisingFlow, combining two AffineCouplingBlocks with optional activation normalisation and permutation.
The layer splits its d-dimensional input into a lower half of dimension d₁ = div(d, 2) and an upper half of dimension d₂ = d - d₁. The two halves are then passed through a pair of affine coupling blocks in sequence: the first block transforms the lower half conditioned on the upper, and the second block transforms the upper half conditioned on the already-transformed lower half. This ensures every component is updated in a single forward pass, unlike a standard coupling layer where one half is left unchanged. When d = 1, the layer reduces to a single affine transformation of the one component conditioned on the summary statistics.
Optionally, activation normalisation (ActNorm) is applied before the coupling blocks, and a random Permutation is applied after.
The argument num_summaries is the dimension of the conditioning summary statistics (see PosteriorEstimator) and kwargs are passed to AffineCouplingBlock.
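The two-block pass described above can be sketched as follows. This is a toy illustration rather than the library's code: scale_shift is a hypothetical closed-form conditioner standing in for the MLPs, and activation normalisation and permutation are omitted. The point is the order of operations: the lower half is transformed given the upper half, then the upper half is transformed given the already-transformed lower half, and the inverse undoes the blocks in reverse order.

```julia
# Toy conditioner (hypothetical): returns a log-scale and a shift of length dout for the
# half being transformed, given the other half x and the summary statistics t.
scale_shift(x, t, dout) = (fill(0.1 * tanh(sum(x) + sum(t)), dout), fill(0.5 * sum(t) - sum(x), dout))

function coupling_forward(θ, t)
    d = length(θ)
    d₁ = div(d, 2)
    lower, upper = θ[1:d₁], θ[d₁+1:end]
    s₁, b₁ = scale_shift(upper, t, d₁)          # first block: lower half given upper half
    lower = lower .* exp.(s₁) .+ b₁
    s₂, b₂ = scale_shift(lower, t, d - d₁)      # second block: upper half given transformed lower half
    upper = upper .* exp.(s₂) .+ b₂
    vcat(lower, upper)
end

function coupling_inverse(u, t)
    d = length(u)
    d₁ = div(d, 2)
    lower, upper = u[1:d₁], u[d₁+1:end]
    s₂, b₂ = scale_shift(lower, t, d - d₁)      # undo the second block first
    upper = (upper .- b₂) .* exp.(-s₂)
    s₁, b₁ = scale_shift(upper, t, d₁)          # then undo the first block
    lower = (lower .- b₁) .* exp.(-s₁)
    vcat(lower, upper)
end

θ = [0.3, -1.2, 0.8]   # d = 3, so d₁ = 1 and d₂ = 2
t = [0.5, -0.1]        # summary statistics used for conditioning
u = coupling_forward(θ, t)
```

With d = 1 the lower half is empty, so only the second block acts and the transformation depends on the summary statistics alone.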
NeuralEstimators.AffineCouplingBlock Type
AffineCouplingBlock(κ₁::MLP, κ₂::MLP)
AffineCouplingBlock(d₁::Integer, num_summaries::Integer, d₂; kwargs...)
An affine coupling block used in a NormalisingFlow.
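An affine coupling block splits its input θ into two disjoint components, θ₁ and θ₂, of dimensions d₁ and d₂, respectively, and applies the transformation
$$
\tilde{\theta}_1 = \theta_1, \qquad \tilde{\theta}_2 = \theta_2 \odot \exp\{\kappa_1(\tilde{\theta}_1, t)\} + \kappa_2(\tilde{\theta}_1, t),
$$
where κ₁ and κ₂ are MLPs that take as input both the first component and the learned summary statistics t (see PosteriorEstimator).
To prevent numerical overflows and stabilise the training of the model, the scaling factors κ₁(·) are clamped using the function
$$
f(s) = \frac{2c}{\pi} \tan^{-1}\left(\frac{s}{c}\right),
$$
where c > 0 is a fixed clamping threshold.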
Additional keyword arguments kwargs are passed to the MLP constructor when creating κ₁ and κ₂.
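A single affine coupling block, with clamped scaling factors and its log-determinant, might look like the sketch below. Several things here are assumptions made for illustration: the arctan-based soft clamp and the threshold c = 2.0, the fixed weight matrices W₁ and W₂ that stand in for the MLPs κ₁ and κ₂, and the function names block_forward and block_inverse. The first component passes through unchanged, so the scale and shift can be recomputed exactly when inverting.

```julia
# Soft clamp (assumed form): maps any real s into (-c, c); c = 2.0 is an arbitrary choice.
clamp_soft(s; c = 2.0) = (2c / π) * atan(s / c)

# Hypothetical stand-ins for the MLPs κ₁ (log-scale) and κ₂ (shift), using fixed weights.
W₁ = [0.5 -0.3 0.2; 0.1 0.4 -0.6]    # d₂ × (d₁ + num_summaries) = 2 × 3
W₂ = [1.0 0.0 -1.0; 0.3 0.3 0.3]
κ₁(θ₁, t) = W₁ * vcat(θ₁, t)
κ₂(θ₁, t) = tanh.(W₂ * vcat(θ₁, t))

# Forward pass: θ₁ is passed through unchanged, θ₂ is scaled and shifted.
function block_forward(θ₁, θ₂, t)
    s = clamp_soft.(κ₁(θ₁, t))
    y₂ = θ₂ .* exp.(s) .+ κ₂(θ₁, t)
    return (θ₁, y₂, sum(s))    # sum(s) is the log-determinant of the Jacobian
end

# Inverse pass: recompute the same scale and shift from θ₁ and undo them.
function block_inverse(θ₁, y₂, t)
    s = clamp_soft.(κ₁(θ₁, t))
    return (θ₁, (y₂ .- κ₂(θ₁, t)) .* exp.(-s))
end

θ₁, θ₂, t = [0.2], [1.0, -0.5], [0.3, -0.7]    # d₁ = 1, d₂ = 2, num_summaries = 2
out₁, y₂, logdet = block_forward(θ₁, θ₂, t)
θ₂_back = block_inverse(out₁, y₂, t)[2]
```

Because the Jacobian of the transformed component with respect to θ₂ is diagonal with entries exp(s), the log-determinant is simply the sum of the clamped scaling factors, which is what makes density evaluation cheap.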
NeuralEstimators.ActNorm Type
ActNorm(d::Integer)
Activation normalisation layer (Kingma and Dhariwal, 2018) for an input of dimension d.
NeuralEstimators.Permutation Type
Permutation(in::Integer)
A layer that permutes the inputs (of dimension in) entering a coupling block.
Note that a permutation layer is invertible with Jacobian determinant |J| = 1.
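The invertibility and unit Jacobian determinant of a permutation can be checked directly with the standard library. The sketch below does not use the Permutation type itself; perm, P, and the other variable names are chosen here for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(1)
d = 5
perm = randperm(d)                        # a random permutation of 1:d
P = Matrix{Float64}(I, d, d)[perm, :]     # the corresponding permutation matrix (the Jacobian)

x = randn(d)
y = x[perm]                               # forward pass: permute the inputs
x_back = y[invperm(perm)]                 # inverse pass: apply the inverse permutation
absdet = abs(det(P))                      # absolute Jacobian determinant
```

Since the log of the absolute determinant is zero, a permutation contributes nothing to the log density of the flow.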