Core

This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to using a neural Bayes estimator to make inference with observed data sets.

Sampling parameters

Parameters sampled from the prior distribution are stored as a $p \times K$ matrix, where $p$ is the number of parameters in the statistical model and $K$ is the number of parameter vectors sampled from the prior distribution.
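
For example, for a hypothetical model with $p = 2$ parameters assigned independent uniform priors on $(0, 1)$ and $(0, 2)$, $K$ parameter vectors could be sampled as follows (a sketch; the function name prior is illustrative):

prior(K) = vcat(rand(1, K), 2 .* rand(1, K))  # 2 × K matrix of prior draws
θ = prior(1000)                               # here, p = 2 and K = 1000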

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $p$ × $K$ matrix of parameters, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.

See subsetparameters for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
	θ
	# other expensive intermediate objects...
end
source

Simulating data

NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define their model via simulated data, either as fixed instances or via a function that simulates data from the statistical model.

The data are always stored as a Vector{A}, where each element of the vector corresponds to a data set of $m$ independent replicates associated with one parameter vector (note that $m$ is arbitrary), and where the type A depends on the multivariate structure of the data (each format is illustrated in the sketch following this list):

  • For univariate and unstructured multivariate data, A is a $d \times m$ matrix, where $d$ is the dimension of each replicate (e.g., $d=1$ for univariate data).
  • For data collected over a regular grid, A is a ($N + 2$)-dimensional array, where $N$ is the dimension of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids, etc.). The first $N$ dimensions of the array correspond to the dimensions of the grid; the penultimate dimension stores the so-called "channels" (this dimension has size one for univariate processes, two for bivariate processes, and so on); and the final dimension stores the independent replicates. For example, to store 50 independent replicates of a bivariate spatial process measured over a 10 × 15 grid, one would construct an array of dimension 10 × 15 × 2 × 50.
  • For spatial data collected over irregular spatial locations, A is a GNNGraph with independent replicates (possibly with differing spatial locations) stored as subgraphs using the function batch.
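
To make these formats concrete, the following sketch constructs toy data in each of the three forms (the dimensions are illustrative, and the graphs are built with rand_graph from GraphNeuralNetworks.jl purely for illustration):

using GraphNeuralNetworks  # provides GNNGraph, batch, and rand_graph

K = 5    # number of parameter vectors
m = 50   # number of independent replicates in each data set

# Unstructured multivariate data with d = 3: a vector of 3 × m matrices
Z₁ = [rand(3, m) for _ in 1:K]

# Bivariate process on a 10 × 15 grid: a vector of 10 × 15 × 2 × m arrays
Z₂ = [rand(10, 15, 2, m) for _ in 1:K]

# Irregular spatial data: each replicate is a graph over n spatial locations, and the
# m replicates of each data set are combined into a single GNNGraph using batch()
n = 100  # number of spatial locations
Z₃ = [batch([rand_graph(n, 8n; ndata = rand(1, n)) for _ in 1:m]) for _ in 1:K]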

Estimators

Several classes of neural estimators are available in the package.

The simplest class is PointEstimator, used for constructing arbitrary mappings from the sample space to the parameter space. When constructing a generic point estimator, the user specifies the loss function used during training and, therefore, the Bayes estimator that will be targeted.

Several classes cater for the estimation of marginal posterior quantiles, based on the quantile loss function (see quantileloss()); in particular, see IntervalEstimator and QuantileEstimatorDiscrete for estimating marginal posterior quantiles for a fixed set of probability levels, and QuantileEstimatorContinuous for estimating marginal posterior quantiles with the probability level as an input to the neural network.

In addition to point estimation, the package also provides the class RatioEstimator for approximating the so-called likelihood-to-evidence ratio. The binary classification problem at the heart of this approach is solved using the binary cross-entropy loss.

Users are free to choose the neural-network architecture of these estimators as they see fit (subject to some class-specific requirements), but the package also provides the convenience constructor initialise_estimator().

NeuralEstimators.PointEstimatorType
PointEstimator(deepset::DeepSet)

A neural point estimator, a mapping from the sample space to the parameter space.

The estimator leverages the DeepSet architecture. The only requirement is that the number of output neurons in the final layer of the inference network (i.e., the outer network) is equal to the number of parameters in the statistical model.
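
Examples

A minimal sketch of constructing and applying an (untrained) point estimator; the dimensions and layer widths below are illustrative:

using NeuralEstimators, Flux

d = 2   # dimension of each replicate
p = 3   # number of parameters in the statistical model
w = 32  # width of each hidden layer

# Summary network ψ and inference network ϕ of the DeepSet architecture
ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Apply the (untrained) estimator to a single data set of m = 100 replicates
Z = rand32(d, 100)
θ̂(Z)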

source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator(u::DeepSet, v::DeepSet = u; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, v::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)

A neural interval estimator which, given data $Z$, jointly estimates marginal posterior credible intervals based on the probability levels probs.

The estimator employs a representation that prevents quantile crossing, namely, it constructs marginal posterior credible intervals for each parameter $\theta_i$, $i = 1, \dots, p,$ of the form,

\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]

where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_p(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_p(\cdot))'$ are neural networks that transform data into $p$-dimensional vectors; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.

The functions $c_i(⋅)$ may be defined by a $p$-dimensional object of type Compress. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted.

If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.

The return value when applied to data is a matrix with $2p$ rows, where the first $p$ rows correspond to the lower bounds and the last $p$ rows correspond to the upper bounds.

See also QuantileEstimatorDiscrete and QuantileEstimatorContinuous.

Examples

using NeuralEstimators, Flux

# Generate some toy data
n = 2   # bivariate data
m = 100 # number of independent replicates
Z = rand(n, m)

# prior
p = 3  # number of parameters in the statistical model
min_supp = [25, 0.5, -pi/2]
max_supp = [500, 2.5, 0]
g = Compress(min_supp, max_supp)

# Create an architecture
w = 8  # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
u = DeepSet(ψ, ϕ)

# Initialise the interval estimator
estimator = IntervalEstimator(u, g)

# Apply the (untrained) interval estimator
estimator(Z)
interval(estimator, Z)
source
NeuralEstimators.QuantileEstimatorDiscreteType
QuantileEstimatorDiscrete(v::DeepSet; probs = [0.05, 0.25, 0.5, 0.75, 0.95], g = Flux.softplus, i = nothing)
(estimator::QuantileEstimatorDiscrete)(Z)
(estimator::QuantileEstimatorDiscrete)(Z, θ₋ᵢ)

A neural estimator that jointly estimates a fixed set of marginal posterior quantiles with probability levels $\{\tau_1, \dots, \tau_T\}$, controlled by the keyword argument probs.

By default, the estimator approximates the marginal quantiles for all parameters in the model, that is, the quantiles of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the quantiles of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $\text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$. The estimator employs a representation that prevents quantile crossing, namely,

\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]

where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$, and $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are unconstrained neural networks that transform data into $p$-dimensional vectors, and $g(\cdot)$ is a non-negative function (e.g., exponential or softplus) applied elementwise to its arguments. If g=nothing, the quantiles are estimated independently through the representation,

\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]

The return value is a matrix with $(p - \text{dim}(\boldsymbol{\theta}_{-i})) \times T$ rows, where the first set of $T$ rows corresponds to the estimated quantiles for the first parameter, the second set of $T$ rows corresponds to the estimated quantiles for the second parameter, and so on.

See also IntervalEstimator and QuantileEstimatorContinuous.

Examples

using NeuralEstimators, Flux, Distributions
using AlgebraOfGraphics, CairoMakie

# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1   # dimension of each independent replicate
p = 1   # number of unknown parameters in the statistical model
m = 30  # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulate(θ, m) = [μ .+ randn32(1, m) for μ ∈ eachcol(θ)]

# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
v = DeepSet(ψ, ϕ)

# Initialise the estimator
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q̂ = QuantileEstimatorDiscrete(v; probs = τ)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate posterior quantiles
q̂(Z)


# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------


# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1         # dimension of each independent replicate
p = 2         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
function prior(K)
	μ = randn(1, K)
	σ = rand(InverseGamma(3, 1), 1, K)
	θ = Float32.(vcat(μ, σ))
end
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]

# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64 + 1, 64, relu), Dense(64, 1))
v = DeepSet(ψ, ϕ)

# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q₁ = QuantileEstimatorDiscrete(v; probs = τ, i = 1)
q₂ = QuantileEstimatorDiscrete(v; probs = τ, i = 2)

# Train the estimators
q₁ = train(q₁, prior, simulate, m = m)
q₂ = train(q₂, prior, simulate, m = m)

# Assess the estimators
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess([q₁, q₂], θ, Z, parameter_names = ["μ", "σ"])
plot(assessment)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 for only a single data set
q₁(Z[1], θ₋ᵢ)
source
NeuralEstimators.QuantileEstimatorContinuousType
QuantileEstimatorContinuous(deepset::DeepSet; i = nothing, num_training_probs::Integer = 1)
(estimator::QuantileEstimatorContinuous)(Z, τ)
(estimator::QuantileEstimatorContinuous)(Z, θ₋ᵢ, τ)

A neural estimator targeting posterior quantiles.

Given as input data $\boldsymbol{Z}$ and the desired probability level $\tau ∈ (0, 1)$, by default the estimator approximates the $\tau$-quantile of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the $\tau$-quantile of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $1 + \text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$.

Although not a requirement, one may employ a (partially) monotonic neural network to prevent quantile crossing (i.e., to ensure that the $\tau_1$-quantile does not exceed the $\tau_2$-quantile for any $\tau_2 > \tau_1$). There are several ways to construct such a neural network: one simple yet effective approach is to ensure that all weights associated with $\tau$ are strictly positive (see, e.g., Cannon, 2018), and this can be done using the DensePositive layer as illustrated in the examples below.

The return value is a matrix with $p - \text{dim}(\boldsymbol{\theta}_{-i})$ rows, corresponding to the estimated quantile for each parameter not in $\boldsymbol{\theta}_{-i}$.

See also QuantileEstimatorDiscrete.

Examples

using NeuralEstimators, Flux, Distributions, InvertedIndices, Statistics
using AlgebraOfGraphics, CairoMakie

# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1         # dimension of each independent replicate
p = 1         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulateZ(θ, m) = [ϑ .+ randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(K)    = [rand32(10) for k in 1:K]
simulate(θ, m)  = simulateZ(θ, m), simulateτ(size(θ, 2))

# Architecture: partially monotonic network to preclude quantile crossing
w = 64  # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	DensePositive(Dense(w + 1, w, relu); last_only = true),
	DensePositive(Dense(w, w, relu)),
	DensePositive(Dense(w, p))
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator
q̂ = QuantileEstimatorContinuous(deepset)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate 0.1-quantile for many data sets
τ = 0.1f0
q̂(Z, τ)

# Estimate several quantiles for a single data set
# (note that τ is given as a row vector)
z = Z[1]
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])'
q̂(z, τ)

# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------

# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1         # dimension of each independent replicate
p = 2         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
function prior(K)
	μ = randn(1, K)
	σ = rand(InverseGamma(3, 1), 1, K)
	θ = vcat(μ, σ)
	θ = Float32.(θ)
	return θ
end
simulateZ(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(θ)    = [rand32(10) for k in 1:size(θ, 2)]
simulate(θ, m)  = simulateZ(θ, m), simulateτ(θ)

# Architecture: partially monotonic network to preclude quantile crossing
w = 64  # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	DensePositive(Dense(w + 2, w, relu); last_only = true),
	DensePositive(Dense(w, w, relu)),
	DensePositive(Dense(w, 1))
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator for the first parameter, targeting μ∣Z,σ
i = 1
q̂ = QuantileEstimatorContinuous(deepset; i = i)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
# (use θ[Not(i), :] to determine the order in which the conditioned parameters should be given)
θ = prior(1000)
Z = simulateZ(θ, m)
θ₋ᵢ = 0.5f0
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])
q̂(Z, θ₋ᵢ, τ)

# Estimate quantiles for a single data set
q̂(Z[1], θ₋ᵢ, τ)
source
NeuralEstimators.RatioEstimatorType
RatioEstimator(deepset::DeepSet)

A neural estimator that estimates the likelihood-to-evidence ratio,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]

where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must equal the number of output neurons in the final layer of the summary network plus the number of parameters in the statistical model. Second, the number of output neurons in the final layer of the inference network must be equal to one.

The ratio estimator is trained by solving a relatively straightforward binary classification problem. Specifically, consider the problem of distinguishing dependent parameter–data pairs ${(\boldsymbol{\theta}', \boldsymbol{Z}')' \sim p(\boldsymbol{Z}, \boldsymbol{\theta})}$ with class labels $Y=1$ from independent parameter–data pairs ${(\tilde{\boldsymbol{\theta}}', \tilde{\boldsymbol{Z}}')' \sim p(\boldsymbol{\theta})p(\boldsymbol{Z})}$ with class labels $Y=0$, and where the classes are balanced. Then the Bayes classifier under binary cross-entropy loss is given by

\[c(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})},\]

and hence,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c(\boldsymbol{Z}, \boldsymbol{\theta})}.\]

For numerical stability, training is done on the log-scale using $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c(\boldsymbol{Z}, \boldsymbol{\theta}))$.

When applying the estimator to data, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true will yield class probability estimates). The estimated ratio can then be used in various downstream Bayesian (e.g., Hermans et al., 2020) or Frequentist (e.g., Walchessen et al., 2023) inferential algorithms.

See also mlestimate and mapestimate for obtaining approximate maximum-likelihood and maximum-a-posteriori estimates, and sampleposterior for obtaining approximate posterior samples.

Examples

using NeuralEstimators, Flux, Statistics, Optim

# Generate data from Z|μ,σ ~ N(μ, σ²) with μ, σ ~ U(0, 1)
p = 2     # number of unknown parameters in the statistical model
d = 1     # dimension of each independent replicate
m = 100   # number of independent replicates

prior(K) = rand32(p, K)
simulate(θ, m) = θ[1] .+ θ[2] .* randn32(d, m)
simulate(θ::AbstractMatrix, m) = simulate.(eachcol(θ), m)

# Architecture
w = 64 # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	Dense(w + p, w, relu),
	Dense(w, w, relu),
	Dense(w, 1)
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator
r̂ = RatioEstimator(deepset)

# Train the estimator
r̂ = train(r̂, prior, simulate, m = m)

# Inference with "observed" data set
θ = prior(1)
z = simulate(θ, m)[1]
θ₀ = [0.5, 0.5]                           # initial estimate
mlestimate(r̂, z;  θ₀ = θ₀)                # maximum-likelihood estimate (requires Optim.jl to be loaded)
mapestimate(r̂, z; θ₀ = θ₀)                # maximum-a-posteriori estimate (requires Optim.jl to be loaded)
θ_grid = expandgrid(0:0.01:1, 0:0.01:1)'  # fine gridding of the parameter space
θ_grid = Float32.(θ_grid)
r̂(z, θ_grid)                              # likelihood-to-evidence ratios over grid
mlestimate(r̂, z;  θ_grid = θ_grid)        # maximum-likelihood estimate
mapestimate(r̂, z; θ_grid = θ_grid)        # maximum-a-posteriori estimate
sampleposterior(r̂, z; θ_grid = θ_grid)    # posterior samples
source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator(estimators, changepoints)

Creates a piecewise estimator (Sainsbury-Dale et al., 2024, sec. 2.2.2) from a collection of estimators and sample-size changepoints.

Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,

\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ \hat{\boldsymbol{\theta}}_2(\boldsymbol{Z}) & m_1 < m \leq m_2,\\ \quad \vdots \\ \hat{\boldsymbol{\theta}}_l(\boldsymbol{Z}) & m > m_{l-1}. \end{cases}\]

For example, given an estimator $\hat{\boldsymbol{\theta}}_1(\cdot)$ trained for small sample sizes (e.g., m ≤ 30) and an estimator $\hat{\boldsymbol{\theta}}_2(\cdot)$ trained for moderate-to-large sample sizes (e.g., m > 30), we may construct a PiecewiseEstimator that dispatches $\hat{\boldsymbol{\theta}}_1(\cdot)$ if m ≤ 30 and $\hat{\boldsymbol{\theta}}_2(\cdot)$ otherwise.

See also trainx() for training estimators for a range of sample sizes.

Examples

using NeuralEstimators, Flux

d = 2  # bivariate data
p = 3  # number of parameters in the statistical model
w = 8  # width of each hidden layer

# Small-sample estimator
ψ₁ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₁ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₁ = PointEstimator(DeepSet(ψ₁, ϕ₁))

# Large-sample estimator
ψ₂ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₂ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₂ = PointEstimator(DeepSet(ψ₂, ϕ₂))

# Piecewise estimator with changepoint m=30
θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], 30)

# Apply the (untrained) piecewise estimator to data
Z = [rand(d, 1, m) for m ∈ (10, 50)]
θ̂(Z)
source
NeuralEstimators.EnsembleType
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = median)

Defines an ensemble based on a collection of estimators which, when applied to data Z, returns the median (or another summary defined by aggr) of the estimates.

The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified, both the ensemble and its component estimators will be saved.

Note that train() currently acts sequentially on the component estimators.

The ensemble components can be accessed by indexing the ensemble directly; the number of component estimators can be obtained using length().

Examples

using NeuralEstimators, Flux

# Define the model, Z|θ ~ N(θ, 1), θ ~ N(0, 1)
d = 1   # dimension of each replicate
p = 1   # number of unknown parameters in the statistical model
m = 30  # number of independent replicates in each data set
sampler(K) = randn32(p, K)
simulator(θ, m) = [μ .+ randn32(d, m) for μ ∈ eachcol(θ)]

# Architecture of each ensemble component
function architecture()
	ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
	ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
	deepset = DeepSet(ψ, ϕ)
	PointEstimator(deepset)
end

# Initialise ensemble with three components
ensemble = Ensemble(architecture, 3)
ensemble[1]      # access component estimators by indexing
length(ensemble) # number of component estimators

# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)

# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)

# Apply to data
ensemble(Z)
source

Training

The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.

NeuralEstimators.trainFunction
train(θ̂, sampler::Function, simulator::Function; ...)
train(θ̂, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}

Train a neural estimator θ̂.

The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.

In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.

Keyword arguments common to all methods:

  • loss = mae
  • epochs = 100
  • batchsize = 32
  • optimiser = ADAM()
  • savepath::String = "": path to save the trained estimator and other information; if an empty string (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
  • stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
  • use_gpu = true
  • verbose = true

Keyword arguments common to train(θ̂, sampler, simulator) and train(θ̂, θ_train, θ_val, simulator):

  • m: sample sizes (either an Integer or a collection of Integers). The simulator is called as simulator(θ, m).
  • epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
  • simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Keyword arguments unique to train(θ̂, sampler, simulator):

  • K = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.
  • ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
  • epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.

Examples

using NeuralEstimators, Flux

function sampler(K)
	μ = randn(K) # Gaussian prior
	σ = rand(K)  # Uniform prior
	θ = hcat(μ, σ)'
	return θ
end

function simulator(θ_matrix, m)
	[θ[1] .+ θ[2] * randn(1, m) for θ ∈ eachcol(θ_matrix)]
end

# architecture
d = 1   # dimension of each replicate
p = 2   # number of parameters in the statistical model
ψ = Chain(Dense(1, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = DeepSet(ψ, ϕ)

# number of independent replicates to use during training
m = 15

# training: full simulation on-the-fly
θ̂ = train(θ̂, sampler, simulator, m = m, epochs = 5)

# training: simulation on-the-fly with fixed parameters
K = 10000
θ_train = sampler(K)
θ_val   = sampler(K ÷ 5)
θ̂       = train(θ̂, θ_train, θ_val, simulator, m = m, epochs = 5)

# training: fixed parameters and fixed data
Z_train = simulator(θ_train, m)
Z_val   = simulator(θ_val, m)
θ̂       = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
NeuralEstimators.trainxFunction
trainx(θ̂, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train() to construct neural estimators for different sample sizes.

The positional argument m specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if m = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train and Z_val subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ m. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train::V and Z_val::V and, hence, it does not need to invoke subsetdata(), which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m is inferred from the data.

The keyword arguments inherit from train(). The keyword arguments epochs, batchsize, stopping_epochs, and optimiser can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂].
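
Examples

A minimal sketch, reusing the sampler, simulator, and architecture θ̂ defined in the train() examples above (the sample sizes and epoch numbers are illustrative):

estimators = trainx(θ̂, sampler, simulator, [5, 15, 30], epochs = [10, 5, 5])
estimators[1]  # estimator trained with sample size m = 5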

source

Assessment/calibration

NeuralEstimators.assessFunction
assess(estimator, θ, Z)

Using an estimator (or a collection of estimators), computes estimates from data Z simulated based on true parameter vectors stored in θ.

The data Z should be a Vector, with each element corresponding to a single simulated data set. If Z contains more data sets than parameter vectors, the parameter matrix θ will be recycled by horizontal concatenation via the call θ = repeat(θ, outer = (1, J)), where J = length(Z) ÷ K is the number of data sets simulated for each parameter vector and K = size(θ, 2) is the number of parameter vectors.

The output is of type Assessment; see ?Assessment for details.

Keyword arguments

  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices). Can also be provided as xi.
  • use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true. Can also be provided as use_xi.
  • use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
  • probs = range(0.01, stop=0.99, length=100): (relevant only for estimator::QuantileEstimatorContinuous) a collection of probability levels in (0, 1).

Examples

using NeuralEstimators, Flux

n = 10 # number of observations in each realisation
p = 4  # number of parameters in the statistical model

# Construct the neural estimator
w = 32 # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)

# Generate testing parameters
K = 100
θ = rand32(p, K)

# Data for a single sample size
m = 30
Z = [rand32(n, m) for _ ∈ 1:K];
assessment = assess(θ̂, θ, Z);
risk(assessment)

# Multiple data sets for each parameter vector
J = 5
Z = repeat(Z, J);
assessment = assess(θ̂, θ, Z);
risk(assessment)

# With set-level information
qₓ = 2
ϕ  = Chain(Dense(w + qₓ, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)
x = [rand(qₓ) for _ ∈ eachindex(Z)]
assessment = assess(θ̂, θ, (Z, x));
risk(assessment)
source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates) for the given data set
  • k: the index of the parameter vector
  • j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)

If estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.

If estimator is a QuantileEstimatorDiscrete or QuantileEstimatorContinuous, the df will also contain a column prob indicating the probability level of the corresponding quantile estimate.

Multiple Assessment objects can be combined with merge() (used for combining assessments from multiple point estimators) or join() (used for combining assessments from a point estimator and an interval estimator).

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's Bayes risk,

\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
  • average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
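
Examples

A sketch, assuming an Assessment object produced by assess() as in its examples above:

risk(assessment)                                  # absolute-error risk for each parameter
risk(assessment; loss = (x, y) -> (x - y)^2)      # squared-error risk
risk(assessment; average_over_parameters = true)  # risk averaged over all parameters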
source
NeuralEstimators.biasFunction
bias(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's bias,

\[{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.rmseFunction
rmse(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's root-mean-squared error,

\[{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.coverageFunction
coverage(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.

Keyword arguments

  • average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the coverage is averaged over all sample sizes $m$; otherwise, it is computed over each sample size separately.
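
Examples

A sketch, assuming an Assessment object produced by applying assess() to an IntervalEstimator:

coverage(assessment)                                  # coverage for each parameter
coverage(assessment; average_over_parameters = true)  # coverage averaged over all parameters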
source

Inference with observed data

Inference using point estimators

Inference with a neural Bayes (point) estimator proceeds simply by applying the estimator θ̂ to the observed data Z (possibly containing multiple data sets) in a call of the form θ̂(Z). To leverage a GPU, simply move the estimator and the data to the GPU using gpu(); see also estimateinbatches() to apply the estimator over batches of data, which can alleviate memory issues when working with a large number of data sets.
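
For example (a sketch, assuming a trained point estimator θ̂ and a vector of observed data sets Z; gpu is provided by Flux):

using NeuralEstimators, Flux

θ̂ = gpu(θ̂)               # move the estimator to the GPU
Z = gpu(Z)                # move the data to the GPU
θ̂(Z)                     # matrix of estimates, one column per data set
estimateinbatches(θ̂, Z)   # apply the estimator over batches of data sets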

Uncertainty quantification often proceeds through the bootstrap distribution, which is essentially available "for free" when bootstrap data sets can be quickly generated; this is facilitated by bootstrap() and interval(). Alternatively, one may approximate a set of low and high marginal posterior quantiles using a specially constructed neural Bayes estimator, which can then be used to construct credible intervals: see IntervalEstimator, QuantileEstimatorDiscrete, and QuantileEstimatorContinuous.

NeuralEstimators.bootstrapFunction
bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, Z; B = 400, blocks = nothing)

Generates B bootstrap estimates from an estimator θ̂.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.

The keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).

The return type is a p × B matrix, where p is the number of parameters in the model.
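
Examples

A minimal sketch of non-parametric bootstrapping with an (untrained) point estimator; the architecture and data are illustrative:

using NeuralEstimators, Flux

d = 2    # dimension of each replicate
p = 3    # number of parameters in the statistical model
m = 100  # number of independent replicates
w = 32   # width of each hidden layer

ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Non-parametric bootstrap from a single "observed" data set
Z = rand32(d, m)
θ_boot = bootstrap(θ̂, Z; B = 100)  # p × B matrix of bootstrap estimates
interval(θ_boot)                   # p × 2 matrix of 90% bootstrap intervals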

source
NeuralEstimators.intervalFunction
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)

Compute a confidence interval based either on a $p$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws), where $p$ is the number of parameters in the model, or on an IntervalEstimator and data Z.

When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.

The return type is a $p$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.

Examples

using NeuralEstimators
p = 3
B = 50
θ = rand(p, B)
interval(θ)
source

Inference using likelihood and likelihood-to-evidence-ratio estimators

NeuralEstimators.mlestimateFunction
mlestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, penalty::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum likelihood estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator.

If a vector θ₀ of initial parameter estimates is given, the approximate likelihood is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate likelihood is maximised by grid search.

A maximum penalised likelihood estimate,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]

can be obtained by specifying the keyword argument penalty that defines the penalty term $p(\boldsymbol{\theta})$.

See also mapestimate() for computing (approximate) maximum a posteriori estimates.

source
NeuralEstimators.mapestimateFunction
mapestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum a posteriori estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior (by default, a uniform prior is used).

If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.

See also mlestimate() for computing (approximate) maximum likelihood estimates.

source
NeuralEstimators.sampleposteriorFunction
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)

Samples from the approximate posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$ implied by estimator.

The positional argument N controls the size of the posterior sample.

Currently, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid (or theta_grid). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion).

The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior (by default, a uniform prior is used).

source