Core

This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to using a neural Bayes estimator to make inference with observed data sets.

Sampling parameters

Parameters sampled from the prior distribution are stored as a $p \times K$ matrix, where $p$ is the number of parameters in the statistical model and $K$ is the number of parameter vectors sampled from the prior distribution.
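
For example, for a hypothetical model with $p = 2$ parameters assigned independent uniform priors on $(0, 1)$ and $(0, 2)$, $K$ parameter vectors could be sampled as follows (a sketch; the function name prior is illustrative):

prior(K) = vcat(rand(1, K), 2 .* rand(1, K))  # 2 × K matrix of prior draws
θ = prior(1000)                               # here, p = 2 and K = 1000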

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $p$ × $K$ matrix of parameters, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.

See subsetparameters for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
	θ
	# other expensive intermediate objects...
end
source

Simulating data

NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define their model via simulated data, either as fixed instances or via a function that simulates data from the statistical model.

The data are always stored as a Vector{A}, where each element of the vector corresponds to a data set of $m$ independent replicates associated with one parameter vector (note that $m$ is arbitrary), and where the type A depends on the multivariate structure of the data (each format is illustrated in the sketch following this list):

  • For univariate and unstructured multivariate data, A is a $d \times m$ matrix, where $d$ is the dimension of each replicate (e.g., $d=1$ for univariate data).
  • For data collected over a regular grid, A is a ($N + 2$)-dimensional array, where $N$ is the dimension of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids, etc.). The first $N$ dimensions of the array correspond to the dimensions of the grid; the penultimate dimension stores the so-called "channels" (this dimension has size one for univariate processes, two for bivariate processes, and so on); and the final dimension stores the independent replicates. For example, to store 50 independent replicates of a bivariate spatial process measured over a 10 × 15 grid, one would construct an array of dimension 10 × 15 × 2 × 50.
  • For spatial data collected over irregular spatial locations, A is a GNNGraph with independent replicates (possibly with differing spatial locations) stored as subgraphs using the function batch.
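
To make these formats concrete, the following sketch constructs toy data in each of the three forms (the dimensions are illustrative, and the graphs are built with rand_graph from GraphNeuralNetworks.jl purely for illustration):

using GraphNeuralNetworks  # provides GNNGraph, batch, and rand_graph

K = 5    # number of parameter vectors
m = 50   # number of independent replicates in each data set

# Unstructured multivariate data with d = 3: a vector of 3 × m matrices
Z₁ = [rand(3, m) for _ in 1:K]

# Bivariate process on a 10 × 15 grid: a vector of 10 × 15 × 2 × m arrays
Z₂ = [rand(10, 15, 2, m) for _ in 1:K]

# Irregular spatial data: each replicate is a graph over n spatial locations, and the
# m replicates of each data set are combined into a single GNNGraph using batch()
n = 100  # number of spatial locations
Z₃ = [batch([rand_graph(n, 8n; ndata = rand(1, n)) for _ in 1:m]) for _ in 1:K]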

Estimators

Several classes of neural estimators are available in the package.

The simplest class is PointEstimator, used for constructing arbitrary mappings from the sample space to the parameter space. When constructing a generic point estimator, the user specifies the loss function used during training and, therefore, the Bayes estimator that will be targeted.

Several classes cater for the estimation of marginal posterior quantiles, based on the quantile loss function (see quantileloss()); in particular, see IntervalEstimator and QuantileEstimatorDiscrete for estimating marginal posterior quantiles for a fixed set of probability levels, and QuantileEstimatorContinuous for estimating marginal posterior quantiles with the probability level as an input to the neural network.

In addition to point estimation, the package also provides the class RatioEstimator for approximating the so-called likelihood-to-evidence ratio. The binary classification problem at the heart of this approach is solved using the binary cross-entropy loss.

Users are free to choose the neural-network architecture of these estimators as they see fit (subject to some class-specific requirements), but the package also provides the convenience constructor initialise_estimator().

NeuralEstimators.PointEstimatorType
PointEstimator(deepset::DeepSet)

A neural point estimator, a mapping from the sample space to the parameter space.

The estimator leverages the DeepSet architecture. The only requirement is that the number of output neurons in the final layer of the inference network (i.e., the outer network) is equal to the number of parameters in the statistical model.
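
Examples

A minimal sketch of constructing and applying an (untrained) point estimator; the dimensions and layer widths below are illustrative:

using NeuralEstimators, Flux

d = 2   # dimension of each replicate
p = 3   # number of parameters in the statistical model
w = 32  # width of each hidden layer

# Summary network ψ and inference network ϕ of the DeepSet architecture
ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Apply the (untrained) estimator to a single data set of m = 100 replicates
Z = rand32(d, 100)
θ̂(Z)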

source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator(u::DeepSet, v::DeepSet = u; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, v::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)

A neural interval estimator which, given data $Z$, jointly estimates marginal posterior credible intervals based on the probability levels probs.

The estimator employs a representation that prevents quantile crossing, namely, it constructs marginal posterior credible intervals for each parameter $\theta_i$, $i = 1, \dots, p,$ of the form,

\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]

where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_p(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_p(\cdot))'$ are neural networks that transform data into $p$-dimensional vectors; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.

The functions $c_i(⋅)$ may be defined by a $p$-dimensional object of type Compress. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted.

If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.

The return value when applied to data is a matrix with $2p$ rows, where the first $p$ rows correspond to the lower bounds and the last $p$ rows correspond to the upper bounds.

See also QuantileEstimatorDiscrete and QuantileEstimatorContinuous.

Examples

using NeuralEstimators, Flux

# Generate some toy data
n = 2   # bivariate data
m = 100 # number of independent replicates
Z = rand(n, m)

# prior
p = 3  # number of parameters in the statistical model
min_supp = [25, 0.5, -pi/2]
max_supp = [500, 2.5, 0]
g = Compress(min_supp, max_supp)

# Create an architecture
w = 8  # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
u = DeepSet(ψ, ϕ)

# Initialise the interval estimator
estimator = IntervalEstimator(u, g)

# Apply the (untrained) interval estimator
estimator(Z)
interval(estimator, Z)
source
NeuralEstimators.QuantileEstimatorDiscreteType
QuantileEstimatorDiscrete(v::DeepSet; probs = [0.05, 0.25, 0.5, 0.75, 0.95], g = Flux.softplus, i = nothing)
(estimator::QuantileEstimatorDiscrete)(Z)
(estimator::QuantileEstimatorDiscrete)(Z, θ₋ᵢ)

A neural estimator that jointly estimates a fixed set of marginal posterior quantiles with probability levels $\{\tau_1, \dots, \tau_T\}$, controlled by the keyword argument probs.

By default, the estimator approximates the marginal quantiles for all parameters in the model, that is, the quantiles of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the quantiles of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $\text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$. The estimator employs a representation that prevents quantile crossing, namely,

\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]

where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$, and $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are unconstrained neural networks that transform data into $p$-dimensional vectors, and $g(\cdot)$ is a non-negative function (e.g., exponential or softplus) applied elementwise to its arguments. If g=nothing, the quantiles are estimated independently through the representation,

\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]

The return value is a matrix with $(p - \text{dim}(\boldsymbol{\theta}_{-i})) \times T$ rows, where the first set of $T$ rows corresponds to the estimated quantiles for the first parameter, the second set of $T$ rows corresponds to the estimated quantiles for the second parameter, and so on.

See also IntervalEstimator and QuantileEstimatorContinuous.

Examples

using NeuralEstimators, Flux, Distributions
using AlgebraOfGraphics, CairoMakie

# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1   # dimension of each independent replicate
p = 1   # number of unknown parameters in the statistical model
m = 30  # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulate(θ, m) = [μ .+ randn32(1, m) for μ ∈ eachcol(θ)]

# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
v = DeepSet(ψ, ϕ)

# Initialise the estimator
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q̂ = QuantileEstimatorDiscrete(v; probs = τ)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate posterior quantiles
q̂(Z)


# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------


# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1         # dimension of each independent replicate
p = 2         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
function prior(K)
	μ = randn(1, K)
	σ = rand(InverseGamma(3, 1), 1, K)
	θ = Float32.(vcat(μ, σ))
end
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]

# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64 + 1, 64, relu), Dense(64, 1))
v = DeepSet(ψ, ϕ)

# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q₁ = QuantileEstimatorDiscrete(v; probs = τ, i = 1)
q₂ = QuantileEstimatorDiscrete(v; probs = τ, i = 2)

# Train the estimators
q₁ = train(q₁, prior, simulate, m = m)
q₂ = train(q₂, prior, simulate, m = m)

# Assess the estimators
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess([q₁, q₂], θ, Z, parameter_names = ["μ", "σ"])
plot(assessment)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 for only a single data set
q₁(Z[1], θ₋ᵢ)
source
NeuralEstimators.QuantileEstimatorContinuousType
QuantileEstimatorContinuous(deepset::DeepSet; i = nothing, num_training_probs::Integer = 1)
(estimator::QuantileEstimatorContinuous)(Z, τ)
(estimator::QuantileEstimatorContinuous)(Z, θ₋ᵢ, τ)

A neural estimator targeting posterior quantiles.

Given as input data $\boldsymbol{Z}$ and the desired probability level $\tau ∈ (0, 1)$, by default the estimator approximates the $\tau$-quantile of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the $\tau$-quantile of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $1 + \text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$.

Although not a requirement, one may employ a (partially) monotonic neural network to prevent quantile crossing (i.e., to ensure that the $\tau_1$-quantile does not exceed the $\tau_2$-quantile for any $\tau_2 > \tau_1$). There are several ways to construct such a neural network: one simple yet effective approach is to ensure that all weights associated with $\tau$ are strictly positive (see, e.g., Cannon, 2018), and this can be done using the DensePositive layer as illustrated in the examples below.

The return value is a matrix with $p - \text{dim}(\boldsymbol{\theta}_{-i})$ rows, corresponding to the estimated quantile for each parameter not in $\boldsymbol{\theta}_{-i}$.

See also QuantileEstimatorDiscrete.

Examples

using NeuralEstimators, Flux, Distributions, InvertedIndices, Statistics
using AlgebraOfGraphics, CairoMakie

# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1         # dimension of each independent replicate
p = 1         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulateZ(θ, m) = [ϑ .+ randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(K)    = [rand32(10) for k in 1:K]
simulate(θ, m)  = simulateZ(θ, m), simulateτ(size(θ, 2))

# Architecture: partially monotonic network to preclude quantile crossing
w = 64  # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	DensePositive(Dense(w + 1, w, relu); last_only = true),
	DensePositive(Dense(w, w, relu)),
	DensePositive(Dense(w, p))
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator
q̂ = QuantileEstimatorContinuous(deepset)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate 0.1-quantile for many data sets
τ = 0.1f0
q̂(Z, τ)

# Estimate several quantiles for a single data set
# (note that τ is given as a row vector)
z = Z[1]
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])'
q̂(z, τ)

# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------

# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1         # dimension of each independent replicate
p = 2         # number of unknown parameters in the statistical model
m = 30        # number of independent replicates in each data set
function prior(K)
	μ = randn(1, K)
	σ = rand(InverseGamma(3, 1), 1, K)
	θ = vcat(μ, σ)
	θ = Float32.(θ)
	return θ
end
simulateZ(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(θ)    = [rand32(10) for k in 1:size(θ, 2)]
simulate(θ, m)  = simulateZ(θ, m), simulateτ(θ)

# Architecture: partially monotonic network to preclude quantile crossing
w = 64  # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	DensePositive(Dense(w + 2, w, relu); last_only = true),
	DensePositive(Dense(w, w, relu)),
	DensePositive(Dense(w, 1))
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator for the first parameter, targeting μ∣Z,σ
i = 1
q̂ = QuantileEstimatorContinuous(deepset; i = i)

# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)

# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
# (use θ[Not(i), :] to determine the order in which the conditioned parameters should be given)
θ = prior(1000)
Z = simulateZ(θ, m)
θ₋ᵢ = 0.5f0
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])
q̂(Z, θ₋ᵢ, τ)

# Estimate quantiles for a single data set
q̂(Z[1], θ₋ᵢ, τ)
source
NeuralEstimators.RatioEstimatorType
RatioEstimator(deepset::DeepSet)

A neural estimator that estimates the likelihood-to-evidence ratio,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]

where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must equal the number of output neurons in the final layer of the summary network plus the number of parameters in the statistical model. Second, the number of output neurons in the final layer of the inference network must be equal to one.

The ratio estimator is trained by solving a relatively straightforward binary classification problem. Specifically, consider the problem of distinguishing dependent parameter–data pairs ${(\boldsymbol{\theta}', \boldsymbol{Z}')' \sim p(\boldsymbol{Z}, \boldsymbol{\theta})}$ with class labels $Y=1$ from independent parameter–data pairs ${(\tilde{\boldsymbol{\theta}}', \tilde{\boldsymbol{Z}}')' \sim p(\boldsymbol{\theta})p(\boldsymbol{Z})}$ with class labels $Y=0$, and where the classes are balanced. Then the Bayes classifier under binary cross-entropy loss is given by

\[c(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})},\]

and hence,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c(\boldsymbol{Z}, \boldsymbol{\theta})}.\]

For numerical stability, training is done on the log-scale using $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c(\boldsymbol{Z}, \boldsymbol{\theta}))$.

When applying the estimator to data, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true will yield class probability estimates). The estimated ratio can then be used in various downstream Bayesian (e.g., Hermans et al., 2020) or Frequentist (e.g., Walchessen et al., 2023) inferential algorithms.

See also mlestimate and mapestimate for obtaining approximate maximum-likelihood and maximum-a-posteriori estimates, and sampleposterior for obtaining approximate posterior samples.

Examples

using NeuralEstimators, Flux, Statistics, Optim

# Generate data from Z|μ,σ ~ N(μ, σ²) with μ, σ ~ U(0, 1)
p = 2     # number of unknown parameters in the statistical model
d = 1     # dimension of each independent replicate
m = 100   # number of independent replicates

prior(K) = rand32(p, K)
simulate(θ, m) = θ[1] .+ θ[2] .* randn32(d, m)
simulate(θ::AbstractMatrix, m) = simulate.(eachcol(θ), m)

# Architecture
w = 64 # width of each hidden layer
ψ = Chain(
	Dense(d, w, relu),
	Dense(w, w, relu),
	Dense(w, w, relu)
	)
ϕ = Chain(
	Dense(w + p, w, relu),
	Dense(w, w, relu),
	Dense(w, 1)
	)
deepset = DeepSet(ψ, ϕ)

# Initialise the estimator
r̂ = RatioEstimator(deepset)

# Train the estimator
r̂ = train(r̂, prior, simulate, m = m)

# Inference with "observed" data set
θ = prior(1)
z = simulate(θ, m)[1]
θ₀ = [0.5, 0.5]                           # initial estimate
mlestimate(r̂, z;  θ₀ = θ₀)                # maximum-likelihood estimate (requires Optim.jl to be loaded)
mapestimate(r̂, z; θ₀ = θ₀)                # maximum-a-posteriori estimate (requires Optim.jl to be loaded)
θ_grid = expandgrid(0:0.01:1, 0:0.01:1)'  # fine gridding of the parameter space
θ_grid = Float32.(θ_grid)
r̂(z, θ_grid)                              # likelihood-to-evidence ratios over grid
mlestimate(r̂, z;  θ_grid = θ_grid)        # maximum-likelihood estimate
mapestimate(r̂, z; θ_grid = θ_grid)        # maximum-a-posteriori estimate
sampleposterior(r̂, z; θ_grid = θ_grid)    # posterior samples
source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator(estimators, changepoints)

Creates a piecewise estimator (Sainsbury-Dale et al., 2024, sec. 2.2.2) from a collection of estimators and sample-size changepoints.

Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,

\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ \hat{\boldsymbol{\theta}}_2(\boldsymbol{Z}) & m_1 < m \leq m_2,\\ \quad \vdots \\ \hat{\boldsymbol{\theta}}_l(\boldsymbol{Z}) & m > m_{l-1}. \end{cases}\]

For example, given an estimator $\hat{\boldsymbol{\theta}}_1(\cdot)$ trained for small sample sizes (e.g., m ≤ 30) and an estimator $\hat{\boldsymbol{\theta}}_2(\cdot)$ trained for moderate-to-large sample sizes (e.g., m > 30), we may construct a PiecewiseEstimator that dispatches $\hat{\boldsymbol{\theta}}_1(\cdot)$ if m ≤ 30 and $\hat{\boldsymbol{\theta}}_2(\cdot)$ otherwise.

See also trainx() for training estimators for a range of sample sizes.

Examples

using NeuralEstimators, Flux

d = 2  # bivariate data
p = 3  # number of parameters in the statistical model
w = 8  # width of each hidden layer

# Small-sample estimator
ψ₁ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₁ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₁ = PointEstimator(DeepSet(ψ₁, ϕ₁))

# Large-sample estimator
ψ₂ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₂ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₂ = PointEstimator(DeepSet(ψ₂, ϕ₂))

# Piecewise estimator with changepoint m=30
θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], 30)

# Apply the (untrained) piecewise estimator to data
Z = [rand(d, 1, m) for m ∈ (10, 50)]
θ̂(Z)
source
NeuralEstimators.EnsembleType
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = median)

Defines an ensemble based on a collection of estimators which, when applied to data Z, returns the median (or another summary defined by aggr) of the estimates.

The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified, both the ensemble and its component estimators will be saved.

Note that train() currently acts sequentially on the component estimators.

The ensemble components can be accessed by indexing the ensemble directly; the number of component estimators can be obtained using length().

Examples

using NeuralEstimators, Flux

# Define the model, Z|θ ~ N(θ, 1), θ ~ N(0, 1)
d = 1   # dimension of each replicate
p = 1   # number of unknown parameters in the statistical model
m = 30  # number of independent replicates in each data set
sampler(K) = randn32(p, K)
simulator(θ, m) = [μ .+ randn32(d, m) for μ ∈ eachcol(θ)]

# Architecture of each ensemble component
function architecture()
	ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
	ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
	deepset = DeepSet(ψ, ϕ)
	PointEstimator(deepset)
end

# Initialise ensemble with three components
ensemble = Ensemble(architecture, 3)
ensemble[1]      # access component estimators by indexing
length(ensemble) # number of component estimators

# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)

# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)

# Apply to data
ensemble(Z)
source

Training

The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.

NeuralEstimators.trainFunction
train(θ̂, sampler::Function, simulator::Function; ...)
train(θ̂, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}

Train a neural estimator θ̂.

The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.

In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.

Keyword arguments common to all methods:

  • loss = mae
  • epochs = 100
  • batchsize = 32
  • optimiser = ADAM()
  • savepath::String = "": path to save the trained estimator and other information; if an empty string (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
  • stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
  • use_gpu = true
  • verbose = true

Keyword arguments common to train(θ̂, sampler, simulator) and train(θ̂, θ_train, θ_val, simulator):

  • m: sample sizes (either an Integer or a collection of Integers). The simulator is called as simulator(θ, m).
  • epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
  • simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Keyword arguments unique to train(θ̂, sampler, simulator):

  • K = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.
  • ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
  • epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.

Examples

using NeuralEstimators, Flux

function sampler(K)
	μ = randn(K) # Gaussian prior
	σ = rand(K)  # Uniform prior
	θ = hcat(μ, σ)'
	return θ
end

function simulator(θ_matrix, m)
	[θ[1] .+ θ[2] * randn(1, m) for θ ∈ eachcol(θ_matrix)]
end

# architecture
d = 1   # dimension of each replicate
p = 2   # number of parameters in the statistical model
ψ = Chain(Dense(1, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = DeepSet(ψ, ϕ)

# number of independent replicates to use during training
m = 15

# training: full simulation on-the-fly
θ̂ = train(θ̂, sampler, simulator, m = m, epochs = 5)

# training: simulation on-the-fly with fixed parameters
K = 10000
θ_train = sampler(K)
θ_val   = sampler(K ÷ 5)
θ̂       = train(θ̂, θ_train, θ_val, simulator, m = m, epochs = 5)

# training: fixed parameters and fixed data
Z_train = simulator(θ_train, m)
Z_val   = simulator(θ_val, m)
θ̂       = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
NeuralEstimators.trainxFunction
trainx(θ̂, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train() to construct neural estimators for different sample sizes.

The positional argument m specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if m = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train and Z_val subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ m. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train::V and Z_val::V and, hence, it does not need to invoke subsetdata(), which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m is inferred from the data.

The keyword arguments inherit from train(). The keyword arguments epochs, batchsize, stopping_epochs, and optimiser can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂].
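
Examples

A minimal sketch, reusing the sampler, simulator, and architecture θ̂ defined in the train() examples above (the sample sizes and epoch numbers are illustrative):

estimators = trainx(θ̂, sampler, simulator, [5, 15, 30], epochs = [10, 5, 5])
estimators[1]  # estimator trained with sample size m = 5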

source

Assessment/calibration

NeuralEstimators.assessFunction
assess(estimator, θ, Z)

Using an estimator (or a collection of estimators), computes estimates from data Z simulated based on true parameter vectors stored in θ.

The data Z should be a Vector, with each element corresponding to a single simulated data set. If Z contains more data sets than parameter vectors, the parameter matrix θ will be recycled by horizontal concatenation via the call θ = repeat(θ, outer = (1, J)), where J = length(Z) ÷ K is the number of data sets simulated for each parameter vector and K = size(θ, 2) is the number of parameter vectors.

The output is of type Assessment; see ?Assessment for details.

Keyword arguments

  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices). Can also be provided as xi.
  • use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true. Can also be provided as use_xi.
  • use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
  • probs = range(0.01, stop=0.99, length=100): (relevant only for estimator::QuantileEstimatorContinuous) a collection of probability levels in (0, 1).

Examples

using NeuralEstimators, Flux

n = 10 # number of observations in each realisation
p = 4  # number of parameters in the statistical model

# Construct the neural estimator
w = 32 # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)

# Generate testing parameters
K = 100
θ = rand32(p, K)

# Data for a single sample size
m = 30
Z = [rand32(n, m) for _ ∈ 1:K];
assessment = assess(θ̂, θ, Z);
risk(assessment)

# Multiple data sets for each parameter vector
J = 5
Z = repeat(Z, J);
assessment = assess(θ̂, θ, Z);
risk(assessment)

# With set-level information
qₓ = 2
ϕ  = Chain(Dense(w + qₓ, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)
x = [rand(qₓ) for _ ∈ eachindex(Z)]
assessment = assess(θ̂, θ, (Z, x));
risk(assessment)
source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates) for the given data set
  • k: the index of the parameter vector
  • j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)

If estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.

If estimator is a QuantileEstimatorDiscrete or QuantileEstimatorContinuous, the df will also contain a column prob indicating the probability level of the corresponding quantile estimate.

Multiple Assessment objects can be combined with merge() (used for combining assessments from multiple point estimators) or join() (used for combining assessments from a point estimator and an interval estimator).

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's Bayes risk,

\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
  • average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
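
Examples

A sketch, assuming an Assessment object produced by assess() as in its examples above:

risk(assessment)                                  # absolute-error risk for each parameter
risk(assessment; loss = (x, y) -> (x - y)^2)      # squared-error risk
risk(assessment; average_over_parameters = true)  # risk averaged over all parameters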
source
NeuralEstimators.biasFunction
bias(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's bias,

\[{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.rmseFunction
rmse(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's root-mean-squared error,

\[{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.coverageFunction
coverage(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.

Keyword arguments

  • average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the coverage is averaged over all sample sizes $m$; otherwise, it is computed over each sample size separately.
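
Examples

A sketch, assuming an Assessment object produced by applying assess() to an IntervalEstimator:

coverage(assessment)                                  # coverage for each parameter
coverage(assessment; average_over_parameters = true)  # coverage averaged over all parameters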
source

Inference with observed data

Inference using point estimators

Inference with a neural Bayes (point) estimator proceeds simply by applying the estimator θ̂ to the observed data Z (possibly containing multiple data sets) in a call of the form θ̂(Z). To leverage a GPU, simply move the estimator and the data to the GPU using gpu(); see also estimateinbatches() to apply the estimator over batches of data, which can alleviate memory issues when working with a large number of data sets.
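
For example (a sketch, assuming a trained point estimator θ̂ and a vector of observed data sets Z; gpu is provided by Flux):

using NeuralEstimators, Flux

θ̂ = gpu(θ̂)               # move the estimator to the GPU
Z = gpu(Z)                # move the data to the GPU
θ̂(Z)                     # matrix of estimates, one column per data set
estimateinbatches(θ̂, Z)   # apply the estimator over batches of data sets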

Uncertainty quantification often proceeds through the bootstrap distribution, which is essentially available "for free" when bootstrap data sets can be quickly generated; this is facilitated by bootstrap() and interval(). Alternatively, one may approximate a set of low and high marginal posterior quantiles using a specially constructed neural Bayes estimator, which can then be used to construct credible intervals: see IntervalEstimator, QuantileEstimatorDiscrete, and QuantileEstimatorContinuous.

NeuralEstimators.bootstrapFunction
bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, Z; B = 400, blocks = nothing)

Generates B bootstrap estimates from an estimator θ̂.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.

The keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).

The return type is a p × B matrix, where p is the number of parameters in the model.
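
Examples

A minimal sketch of non-parametric bootstrapping with an (untrained) point estimator; the architecture and data are illustrative:

using NeuralEstimators, Flux

d = 2    # dimension of each replicate
p = 3    # number of parameters in the statistical model
m = 100  # number of independent replicates
w = 32   # width of each hidden layer

ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Non-parametric bootstrap from a single "observed" data set
Z = rand32(d, m)
θ_boot = bootstrap(θ̂, Z; B = 100)  # p × B matrix of bootstrap estimates
interval(θ_boot)                   # p × 2 matrix of 90% bootstrap intervals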

source
NeuralEstimators.intervalFunction
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)

Compute a confidence interval based either on a $p$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws), where $p$ is the number of parameters in the model, or on an IntervalEstimator and data Z.

When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.

The return type is a $p$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.

Examples

using NeuralEstimators
p = 3
B = 50
θ = rand(p, B)
interval(θ)
source

Inference using likelihood and likelihood-to-evidence-ratio estimators

NeuralEstimators.mlestimateFunction
mlestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, penalty::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum likelihood estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator.

If a vector θ₀ of initial parameter estimates is given, the approximate likelihood is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate likelihood is maximised by grid search.

A maximum penalised likelihood estimate,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]

can be obtained by specifying the keyword argument penalty that defines the penalty term $p(\boldsymbol{\theta})$.

See also mapestimate() for computing (approximate) maximum a posteriori estimates.

source
NeuralEstimators.mapestimateFunction
mapestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum a posteriori estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior (by default, a uniform prior is used).

If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.

See also mlestimate() for computing (approximate) maximum likelihood estimates.

source
NeuralEstimators.sampleposteriorFunction
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)

Samples from the approximate posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$ implied by estimator.

The positional argument N controls the size of the posterior sample.

Currently, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid (or theta_grid). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion).

The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior (by default, a uniform prior is used).

source