Core

This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation: from sampling parameters from the prior distribution, to making inference with observed data.

Sampling parameters

Parameters sampled from the prior distribution are stored as a $d \times K$ matrix, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors.
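For concreteness, a minimal prior sampler might look as follows (the two uniform priors and the name sample are illustrative, not part of the package):

```julia
# A minimal sketch of a prior sampler: draw K parameter vectors from
# independent U(0, 1) priors and store them as a d × K matrix (here d = 2)
function sample(K)
	μ = rand(Float32, K)   # prior draws for the first parameter
	σ = rand(Float32, K)   # prior draws for the second parameter
	return vcat(μ', σ')    # 2 × K matrix: one column per parameter vector
end

θ = sample(100)            # 2 × 100 matrix
```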

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). The user-defined type should be a subtype of ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $d$ × $K$ matrix of parameters, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors. There are no other requirements.

See subsetparameters() for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
	θ
	# other expensive intermediate objects...
end
source
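To make this concrete, the following sketch stores Cholesky factors computed at parameter-sampling time; the exponential covariance model and the one-dimensional locations used here are purely illustrative:

```julia
using NeuralEstimators, LinearAlgebra

# A sketch of a user-defined type that precomputes Cholesky factors so they
# can be reused across data simulations (the covariance model is illustrative)
struct Parameters{T, C} <: ParameterConfigurations
	θ::T       # d × K matrix of parameters (the required field)
	chols::C   # Cholesky factors, one per parameter vector
end

function Parameters(K::Integer)
	s = range(0, 1, length = 10)              # hypothetical spatial locations
	D = [abs(a - b) for a in s, b in s]       # pairwise-distance matrix
	ρ = rand(1, K)                            # range parameters drawn from U(0, 1)
	chols = [cholesky(Symmetric(exp.(-D ./ ρₖ))).L for ρₖ in ρ]
	Parameters(ρ, chols)
end
```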

Simulating data

The package accommodates any model for which simulation is feasible by allowing users to define their model implicitly through simulated data.

Simulated data sets are stored as mini-batches in a format amenable to the chosen neural-network architecture. For example, when constructing an estimator from data collected over a grid, one may use a generic CNN, with each data set stored in the final dimension of a four-dimensional array. When performing inference from replicated data, a DeepSet architecture may be used, where simulated data sets are stored in a vector, and conditionally independent replicates are stored as mini-batches within each element of the vector.
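These two storage formats can be sketched as follows (all dimensions here are arbitrary):

```julia
# Gridded data for a CNN: a four-dimensional array of the form
# width × height × channels × batch, with data sets stored in the final dimension
Z_grid = rand(Float32, 16, 16, 1, 50)            # 50 data sets on a 16 × 16 grid

# Replicated data for a DeepSet: a vector of data sets, where the columns of
# each element store conditionally independent replicates
n, m, K = 1, 30, 50
Z_deepset = [rand(Float32, n, m) for _ in 1:K]   # 50 data sets of 30 replicates
```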

Estimators

The package provides several classes of neural estimators, organised within a type hierarchy. At the top of the hierarchy is NeuralEstimator, an abstract supertype for all neural estimators in the package.

Neural Bayes estimators are implemented as subtypes of the abstract supertype BayesEstimator. The simple type PointEstimator is used for constructing neural Bayes estimators under general, user-defined loss functions. Several specialised types cater for the estimation of posterior quantiles based on the quantile loss function: see IntervalEstimator and its generalisation QuantileEstimator for estimating posterior quantiles for a fixed set of probability levels.

The type PosteriorEstimator can be used to approximate the posterior distribution, and RatioEstimator can be used to approximate the likelihood-to-evidence ratio.

Finally, Ensemble combines multiple estimators, aggregating their individual estimates to improve accuracy.

NeuralEstimators.PointEstimatorType
PointEstimator <: BayesEstimator
PointEstimator(network)

A neural point estimator, where the neural network is a mapping from the sample space to the parameter space.

source
NeuralEstimators.PosteriorEstimatorType
PosteriorEstimator <: NeuralEstimator
PosteriorEstimator(network, q::ApproximateDistribution)
PosteriorEstimator(network, d::Integer, dstar::Integer = d; q::ApproximateDistribution = NormalisingFlow, kwargs...)

A neural estimator that approximates the posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$, based on a neural network and an approximate distribution q (see the available in-built Approximate distributions).

The neural network is a mapping from the sample space to $\mathbb{R}^{d^*}$, where $d^*$ is a user-specified number of summary statistics. The learned summary statistics are then transformed into parameters $\boldsymbol{\kappa} \in \mathcal{K}$ of the approximate distribution using a conventional multilayer perceptron (MLP).

The convenience constructor PosteriorEstimator(network, d, dstar; q::ApproximateDistribution) builds the approximate distribution q internally, with the keyword arguments passed onto the constructor of q.

Examples

using NeuralEstimators, Flux, CairoMakie

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 50    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Distribution used to approximate the posterior 
q = NormalisingFlow(d, d) 

# Neural network (outputs d summary statistics)
w = 128   
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = PosteriorEstimator(network, q)

# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m, K = 3000)

# Plot the risk history
plotrisk()

# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
assessment = assess(estimator, θ_test, Z_test)
plot(assessment)

# Inference with observed data 
θ = [0.8f0 0.1f0]'
Z = simulate(θ, m)
sampleposterior(estimator, Z) # posterior draws 
posteriormean(estimator, Z)   # point estimate
source
NeuralEstimators.RatioEstimatorType
RatioEstimator <: NeuralEstimator
RatioEstimator(network)

A neural estimator that estimates the likelihood-to-evidence ratio,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]

where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.

For numerical stability, training is done on the log-scale using the relation $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c^*(\boldsymbol{Z}, \boldsymbol{\theta}))$, where $c^*(\cdot, \cdot)$ denotes the Bayes classifier as described in the methodology section. Hence, the neural network should be a mapping from $\mathcal{Z} \times \Theta$ to $\mathbb{R}$, where $\mathcal{Z}$ and $\Theta$ denote the sample and parameter spaces, respectively.
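As a quick numeric check of this relation (with stand-in values, not package code): the logit function inverts the sigmoid, so a classifier output $c \in (0, 1)$ maps back to the log-ratio scale.

```julia
# logit(σ(x)) = x, so a Bayes-classifier output c = σ(log r) recovers log r
σ(x) = 1 / (1 + exp(-x))
logit(p) = log(p / (1 - p))

logr = 1.5          # a hypothetical log likelihood-to-evidence ratio
c = σ(logr)         # corresponding classifier output in (0, 1)
logit(c) ≈ logr     # true
```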

Network input

The neural network must implement a method network(::Tuple), where the first element of the tuple contains the data sets and the second element contains the parameter matrices.

When the neural network is a DeepSet (which implements the above method), two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal $d$ plus the number of output neurons in the final layer of the inner network. Second, the number of output neurons in the final layer of the outer network must be one.

When applying the estimator to data, the log of the likelihood-to-evidence ratio is returned. The estimated ratio can then be used in various Bayesian (e.g., Hermans et al., 2020) or frequentist (e.g., Walchessen et al., 2024) inferential algorithms.

Examples

using NeuralEstimators, Flux, CairoMakie

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network
w = 128 
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + d, w, relu), Dense(w, w, relu), Dense(w, 1))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
r̂ = RatioEstimator(network)

# Train the estimator
r̂ = train(r̂, sample, simulate, simulator_args = m, K = 1000)

# Plot the risk history
plotrisk()

# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
θ_grid = f32(expandgrid(0:0.01:1, 0:0.01:1))'  # fine gridding of the parameter space
assessment = assess(r̂, θ_test, Z_test; θ_grid = θ_grid)
plot(assessment)

# Generate "observed" data 
θ = sample(1)
z = simulate(θ, 200)

# Grid-based optimization and sampling
estimate(r̂, z, θ_grid)                         # log of likelihood-to-evidence ratios
posteriormode(r̂, z; θ_grid = θ_grid)           # posterior mode 
sampleposterior(r̂, z; θ_grid = θ_grid)         # posterior samples

# Gradient-based optimization
using Optim
θ₀ = [0.5, 0.5]                                # initial estimate
posteriormode(r̂, z; θ₀ = θ₀)                   # posterior mode 
source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator <: BayesEstimator
IntervalEstimator(u, v = u, c::Union{Function, Compress} = identity; probs = [0.025, 0.975], g = exp)
IntervalEstimator(u, c::Union{Function, Compress}; probs = [0.025, 0.975], g = exp)

A neural estimator that jointly estimates marginal posterior credible intervals based on the probability levels probs (by default, 95% central credible intervals).

The estimator employs a representation that prevents quantile crossing. Specifically, given data $\boldsymbol{Z}$, it constructs intervals for each parameter $\theta_i$, $i = 1, \dots, d,$ of the form,

\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]

where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_d(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_d(\cdot))'$ are neural networks that map from the sample space to $\mathbb{R}^d$; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.

The functions $c_i(⋅)$ may be collectively defined by a $d$-dimensional Compress object, which can constrain the interval estimator's output to the prior support. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted. If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.

The return value when applied to data using estimate() is a matrix with $2d$ rows, where the first and second $d$ rows correspond to the lower and upper bounds, respectively. The function interval() can be used to format this output in a readable $d$ × 2 matrix.
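The non-crossing property can be illustrated in plain Julia (using $c$ = identity, $g$ = exp, and arbitrary stand-ins for the network outputs):

```julia
# With g = exp, the upper bound always exceeds the lower bound because exp(·) > 0
u = [0.3, -1.2]         # hypothetical outputs of u(Z), one element per parameter
v = [-0.5, 2.0]         # hypothetical outputs of v(Z)
lower = u
upper = u .+ exp.(v)
all(upper .> lower)     # true for any u and v
```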

See also QuantileEstimator.

Examples

using NeuralEstimators, Flux, CairoMakie

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 100   # number of independent replicates
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network
w = 128   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
u = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = IntervalEstimator(u)

# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m, K = 3000)

# Plot the risk history
plotrisk()

# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
assessment = assess(estimator, θ_test, Z_test)
plot(assessment)

# Inference with "observed" data 
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z) 
interval(estimator, Z)
source
NeuralEstimators.QuantileEstimatorType
QuantileEstimator <: BayesEstimator
QuantileEstimator(v; probs = [0.025, 0.5, 0.975], g = Flux.softplus, i = nothing)

A neural estimator that jointly estimates a fixed set of marginal posterior quantiles, with probability levels $\{\tau_1, \dots, \tau_T\}$ controlled by the keyword argument probs. This generalises IntervalEstimator to support an arbitrary number of probability levels.

Given data $\boldsymbol{Z}$, by default the estimator approximates quantiles of the distributions of

\[\theta_i \mid \boldsymbol{Z}, \quad i = 1, \dots, d, \]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates quantiles of the full conditional distribution of

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed.

The estimator employs a representation that prevents quantile crossing, namely,

\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]

where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$; $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are neural networks that map from the sample space to $\mathbb{R}^d$; and $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus) applied elementwise to its arguments. If g = nothing, the quantiles are estimated independently through the representation

\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]

When the neural networks are DeepSet objects, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal the number of neurons in the final layer of the inner network plus $\text{dim}(\boldsymbol{\theta}_{-i})$, where we define $\text{dim}(\boldsymbol{\theta}_{-i}) \equiv 0$ when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (the default behaviour). Second, the number of output neurons in the final layer of the outer network must equal $d - \text{dim}(\boldsymbol{\theta}_{-i})$.

The return value is a matrix with $\{d - \text{dim}(\boldsymbol{\theta}_{-i})\} \times T$ rows, where the first $T$ rows correspond to the estimated quantiles for the first parameter, the second $T$ rows correspond to the estimated quantiles for the second parameter, and so on.
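The monotone construction can be illustrated in plain Julia (with $g$ = softplus and arbitrary stand-ins for the network outputs, taking $d = 1$ for simplicity):

```julia
# q⁽τ¹⁾ = v⁽τ¹⁾ and q⁽τₜ⁾ = q⁽τₜ₋₁⁾ + g(v⁽τₜ⁾): positive increments prevent crossing
softplus(x) = log1p(exp(x))
v = [0.2, -1.0, 0.5]                             # v⁽τₜ⁾(Z) for t = 1, 2, 3
q = accumulate(+, [v[1]; softplus.(v[2:end])])   # cumulative-sum representation
issorted(q)                                      # true: quantile estimates never cross
```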

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# ---- Quantiles of θᵢ ∣ 𝐙, i = 1, …, d ----

# Neural network
w = 64   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
v = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = QuantileEstimator(v)

# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m)

# Inference with "observed" data 
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z) 

# ---- Quantiles of θᵢ ∣ 𝐙, θ₋ᵢ ----

# Neural network
w = 64  # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + 1, w, relu), Dense(w, d - 1))
v = DeepSet(ψ, ϕ)

# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
q₁ = QuantileEstimator(v; i = 1)
q₂ = QuantileEstimator(v; i = 2)

# Train the estimators
q₁ = train(q₁, sample, simulate, simulator_args = m)
q₂ = train(q₂, sample, simulate, simulator_args = m)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 for a single data set
q₁(Z[1], θ₋ᵢ)
source
NeuralEstimators.EnsembleType
Ensemble <: NeuralEstimator
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = mean)

Defines an ensemble of estimators which, when applied to data Z, returns the mean (or another summary defined by aggr) of the individual estimates (see, e.g., Sainsbury-Dale et al., 2025, Sec. S5).

The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified, both the ensemble and its component estimators will be saved.

Note that train() currently acts sequentially on the component estimators, using the Adam optimiser.

The ensemble components can be accessed by indexing the ensemble; the number of component estimators can be obtained using length().

See also Parallel, which can be used to mimic ensemble methods with an appropriately chosen connection.

Examples

using NeuralEstimators, Flux

# Data Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sampler(K) = randn32(d, K)
simulator(θ, m) = [μ .+ randn32(n, m) for μ ∈ eachcol(θ)]

# Neural-network architecture of each ensemble component
function architecture()
	ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))
	ϕ = Chain(Dense(64, 64, relu), Dense(64, d))
	network = DeepSet(ψ, ϕ)
	PointEstimator(network)
end

# Initialise ensemble with three component estimators 
ensemble = Ensemble(architecture, 3)
ensemble[1]      # access component estimators by indexing
ensemble[1:2]    # indexing with an iterable collection returns the corresponding ensemble 
length(ensemble) # number of component estimators

# Training
ensemble = train(ensemble, sampler, simulator, simulator_args = m, epochs = 5)

# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)

# Apply to data
ensemble(Z)
source

Training

The function train() is used to train a neural estimator.

After training, the risk history and optimiser state can be accessed and inspected using loadrisk(), plotrisk(), and loadoptimiser().

NeuralEstimators.trainFunction
train(estimator, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, sampler::Function, simulator::Function; ...)

Trains a neural estimator.

The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.

The validation parameters and data are always held fixed.

The trained estimator is always returned on the CPU.

Keyword arguments common to all methods:

  • loss = Flux.mae (applicable only to PointEstimator): loss function used to train the neural network.

  • epochs = 100: number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.

  • stopping_epochs = 5: cease training if the risk does not improve in this number of epochs.

  • batchsize = 32: the batchsize to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.

  • optimiser = Flux.setup(Adam(5e-4), estimator): any Optimisers.jl optimisation rule for updating the neural-network parameters. When the training data and/or parameters are held fixed, one may wish to employ regularisation to prevent overfitting; for example, optimiser = Flux.setup(OptimiserChain(WeightDecay(1e-4), Adam()), estimator), which corresponds to L₂ regularisation with penalty coefficient λ=10⁻⁴.

  • lr_schedule::Union{Nothing, ParameterSchedulers.AbstractSchedule}: defines the learning-rate schedule for adaptively changing the learning rate during training. Accepts either a ParameterSchedulers.jl object or nothing for a fixed learning rate. By default, it uses CosAnneal with a maximum set to the initial learning rate from optimiser, a minimum of zero, and a period equal to the number of epochs. The learning rate is updated at the end of each epoch.

  • use_gpu = true: flag indicating whether to use a GPU if one is available.

  • savepath::Union{Nothing, String} = tempdir(): path to save information generated during training. If nothing, nothing is saved. Otherwise, the following files are always saved to both savepath and tempdir() (the latter for convenient within-session access via loadrisk, plotrisk, and loadoptimiser):

    • loss_per_epoch.csv: training and validation risk at each epoch, in the first and second columns respectively.
    • best_optimiser.bson: optimiser state corresponding to the best validation risk.
    • final_optimiser.bson: optimiser state at the final epoch.

    If additionally savepath != tempdir(), the following files are also saved to savepath:

    • best_network.bson: neural-network parameters corresponding to the best validation risk.
    • final_network.bson: neural-network parameters at the final epoch.
    • train_time.csv: total training time in seconds.

  • risk_history::Union{Nothing, Matrix} = nothing: a matrix with two columns containing the training and validation risk from a previous call to train(), used to initialise the risk history when continuing training. Can be loaded from a previous call to train using loadrisk.

  • verbose = true: flag indicating whether information, including empirical risk values and timings, should be printed to the console during training.

Keyword arguments common to train(estimator, sampler, simulator) and train(estimator, θ_train, θ_val, simulator):

  • simulator_args = (): positional arguments passed to the simulator as simulator(θ, simulator_args...).
  • simulator_kwargs::NamedTuple = (;): keyword arguments passed to the simulator as simulator(...; simulator_kwargs...).
  • epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
  • simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Keyword arguments unique to train(estimator, sampler, simulator):

  • sampler_args = (): positional arguments passed to the parameter sampler as sampler(K, sampler_args...).
  • sampler_kwargs::NamedTuple = (;): keyword arguments passed to the parameter sampler as sampler(...; sampler_kwargs...).
  • K = 10000: number of parameter vectors in the training set.
  • K_val = K ÷ 2: number of parameter vectors in the validation set.
  • epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ N(0, 1) and σ ~ U(0, 1)
function sampler(K)
	μ = randn(K) # Gaussian prior
	σ = rand(K)  # Uniform prior
	θ = vcat(μ', σ')
	return θ
end
function simulator(θ, m)
	[ϑ[1] .+ ϑ[2] * randn(1, m) for ϑ ∈ eachcol(θ)]
end

# Neural network 
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
w = 128   # width of each hidden layer 
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = PointEstimator(network)

# Number of data sets in each epoch and number of independent replicates in each data set
K = 1000
m = 30

# Training: simulation on-the-fly
estimator = train(estimator, sampler, simulator, simulator_args = m, K = K)

# Plot the risk history (using any plotting backend)
using Plots
unicodeplots()
plotrisk()

# Training: simulation on-the-fly with fixed parameters 
θ_train = sampler(K)
θ_val   = sampler(K)
estimator = train(estimator, θ_train, θ_val, simulator, simulator_args = m, optimiser = loadoptimiser(), risk_history = loadrisk())

# Training: fixed parameters and fixed data
Z_train   = simulator(θ_train, m)
Z_val     = simulator(θ_val, m)
estimator = train(estimator, θ_train, θ_val, Z_train, Z_val, optimiser = loadoptimiser(), risk_history = loadrisk())
source
NeuralEstimators.loadriskFunction
loadrisk(savepath::String = tempdir())

Loads the training and validation risk history saved during the most recent call to train(). By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to reload risk history from a previous session.

Returns a matrix with two columns: training risk (column 1) and validation risk (column 2).

See also plotrisk.

source
NeuralEstimators.plotriskFunction
plotrisk(savepath::String = tempdir())

Plots the training and validation risk as a function of epoch, loaded from savepath. By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to plot risk history from a previous session.

Requires a plotting package (e.g., using Plots) to be loaded.

See also loadrisk.

source
NeuralEstimators.loadoptimiserFunction
loadoptimiser(savepath::String = tempdir(); best::Bool = true)

Loads the optimiser saved during the most recent call to train().

By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to reload the optimiser from a previous session.

By default, loads the optimiser corresponding to the best network (as measured by validation loss). Set best = false to load the optimiser from the final epoch instead, which can be useful for resuming training from exactly where it left off.

The returned optimiser can be passed directly to train() via the optimiser keyword argument to resume training with the correct optimiser state.

source

Assessment/calibration

NeuralEstimators.assessFunction
assess(estimator, θ, Z; ...)
assess(estimators::Vector, θ, Z; ...)

Assesses an estimator (or a collection of estimators) based on true parameters θ and corresponding simulated data Z.

The parameters θ should be given as a $d$ × $K$ matrix, where $d$ is the parameter dimension and $K$ is the number of sampled parameter vectors.

When Z contains more simulated data sets than the number $K$ of sampled parameter vectors, θ will be recycled via horizontal concatenation: θ = repeat(θ, outer = (1, J)), where J = numobs(Z) ÷ K is the number of simulated data sets for each parameter vector. This allows assessment of the estimator's sampling distribution under fixed parameters.
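The recycling rule can be illustrated directly (with a small hypothetical parameter matrix):

```julia
# With K = 3 parameter vectors and J = 2 data sets per parameter vector,
# θ is tiled J times along its columns
θ = [1 2 3; 4 5 6]                       # d × K matrix with d = 2, K = 3
J = 2
θ_recycled = repeat(θ, outer = (1, J))   # 2 × 6 matrix
```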

The return value is of type Assessment.

Keyword arguments

  • estimator_name::String (or estimator_names::Vector{String} for multiple estimators): name(s) of the estimator(s) (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible default provided).
  • use_gpu = true: Bool or Vector{Bool} with length equal to the number of estimators.
  • probs = nothing (applicable only to PointEstimator): probability levels taking values between 0 and 1. By default, no bootstrap uncertainty quantification is done; if probs is provided, it must be a two-element vector specifying the lower and upper probability levels for non-parametric bootstrap intervals (note that parametric bootstrap is not currently supported with assess()).
  • B::Integer = 400 (applicable only to PointEstimator): number of bootstrap samples.
  • pointsummary::Function = mean (applicable only to estimators that yield posterior samples): a function that summarises a vector of posterior samples into a single point estimate for each marginal; any function mapping a vector to a scalar is valid (e.g., median for the posterior median).
  • N::Integer = 1000 (applicable only to estimators that yield posterior samples): number of posterior samples drawn for each data set.
  • kwargs... (applicable only to estimators that yield posterior samples): additional keyword arguments passed to sampleposterior.
source
NeuralEstimators.AssessmentType
Assessment

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field estimates is a long-form DataFrame with columns:

  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • k: the index of the parameter vector
  • j: the index of the data set (only relevant in the case that multiple data sets are associated with each parameter vector)

If the estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.

If the estimator is a QuantileEstimator, there will also be a column prob indicating the probability level of the corresponding quantile estimate.

If the estimator is a PosteriorEstimator, in addition to the fields listed above, the field samples stores the posterior samples as a long-form DataFrame with the columns parameter, truth, k, j (as given above), as well as:

  • draw: the index of the draw within the posterior samples
  • value: the value of the posterior sample for a given parameter and draw.

Use merge() to combine assessments from multiple estimators of the same type or join() to combine assessments from a PointEstimator and an IntervalEstimator.

source
Makie.plotMethod
plot(assessment::Assessment; prob = 0.99)

Visualise the performance of a neural estimator. Accepts the Assessment object returned by assess.

Extension

This function is defined in the NeuralEstimatorsPlottingMakieExt extension and requires CairoMakie (or another Makie backend) to be loaded.

The plot produced depends on the type of estimator being assessed:

PointEstimator: produces a scatter plot of estimates vs. true values, faceted by parameter.

IntervalEstimator: produces a plot of estimated credible intervals vs. true values, faceted by parameter. Each interval is drawn as a vertical line segment from lower to upper bound, with tick marks at the endpoints.

QuantileEstimator: produces a calibration plot of the empirical coverage probability vs. the nominal probability level τ, faceted by parameter. A well-calibrated estimator will follow the red diagonal line. Specifically, the diagnostic is constructed as follows:

  1. For k = 1,…,K, sample pairs (θᵏ, Zᵏ) with θᵏ ∼ p(θ), Zᵏ ∼ p(Z ∣ θᵏ). This gives K "posterior draws", θᵏ ∼ p(θ ∣ Zᵏ).
  2. For each k and each τ ∈ {τⱼ : j = 1,…,J}, estimate the posterior quantile Q(Zᵏ, τ).
  3. For each τ, compute the proportion of quantiles Q(Zᵏ, τ) exceeding the corresponding θᵏ, and plot this proportion against τ.
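Step 3 can be sketched for a single probability level τ (using synthetic stand-ins for the true parameters and the estimated quantiles):

```julia
# Empirical coverage at level τ: the proportion of estimated τ-quantiles
# Q(Zᵏ, τ) that exceed the corresponding true value θᵏ
K = 1000
θ = randn(K)                   # stand-ins for the true parameter values
Q = θ .+ randn(K)              # stand-ins for the estimated quantiles Q(Zᵏ, τ)
coverage = sum(Q .> θ) / K     # plotted against the nominal level τ
```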

PosteriorEstimator: produces a three-row figure:

  1. Recovery plot: posterior mean vs. true value (scatter), with vertical line segments showing the 95% posterior credible interval, faceted by parameter.
  2. ECDF plot: for each parameter, the empirical CDF of the fractional rank of the true value within the posterior samples, together with a simultaneous prob-level confidence band. A well-calibrated posterior yields an ECDF that stays within the band.
  3. Z-score / contraction plot: posterior z-score (posterior mean − truth) / posterior SD vs. posterior contraction 1 − Var(posterior) / Var(prior), faceted by parameter. Ideally z-scores are centred near zero and contractions are near one.

Keyword arguments

  • prob = 0.99: nominal simultaneous coverage level for the SBC confidence band. Only used when assessment contains posterior samples.
source
NeuralEstimators.riskFunction
risk(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's Bayes risk,

\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
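In code, the approximation amounts to averaging the loss over the $K$ sampled pairs. A minimal sketch, where mc_risk is a hypothetical helper mirroring the default behaviour (absolute-error loss, risk reported separately for each parameter) and not part of the package API:

```julia
using Statistics

# Monte Carlo risk from a d × K matrix of true parameters θ and the
# corresponding matrix of estimates θ̂.
function mc_risk(θ, θ̂; loss = (x, y) -> abs(x - y), average_over_parameters = false)
    r = mean(loss.(θ̂, θ); dims = 2)      # average the loss over the K samples
    average_over_parameters ? mean(r) : vec(r)
end

θ  = [0.0 1.0 2.0; 1.0 1.0 1.0]
θ̂ = [0.5 1.0 1.5; 0.0 2.0 1.0]
mc_risk(θ, θ̂)        # per-parameter risks: [1/3, 2/3]
```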

If the Assessment object corresponds to an estimator with an intrinsic, estimator-specific loss (e.g., PosteriorEstimator), the precomputed risk is returned directly. Otherwise, the risk is computed from the estimates and true parameters using the provided loss function.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default: absolute-error loss)
  • average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), it is computed separately for each parameter.
source
NeuralEstimators.biasFunction
bias(assessment::Assessment; average_over_parameters = false)

Computes a Monte Carlo approximation of an estimator's bias,

\[{\textrm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

source
NeuralEstimators.rmseFunction
rmse(assessment::Assessment; average_over_parameters = false)

Computes a Monte Carlo approximation of an estimator's root-mean-squared error,

\[{\textrm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\}^2},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
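Equivalently, in code (mc_rmse is a hypothetical helper mirroring the per-parameter default, not part of the package API):

```julia
using Statistics

# Per-parameter RMSE from d × K matrices of true parameters and estimates.
mc_rmse(θ, θ̂) = vec(sqrt.(mean(abs2.(θ̂ .- θ); dims = 2)))

θ  = [0.0 0.0; 1.0 1.0]
θ̂ = [1.0 -1.0; 1.0 3.0]
mc_rmse(θ, θ̂)        # [1.0, √2]
```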

source
NeuralEstimators.coverageFunction
coverage(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.

Keyword arguments

  • average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed separately for each parameter.
source

Inference with observed data

The following functions facilitate the use of a trained neural estimator with observed data.

NeuralEstimators.estimateFunction
estimate(estimator, Z, X = nothing; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)

Applies estimator to data Z.

If X is provided, the estimator will be applied to the tuple (Z, X).

source
NeuralEstimators.bootstrapFunction
bootstrap(estimator::PointEstimator, parameters::P, Z; use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, parameters::P, simulator, m::Integer; B = 400, use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, Z; B = 400, blocks = nothing, trim = true, use_gpu = true)

Generates B bootstrap estimates using estimator.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping and should be a vector of integers specifying the block membership of each replicate. For example, with five replicates where the first two belong to block 1 and the remaining three to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm generates resampled data sets by sampling blocks with replacement. If trim = true, the final block is trimmed as needed so that each resampled data set contains the same number of replicates as Z.
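The block-resampling step can be sketched as follows (resample_blocks is a hypothetical helper illustrating the algorithm described above, not the package's internal implementation):

```julia
using Random

# Sketch of non-parametric block resampling: sample blocks with
# replacement, concatenate their replicate indices, and (optionally)
# trim so the resampled data set has the original number of replicates.
function resample_blocks(blocks::Vector{Int}; trim::Bool = true, rng = Random.default_rng())
    n   = length(blocks)                  # original number of replicates
    ids = unique(blocks)
    idx = Int[]
    while length(idx) < n
        b = rand(rng, ids)                # draw a block with replacement
        append!(idx, findall(==(b), blocks))
    end
    trim ? idx[1:n] : idx
end

blocks = [1, 1, 2, 2, 2]                  # the example from above
idx = resample_blocks(blocks)             # replicate indices for one bootstrap data set
```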

The return type is a $d$ × B matrix, where $d$ is the dimension of the parameter vector.

source
NeuralEstimators.intervalFunction
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)

Computes a confidence/credible interval based either on a $d$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws), where $d$ denotes the number of parameters to make inference on, or from an IntervalEstimator and data Z.

When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.

The return type is a $d$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.
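For the matrix method, the computation reduces to row-wise quantiles; a minimal sketch with simulated draws (all names here are illustrative):

```julia
using Statistics, Random

# Interval from a d × B matrix of bootstrap estimates or posterior draws:
# row-wise quantiles at the requested probability levels.
Random.seed!(1)
θ = [randn(100)'; (2 .+ randn(100))']     # d = 2 parameters, B = 100 draws
probs = [0.05, 0.95]
ci = mapslices(row -> quantile(row, probs), θ; dims = 2)  # d × 2 matrix: lower, upper
```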

source
NeuralEstimators.sampleposteriorFunction
sampleposterior(estimator::PosteriorEstimator, Z, N::Integer = 1000; use_gpu::Bool = true)
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, logprior::Function = θ -> 0f0, use_gpu::Bool = true)

Samples from the approximate posterior distribution implied by estimator.

The positional argument N controls the size of the posterior sample.

If Z represents a single data set as determined by Flux.numobs(), returns a $d$ × N matrix of posterior samples, where $d$ is the dimension of the parameter vector. Otherwise, if Z contains multiple data sets, a vector of matrices will be returned.

When using a RatioEstimator, the prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument logprior (by default, a uniform prior is used). The sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid. The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms (e.g., MCMC) may be needed (please contact the package maintainer).
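The grid-based sampling scheme can be sketched for a single parameter as follows, with a toy log-density standing in for the ratio estimator's output (all names and the toy density are illustrative):

```julia
using Random, Statistics

# Evaluate the (unnormalised) log-posterior on a fine grid, convert to
# probabilities, and sample grid points by inverse-CDF sampling.
Random.seed!(1)
θ_grid  = range(-4, 6; length = 1001)
logpost = -0.5 .* (θ_grid .- 1) .^ 2          # toy log-density: N(1, 1) up to a constant
w = exp.(logpost .- maximum(logpost))          # subtract the max for numerical stability
w ./= sum(w)                                   # normalise to probabilities
cdf = cumsum(w)
N = 10_000
samples = [θ_grid[min(searchsortedfirst(cdf, rand()), length(w))] for _ in 1:N]
mean(samples)   # ≈ 1
```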

source
NeuralEstimators.posteriormedianFunction
posteriormedian(θ::AbstractMatrix)
posteriormedian(estimator, Z, N::Integer = 1000; kwargs...)

Computes the vector of marginal posterior medians based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().

See also posteriormean(), posteriorquantile().

source
NeuralEstimators.posteriorquantileFunction
posteriorquantile(θ::AbstractMatrix, probs)
posteriorquantile(estimator, Z, probs, N::Integer = 1000; kwargs...)

Computes the vector of marginal posterior quantiles with (a collection of) probability levels probs, based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().

The return value is a $d$ × length(probs) matrix.

See also posteriormedian().

source
NeuralEstimators.posteriormodeFunction
posteriormode(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, logprior::Function = θ -> 0f0, use_gpu = true)

Computes the (approximate) posterior mode (maximum a posteriori estimate) given data $\boldsymbol{Z}$,

\[\underset{\boldsymbol{\theta}}{\mathrm{arg\,max\;}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function implied by estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument logprior. Note that this estimate can be viewed as an approximate maximum penalised likelihood estimate, with penalty term $\log p(\boldsymbol{\theta})$.

If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.
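The grid-search branch can be sketched as follows, with a toy log-likelihood standing in for the estimator's output (loglik, logprior, and the grid construction are illustrative only):

```julia
# Grid search for the MAP estimate: evaluate the approximate log-posterior
# at each column of a 2 × G matrix of candidate parameter vectors and keep
# the maximiser.
loglik(θ) = -sum(abs2, θ .- [1.0, -0.5])      # toy log-likelihood, maximised at (1, -0.5)
logprior(θ) = 0.0                              # flat prior
cands  = [collect(θ) for θ in Iterators.product(-2:0.1:2, -2:0.1:2)]
θ_grid = reduce(hcat, vec(cands))              # 2 × G matrix of candidates
scores = [loglik(θ_grid[:, j]) + logprior(θ_grid[:, j]) for j in 1:size(θ_grid, 2)]
θ̂ = θ_grid[:, argmax(scores)]
```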

See also posteriormedian(), posteriormean().

source