Core
This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which they appear in a standard implementation: from sampling parameters from the prior distribution, to making inferences from observed data.
Sampling parameters
Parameters sampled from the prior distribution are stored as a $d \times K$ matrix, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors.
It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). The user-defined type should be a subtype of ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.
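For instance, one might precompute Cholesky factors when sampling parameters, so that they need not be recomputed during data simulation. The following is a minimal sketch: the type name GPParameters, the field L, and the exponential covariance model are illustrative assumptions, not part of the package.

```julia
using NeuralEstimators, LinearAlgebra

# Hypothetical subtype storing range parameters together with the
# corresponding Cholesky factors of an exponential covariance matrix
struct GPParameters <: ParameterConfigurations
    θ  # required field: d × K matrix of parameters
    L  # precomputed Cholesky factors, one per parameter vector
end

# Constructor: K parameter vectors for spatial locations stored in the rows of S
function GPParameters(K::Integer, S::Matrix)
    ρ = rand(Float32, K)               # range parameters drawn from a U(0, 1) prior
    θ = permutedims(ρ)                 # 1 × K parameter matrix
    D = [norm(s - t) for s in eachrow(S), t in eachrow(S)]  # pairwise distances
    L = [cholesky(Symmetric(exp.(-D ./ ρₖ))).L for ρₖ in ρ] # factor for each ρₖ
    GPParameters(θ, L)
end
```

Data can then be simulated cheaply from each stored factor, e.g. via L * randn(size(L, 1)).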
NeuralEstimators.ParameterConfigurations — Type
ParameterConfigurations
An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.
The user-defined type must have a field θ that stores the $d$ × $K$ matrix of parameters, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors. There are no other requirements.
See subsetparameters() for the generic function for subsetting these objects.
Examples
struct P <: ParameterConfigurations
θ
# other expensive intermediate objects...
end
Simulating data
The package accommodates any model for which simulation is feasible by allowing users to define their model implicitly through simulated data.
Simulated data sets are stored as mini-batches in a format amenable to the chosen neural-network architecture. For example, when constructing an estimator from data collected over a grid, one may use a generic CNN, with each data set stored in the final dimension of a four-dimensional array. When performing inference from replicated data, a DeepSet architecture may be used, where simulated data sets are stored in a vector, and conditionally independent replicates are stored as mini-batches within each element of the vector.
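For concreteness, the two storage conventions described above can be illustrated with plain arrays (the sizes below are arbitrary):

```julia
K = 100  # number of simulated data sets

# Gridded data for a CNN: each data set occupies the final dimension of a
# four-dimensional array (width × height × channels × K)
Z_grid = rand(Float32, 16, 16, 1, K)

# Replicated data for a DeepSet: a vector of K data sets, where the m
# conditionally independent replicates are stored in the final dimension
n, m = 1, 30
Z_deepset = [rand(Float32, n, m) for _ in 1:K]
```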
Estimators
The package provides several classes of neural estimators, organised within a type hierarchy. At the top-level of the hierarchy is NeuralEstimator, an abstract supertype for all neural estimators in the package.
Neural Bayes estimators are implemented as subtypes of the abstract supertype BayesEstimator. The simple type PointEstimator is used for constructing neural Bayes estimators under general, user-defined loss functions. Several specialised types cater for the estimation of posterior quantiles based on the quantile loss function: see IntervalEstimator and its generalisation QuantileEstimator for estimating posterior quantiles for a fixed set of probability levels.
The type PosteriorEstimator can be used to approximate the posterior distribution, and RatioEstimator can be used to approximate the likelihood-to-evidence ratio.
Finally, Ensemble combines multiple estimators, aggregating their individual estimates to improve accuracy.
NeuralEstimators.NeuralEstimator — Type
NeuralEstimator
An abstract supertype for all neural estimators.
NeuralEstimators.BayesEstimator — Type
BayesEstimator <: NeuralEstimator
An abstract supertype for neural Bayes estimators.
NeuralEstimators.PointEstimator — Type
PointEstimator <: BayesEstimator
PointEstimator(network)
A neural point estimator, where the neural network is a mapping from the sample space to the parameter space.
NeuralEstimators.PosteriorEstimator — Type
PosteriorEstimator <: NeuralEstimator
PosteriorEstimator(network, q::ApproximateDistribution)
PosteriorEstimator(network, d::Integer, dstar::Integer = d; q::ApproximateDistribution = NormalisingFlow, kwargs...)
A neural estimator that approximates the posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$, based on a neural network and an approximate distribution q (see the in-built Approximate distributions).
The neural network is a mapping from the sample space to $\mathbb{R}^{d^*}$, where $d^*$ is a user-specified number of summary statistics. The learned summary statistics are then transformed into parameters $\boldsymbol{\kappa} \in \mathcal{K}$ of the approximate distribution using a conventional multilayer perceptron (MLP).
The convenience constructor PosteriorEstimator(network, d, dstar; q::ApproximateDistribution) builds the approximate distribution q internally, with the keyword arguments passed on to the constructor of q.
Examples
using NeuralEstimators, Flux, CairoMakie
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 50 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Distribution used to approximate the posterior
q = NormalisingFlow(d, d)
# Neural network (outputs d summary statistics)
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = PosteriorEstimator(network, q)
# Alternatively, initialise with a different in-built approximate distribution, e.g.:
# estimator = PosteriorEstimator(network, GaussianMixture(d, d))
# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m, K = 3000)
# Plot the risk history
plotrisk()
# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
assessment = assess(estimator, θ_test, Z_test)
plot(assessment)
# Inference with observed data
θ = [0.8f0 0.1f0]'
Z = simulate(θ, m)
sampleposterior(estimator, Z) # posterior draws
posteriormean(estimator, Z) # point estimate
NeuralEstimators.RatioEstimator — Type
RatioEstimator <: NeuralEstimator
RatioEstimator(network)
A neural estimator that estimates the likelihood-to-evidence ratio,
\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]
where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.
For numerical stability, training is done on the log-scale using the relation $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c^*(\boldsymbol{Z}, \boldsymbol{\theta}))$, where $c^*(\cdot, \cdot)$ denotes the Bayes classifier as described in the methodology section. Hence, the neural network should be a mapping from $\mathcal{Z} \times \Theta$ to $\mathbb{R}$, where $\mathcal{Z}$ and $\Theta$ denote the sample and parameter spaces, respectively.
The neural network must implement a method network(::Tuple), where the first element of the tuple contains the data sets and the second element contains the parameter matrices.
When the neural network is a DeepSet (which implements the above method), two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal $d$ plus the number of output neurons in the final layer of the inner network. Second, the number of output neurons in the final layer of the outer network must be one.
When applying the estimator to data, the log of the likelihood-to-evidence ratio is returned. The estimated ratio can then be used in various Bayesian (e.g., Hermans et al., 2020) or frequentist (e.g., Walchessen et al., 2024) inferential algorithms.
Examples
using NeuralEstimators, Flux, CairoMakie
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Neural network
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + d, w, relu), Dense(w, w, relu), Dense(w, 1))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
r̂ = RatioEstimator(network)
# Train the estimator
r̂ = train(r̂, sample, simulate, simulator_args = m, K = 1000)
# Plot the risk history
plotrisk()
# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
θ_grid = f32(expandgrid(0:0.01:1, 0:0.01:1))' # fine gridding of the parameter space
assessment = assess(r̂, θ_test, Z_test; θ_grid = θ_grid)
plot(assessment)
# Generate "observed" data
θ = sample(1)
z = simulate(θ, 200)
# Grid-based optimization and sampling
estimate(r̂, z, θ_grid) # log of likelihood-to-evidence ratios
posteriormode(r̂, z; θ_grid = θ_grid) # posterior mode
sampleposterior(r̂, z; θ_grid = θ_grid) # posterior samples
# Gradient-based optimization
using Optim
θ₀ = [0.5, 0.5] # initial estimate
posteriormode(r̂, z; θ₀ = θ₀) # posterior mode
NeuralEstimators.IntervalEstimator — Type
IntervalEstimator <: BayesEstimator
IntervalEstimator(u, v = u, c::Union{Function, Compress} = identity; probs = [0.025, 0.975], g = exp)
IntervalEstimator(u, c::Union{Function, Compress}; probs = [0.025, 0.975], g = exp)
A neural estimator that jointly estimates marginal posterior credible intervals based on the probability levels probs (by default, 95% central credible intervals).
The estimator employs a representation that prevents quantile crossing. Specifically, given data $\boldsymbol{Z}$, it constructs intervals for each parameter $\theta_i$, $i = 1, \dots, d,$ of the form,
\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]
where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_d(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_d(\cdot))'$ are neural networks that map from the sample space to $\mathbb{R}^d$; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.
The functions $c_i(⋅)$ may be collectively defined by a $d$-dimensional Compress object, which can constrain the interval estimator's output to the prior support. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted. If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.
The return value when applied to data using estimate() is a matrix with $2d$ rows, where the first and second $d$ rows correspond to the lower and upper bounds, respectively. The function interval() can be used to format this output in a readable $d$ × 2 matrix.
See also QuantileEstimator.
Examples
using NeuralEstimators, Flux, CairoMakie
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 100 # number of independent replicates
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Neural network
w = 128 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
u = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = IntervalEstimator(u)
# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m, K = 3000)
# Plot the risk history
plotrisk()
# Assess the estimator
θ_test = sample(500)
Z_test = simulate(θ_test, m);
assessment = assess(estimator, θ_test, Z_test)
plot(assessment)
# Inference with "observed" data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)
interval(estimator, Z)
NeuralEstimators.QuantileEstimator — Type
QuantileEstimator <: BayesEstimator
QuantileEstimator(v; probs = [0.025, 0.5, 0.975], g = Flux.softplus, i = nothing)
A neural estimator that jointly estimates a fixed set of marginal posterior quantiles, with probability levels $\{\tau_1, \dots, \tau_T\}$ controlled by the keyword argument probs. This generalises IntervalEstimator to support an arbitrary number of probability levels.
Given data $\boldsymbol{Z}$, by default the estimator approximates quantiles of the distributions of
\[\theta_i \mid \boldsymbol{Z}, \quad i = 1, \dots, d, \]
for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates quantiles of the full conditional distribution of
\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]
where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed.
The estimator employs a representation that prevents quantile crossing, namely,
\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]
where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$; $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are neural networks that map from the sample space to $\mathbb{R}^d$; and $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus) applied elementwise to its arguments. If g = nothing, the quantiles are estimated independently through the representation
\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]
When the neural networks are DeepSet objects, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal the number of neurons in the final layer of the inner network plus $\text{dim}(\boldsymbol{\theta}_{-i})$, where we define $\text{dim}(\boldsymbol{\theta}_{-i}) \equiv 0$ when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (the default behaviour). Second, the number of output neurons in the final layer of the outer network must equal $d - \text{dim}(\boldsymbol{\theta}_{-i})$.
The return value is a matrix with $\{d - \text{dim}(\boldsymbol{\theta}_{-i})\} \times T$ rows, where the first $T$ rows correspond to the estimated quantiles for the first parameter, the next $T$ rows correspond to the estimated quantiles for the second parameter, and so on.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# ---- Quantiles of θᵢ ∣ 𝐙, i = 1, …, d ----
# Neural network
w = 64 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
v = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = QuantileEstimator(v)
# Train the estimator
estimator = train(estimator, sample, simulate, simulator_args = m)
# Inference with "observed" data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)
# ---- Quantiles of θᵢ ∣ 𝐙, θ₋ᵢ ----
# Neural network
w = 64 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + 1, w, relu), Dense(w, d - 1))
v = DeepSet(ψ, ϕ)
# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
q₁ = QuantileEstimator(v; i = 1)
q₂ = QuantileEstimator(v; i = 2)
# Train the estimators
q₁ = train(q₁, sample, simulate, simulator_args = m)
q₂ = train(q₂, sample, simulate, simulator_args = m)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 for a single data set
q₁(Z[1], θ₋ᵢ)
NeuralEstimators.Ensemble — Type
Ensemble <: NeuralEstimator
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = mean)
Defines an ensemble of estimators which, when applied to data Z, returns the mean (or another summary defined by aggr) of the individual estimates (see, e.g., Sainsbury-Dale et al., 2025, Sec. S5).
The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, it can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified, both the ensemble and its component estimators will be saved.
Note that train() currently acts sequentially on the component estimators, using the Adam optimiser.
The ensemble components can be accessed by indexing the ensemble; the number of component estimators can be obtained using length().
See also Parallel, which can be used to mimic ensemble methods with an appropriately chosen connection.
Examples
using NeuralEstimators, Flux
# Data Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sampler(K) = randn32(d, K)
simulator(θ, m) = [μ .+ randn32(n, m) for μ ∈ eachcol(θ)]
# Neural-network architecture of each ensemble component
function architecture()
ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, d))
network = DeepSet(ψ, ϕ)
PointEstimator(network)
end
# Initialise ensemble with three component estimators
ensemble = Ensemble(architecture, 3)
ensemble[1] # access component estimators by indexing
ensemble[1:2] # indexing with an iterable collection returns the corresponding ensemble
length(ensemble) # number of component estimators
# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)
# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)
# Apply to data
ensemble(Z)
Training
The function train() is used to train a neural estimator.
After training, the risk history and optimiser state can be accessed and inspected using loadrisk(), plotrisk(), and loadoptimiser().
NeuralEstimators.train — Function
train(estimator, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, sampler::Function, simulator::Function; ...)
Trains a neural estimator.
The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.
The validation parameters and data are always held fixed.
The trained estimator is always returned on the CPU.
Keyword arguments common to all methods:
- loss = Flux.mae (applicable only to PointEstimator): loss function used to train the neural network.
- epochs = 100: number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.
- stopping_epochs = 5: cease training if the risk does not improve in this number of epochs.
- batchsize = 32: the batch size to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.
- optimiser = Flux.setup(Adam(5e-4), estimator): any Optimisers.jl optimisation rule for updating the neural-network parameters. When the training data and/or parameters are held fixed, one may wish to employ regularisation to prevent overfitting; for example, optimiser = Flux.setup(OptimiserChain(WeightDecay(1e-4), Adam()), estimator), which corresponds to L₂ regularisation with penalty coefficient λ = 10⁻⁴.
- lr_schedule::Union{Nothing, ParameterSchedulers.AbstractSchedule}: defines the learning-rate schedule for adaptively changing the learning rate during training. Accepts either a ParameterSchedulers.jl object or nothing for a fixed learning rate. By default, uses CosAnneal with a maximum set to the initial learning rate from optimiser, a minimum of zero, and a period equal to the number of epochs. The learning rate is updated at the end of each epoch.
- use_gpu = true: flag indicating whether to use a GPU if one is available.
- savepath::Union{Nothing, String} = tempdir(): path to save information generated during training. If nothing, nothing is saved. Otherwise, the following files are always saved to both savepath and tempdir() (the latter for convenient within-session access via loadrisk(), plotrisk(), and loadoptimiser()):
  - loss_per_epoch.csv: training and validation risk at each epoch, in the first and second columns, respectively.
  - best_optimiser.bson: optimiser state corresponding to the best validation risk.
  - final_optimiser.bson: optimiser state at the final epoch.
  If additionally savepath != tempdir(), the following files are also saved to savepath:
  - best_network.bson: neural-network parameters corresponding to the best validation risk.
  - final_network.bson: neural-network parameters at the final epoch.
  - train_time.csv: total training time in seconds.
- risk_history::Union{Nothing, Matrix} = nothing: a matrix with two columns containing the training and validation risk from a previous call to train(), used to initialise the risk history when continuing training. Can be loaded from a previous call to train() using loadrisk().
- verbose = true: flag indicating whether information, including empirical risk values and timings, should be printed to the console during training.
Keyword arguments common to train(estimator, sampler, simulator) and train(estimator, θ_train, θ_val, simulator):
- simulator_args = (): positional arguments passed to the simulator as simulator(θ, simulator_args...).
- simulator_kwargs::NamedTuple = (;): keyword arguments passed to the simulator as simulator(...; simulator_kwargs...).
- epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
- simulate_just_in_time = false: flag indicating whether to simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.
Keyword arguments unique to train(estimator, sampler, simulator):
- sampler_args = (): positional arguments passed to the parameter sampler as sampler(K, sampler_args...).
- sampler_kwargs::NamedTuple = (;): keyword arguments passed to the parameter sampler as sampler(...; sampler_kwargs...).
- K = 10000: number of parameter vectors in the training set.
- K_val = K ÷ 2: number of parameter vectors in the validation set.
- epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ N(0, 1) and σ ~ U(0, 1)
function sampler(K)
μ = randn(K) # Gaussian prior
σ = rand(K) # Uniform prior
θ = vcat(μ', σ')
return θ
end
function simulator(θ, m)
[ϑ[1] .+ ϑ[2] * randn(1, m) for ϑ ∈ eachcol(θ)]
end
# Neural network
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
w = 128 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = PointEstimator(network)
# Number of data sets in each epoch and number of independent replicates in each data set
K = 1000
m = 30
# Training: simulation on-the-fly
estimator = train(estimator, sampler, simulator, simulator_args = m, K = K)
# Plot the risk history (using any plotting backend)
using Plots
unicodeplots()
plotrisk()
# Training: simulation on-the-fly with fixed parameters
θ_train = sampler(K)
θ_val = sampler(K)
estimator = train(estimator, θ_train, θ_val, simulator, simulator_args = m, optimiser = loadoptimiser(), risk_history = loadrisk())
# Training: fixed parameters and fixed data
Z_train = simulator(θ_train, m)
Z_val = simulator(θ_val, m)
estimator = train(estimator, θ_train, θ_val, Z_train, Z_val, optimiser = loadoptimiser(), risk_history = loadrisk())
NeuralEstimators.loadrisk — Function
loadrisk(savepath::String = tempdir())
Loads the training and validation risk history saved during the most recent call to train(). By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to reload the risk history from a previous session.
Returns a matrix with two columns: training risk (column 1) and validation risk (column 2).
See also plotrisk.
NeuralEstimators.plotrisk — Function
plotrisk(savepath::String = tempdir())
Plots the training and validation risk as a function of epoch, loaded from savepath. By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to plot risk history from a previous session.
Requires a plotting package (e.g., using Plots) to be loaded.
See also loadrisk.
NeuralEstimators.loadoptimiser — Function
loadoptimiser(savepath::String = tempdir(); best::Bool = true)
Loads the optimiser saved during the most recent call to train().
By default, loads from the temporary directory used during the current session. If a savepath was provided to train(), that path can be passed here to reload the optimiser from a previous session.
By default, loads the optimiser corresponding to the best network (as measured by validation loss). Set best = false to load the optimiser from the final epoch instead, which can be useful for resuming training from exactly where it left off.
The returned optimiser can be passed directly to train() via the optimiser keyword argument to resume training with the correct optimiser state.
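For instance, to resume training from the final epoch of a previous session (a sketch; estimator, sampler, and simulator as defined in the train() examples):

```julia
estimator = train(estimator, sampler, simulator,
                  optimiser = loadoptimiser(best = false),  # final-epoch optimiser state
                  risk_history = loadrisk())                # continue the risk history
```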
Assessment/calibration
NeuralEstimators.assess — Function
assess(estimator, θ, Z; ...)
assess(estimators::Vector, θ, Z; ...)
Assesses an estimator (or a collection of estimators) based on true parameters θ and corresponding simulated data Z.
The parameters θ should be given as a $d$ × $K$ matrix, where $d$ is the parameter dimension and $K$ is the number of sampled parameter vectors.
When Z contains more simulated data sets than the number $K$ of sampled parameter vectors, θ will be recycled via horizontal concatenation: θ = repeat(θ, outer = (1, J)), where J = numobs(Z) ÷ K is the number of simulated data sets for each parameter vector. This allows assessment of the estimator's sampling distribution under fixed parameters.
The return value is of type Assessment.
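For example, to assess an estimator's sampling distribution under fixed parameters, one might simulate several data sets per parameter vector (a sketch reusing sample(), simulate(), and the estimator defined in the examples elsewhere on this page):

```julia
# J data sets for each of the K parameter vectors; assess() recycles θ internally
K, J, m = 100, 10, 30
θ = sample(K)                               # K parameter vectors
Z = vcat([simulate(θ, m) for _ in 1:J]...)  # vector of K * J data sets
assessment = assess(estimator, θ, Z)
```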
Keyword arguments
- estimator_name::String (or estimator_names::Vector{String} for multiple estimators): name(s) of the estimator(s) (sensible defaults provided).
- parameter_names::Vector{String}: names of the parameters (sensible default provided).
- use_gpu = true: Bool or Vector{Bool} with length equal to the number of estimators.
- probs = nothing (applicable only to PointEstimator): probability levels taking values between 0 and 1. By default, no bootstrap uncertainty quantification is done; if probs is provided, it must be a two-element vector specifying the lower and upper probability levels for non-parametric bootstrap intervals (note that parametric bootstrap is not currently supported with assess()).
- B::Integer = 400 (applicable only to PointEstimator): number of bootstrap samples.
- pointsummary::Function = mean (applicable only to estimators that yield posterior samples): a function that summarises a vector of posterior samples into a single point estimate for each marginal; any function mapping a vector to a scalar is valid (e.g., median for the posterior median).
- N::Integer = 1000 (applicable only to estimators that yield posterior samples): number of posterior samples drawn for each data set.
- kwargs... (applicable only to estimators that yield posterior samples): additional keyword arguments passed to sampleposterior().
NeuralEstimators.Assessment — Type
Assessment
A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field estimates is a long-form DataFrame with columns:
- parameter: the name of the parameter
- truth: the true value of the parameter
- estimate: the estimated value of the parameter
- k: the index of the parameter vector
- j: the index of the data set (only relevant when multiple data sets are associated with each parameter vector)
If the estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.
If the estimator is a QuantileEstimator, there will also be a column prob indicating the probability level of the corresponding quantile estimate.
If the estimator is a PosteriorEstimator, in addition to the fields listed above, the field samples stores the posterior samples as a long-form DataFrame with the columns parameter, truth, k, j (as given above), as well as:
- draw: the index of the draw within the posterior samples
- value: the value of the posterior sample for a given parameter and draw
Use merge() to combine assessments from multiple estimators of the same type or join() to combine assessments from a PointEstimator and an IntervalEstimator.
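For example (a sketch with hypothetical Assessment objects):

```julia
# Combine assessments of two estimators of the same type
assessment = merge(assessment1, assessment2)

# Combine assessments from a PointEstimator and an IntervalEstimator
assessment = join(point_assessment, interval_assessment)
```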
Makie.plot — Method
plot(assessment::Assessment; prob = 0.99)
Visualise the performance of a neural estimator. Accepts the Assessment object returned by assess().
This function is defined in the NeuralEstimatorsPlottingMakieExt extension and requires CairoMakie (or another Makie backend) to be loaded.
The plot produced depends on the type of estimator being assessed:
PointEstimator: produces a scatter plot of estimates vs. true values, faceted by parameter.
IntervalEstimator: produces a plot of estimated credible intervals vs. true values, faceted by parameter. Each interval is drawn as a vertical line segment from lower to upper bound, with tick marks at the endpoints.
QuantileEstimator: produces a calibration plot of the empirical coverage probability vs. the nominal probability level τ, faceted by parameter. A well-calibrated estimator will follow the red diagonal line. Specifically, the diagnostic is constructed as follows:
- For k = 1,…,K, sample pairs (θᵏ, Zᵏ) with θᵏ ∼ p(θ), Zᵏ ∼ p(Z ∣ θᵏ). This gives K "posterior draws", θᵏ ∼ p(θ ∣ Zᵏ).
- For each k and each τ ∈ {τⱼ : j = 1,…,J}, estimate the posterior quantile Q(Zᵏ, τ).
- For each τ, compute the proportion of quantiles Q(Zᵏ, τ) exceeding the corresponding θᵏ, and plot this proportion against τ.
PosteriorEstimator: produces a three-row figure:
- Recovery plot: posterior mean vs. true value (scatter), with vertical line segments showing the 95% posterior credible interval, faceted by parameter.
- ECDF plot: for each parameter, the empirical CDF of the fractional rank of the true value within the posterior samples, together with a simultaneous prob-level confidence band. A well-calibrated posterior yields an ECDF that stays within the band.
- Z-score / contraction plot: posterior z-score (posterior mean − truth) / posterior SD vs. posterior contraction 1 − Var(posterior) / Var(prior), faceted by parameter. Ideally, z-scores are centred near zero and contractions are near one.
Keyword arguments
- prob = 0.99: nominal simultaneous coverage level for the SBC confidence band. Only used when assessment contains posterior samples.
NeuralEstimators.risk — Function
risk(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's Bayes risk,
\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
If the Assessment object corresponds to an estimator with a self-defined loss (e.g., PosteriorEstimator), the precomputed risk is returned directly. Otherwise, the risk is computed from the estimates and true parameters using the provided loss function.
Keyword arguments
loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default: absolute-error loss).
average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), it is computed separately for each parameter.
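For intuition, the Monte Carlo approximation above can be reproduced by hand from the true parameters and corresponding estimates. A minimal sketch with fabricated values (d = 1 parameter, K = 3 samples):

```julia
using Statistics
θ  = [1.0 2.0 3.0]              # true parameters sampled from the prior
θ̂  = [1.2 1.9 3.3]              # corresponding point estimates
loss = (x, y) -> abs(x - y)     # absolute-error loss (the default)
r = mean(loss.(θ, θ̂))           # Monte Carlo Bayes risk ≈ 0.2
```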
NeuralEstimators.bias — Function
bias(assessment::Assessment; average_over_parameters = false)
Computes a Monte Carlo approximation of an estimator's bias,
\[{\textrm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
NeuralEstimators.rmse — Function
rmse(assessment::Assessment; average_over_parameters = false)
Computes a Monte Carlo approximation of an estimator's root-mean-squared error,
\[{\textrm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\}^2},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
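Both the bias and the RMSE can likewise be computed by hand from the estimates and true parameters. A sketch with fabricated values, d = 2 parameters, and K = 3 samples, averaging separately over each parameter (the default behaviour):

```julia
using Statistics
θ  = [0.0 0.0 0.0; 1.0 1.0 1.0]    # true parameters (d = 2, K = 3)
θ̂  = [0.1 -0.1 0.3; 0.9 1.1 1.0]   # corresponding estimates
b  = mean(θ̂ .- θ, dims = 2)              # per-parameter bias (d × 1)
r  = sqrt.(mean((θ̂ .- θ).^2, dims = 2))  # per-parameter RMSE (d × 1)
```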
NeuralEstimators.coverage — Function
coverage(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), together with the proportion of parameters that fall below the lower bound and above the upper bound, respectively.
Keyword arguments
average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
Inference with observed data
The following functions facilitate the use of a trained neural estimator with observed data.
NeuralEstimators.estimate — Function
estimate(estimator, Z, X = nothing; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)
Applies estimator to data Z.
If X is provided, the estimator will be applied to the tuple (Z, X).
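A hedged usage sketch, assuming `estimator` is a trained neural estimator and `Z` a collection of data sets (with optional set-level covariates `X`) produced in earlier workflow steps:

```julia
θ̂ = estimate(estimator, Z; batchsize = 64)     # data only
θ̂ = estimate(estimator, Z, X; batchsize = 64)  # estimator applied to (Z, X)
```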
NeuralEstimators.bootstrap — Function
bootstrap(estimator::PointEstimator, parameters::P, Z; use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, parameters::P, simulator, m::Integer; B = 400, use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, Z; B = 400, blocks = nothing, trim = true, use_gpu = true)
Generates B bootstrap estimates using estimator.
Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).
Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm generates resampled data sets by sampling blocks with replacement. If trim = true, the final block is trimmed as needed to ensure that the resampled data set matches the original size of Z.
The return type is a $d$ × B matrix, where $d$ is the dimension of the parameter vector.
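A hedged usage sketch of the non-parametric block bootstrap, assuming `estimator` is a trained PointEstimator and `Z` a single observed data set with five replicates, matching the blocking scheme described above:

```julia
# First two replicates form block 1; the remaining three form block 2
θ̂_boot = bootstrap(estimator, Z; B = 400, blocks = [1, 1, 2, 2, 2])
size(θ̂_boot)   # (d, 400)
```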
NeuralEstimators.interval — Function
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)
Computes a confidence/credible interval based either on a $d$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws), where $d$ denotes the number of parameters to make inference on, or from an IntervalEstimator and data Z.
When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.
The return type is a $d$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.
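The quantile-based construction can be replicated in a few lines from a matrix of draws. A self-contained sketch using the Statistics standard library, with a hypothetical matrix of bootstrap estimates:

```julia
using Statistics
θ = randn(2, 1000)                 # hypothetical d × B matrix of estimates
probs = [0.05, 0.95]
ci = mapslices(x -> quantile(x, probs), θ, dims = 2)  # d × 2 matrix
# ci[:, 1]: lower bounds; ci[:, 2]: upper bounds (90% central interval)
```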
NeuralEstimators.sampleposterior — Function
sampleposterior(estimator::PosteriorEstimator, Z, N::Integer = 1000; use_gpu::Bool = true)
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, logprior::Function = θ -> 0f0, use_gpu::Bool = true)
Samples from the approximate posterior distribution implied by estimator.
The positional argument N controls the size of the posterior sample.
If Z represents a single data set as determined by Flux.numobs(), returns a $d$ × N matrix of posterior samples, where $d$ is the dimension of the parameter vector. Otherwise, if Z contains multiple data sets, a vector of matrices will be returned.
When using a RatioEstimator, the prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument logprior (by default, a uniform prior is used). The sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid. The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms (e.g., MCMC) may be needed (please contact the package maintainer).
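The grid-based sampling idea can be illustrated in isolation for a scalar parameter. The following toy sketch is not the package implementation; it assumes StatsBase is available for weighted sampling, and uses a stand-in log-posterior:

```julia
using StatsBase
θ_grid  = collect(range(-5, 5; length = 1001))  # fine grid over parameter space
logpost = -0.5 .* θ_grid.^2                     # stand-in for ℓ(θ; Z) + log p(θ)
w = exp.(logpost .- maximum(logpost))           # stabilised unnormalised weights
draws = sample(θ_grid, Weights(w), 1000)        # approximate posterior sample
```

Subtracting the maximum before exponentiating avoids numerical underflow when the log-posterior takes large negative values.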
NeuralEstimators.posteriormean — Function
posteriormean(θ::AbstractMatrix)
posteriormean(estimator, Z, N::Integer = 1000; kwargs...)
Computes the posterior mean based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().
See also posteriormedian(), posteriormode().
NeuralEstimators.posteriormedian — Function
posteriormedian(θ::AbstractMatrix)
posteriormedian(estimator, Z, N::Integer = 1000; kwargs...)
Computes the vector of marginal posterior medians based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().
See also posteriormean(), posteriorquantile().
NeuralEstimators.posteriorquantile — Function
posteriorquantile(θ::AbstractMatrix, probs)
posteriorquantile(estimator, Z, probs, N::Integer = 1000; kwargs...)
Computes the vector of marginal posterior quantiles with (a collection of) probability levels probs, based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().
The return value is a $d$ × length(probs) matrix.
See also posteriormedian().
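Marginal medians and quantiles can be computed directly from a matrix of posterior draws. A minimal sketch using the Statistics standard library, with a hypothetical draw matrix:

```julia
using Statistics
θ = randn(3, 5000)                  # hypothetical d × N matrix of draws
q = mapslices(x -> quantile(x, [0.25, 0.5, 0.75]), θ, dims = 2)  # 3 × 3 matrix
m = median(θ, dims = 2)             # marginal posterior medians (3 × 1)
```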
NeuralEstimators.posteriormode — Function
posteriormode(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, logprior::Function = θ -> 0f0, use_gpu = true)
Computes the (approximate) posterior mode (maximum a posteriori estimate) given data $\boldsymbol{Z}$,
\[\underset{\boldsymbol{\theta}}{\mathrm{arg\,max\;}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]
where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function implied by estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function, controlled through the keyword argument logprior. Note that this estimate can be viewed as an approximate maximum penalised likelihood estimate, with penalty term $\log p(\boldsymbol{\theta})$.
If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.
See also posteriormedian(), posteriormean().