Core

This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation: from sampling parameters from the prior distribution, to making inference with observed data.

Sampling parameters

Parameters sampled from the prior distribution are stored as a $d \times K$ matrix, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors.
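
For example, a minimal sketch of a prior sampler returning such a matrix (the priors here are purely illustrative):

using NeuralEstimators, Flux

# Illustrative prior: θ = (μ, σ)' with μ ~ N(0, 1) and σ ~ U(0, 1)
sample(K) = vcat(randn32(1, K), rand32(1, K))
θ = sample(1000)   # 2 × 1000 matrix (d = 2, K = 1000)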

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). The user-defined type should be a subtype of ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $d$ × $K$ matrix of parameters, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors. There are no other requirements.

See subsetparameters() for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
	θ
	# other expensive intermediate objects...
end
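
A fuller sketch (illustrative only; the type name, covariance model, and locations S are hypothetical) of a subtype that stores both the sampled parameters and the Cholesky factors needed to simulate Gaussian-process data with an exponential covariance function at fixed spatial locations:

using NeuralEstimators, Flux, LinearAlgebra

struct GPParameters <: ParameterConfigurations
	θ    # 1 × K matrix of parameters (here, the range parameter ρ)
	L    # Cholesky factors, one for each sampled parameter vector
end

function GPParameters(K::Integer, S)
	ρ = rand32(1, K)                                        # prior ρ ~ U(0, 1)
	D = [norm(s - t) for s in eachcol(S), t in eachcol(S)]  # distance matrix between locations
	L = [cholesky(Symmetric(exp.(-D ./ ρₖ) + 1e-6I)).L for ρₖ in ρ]
	GPParameters(ρ, L)
end

S = rand(2, 100)                    # 100 spatial locations in the unit square
parameters = GPParameters(500, S)   # parameters.θ is a 1 × 500 matrix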
source

Simulating data

The package accommodates any model for which simulation is feasible by allowing users to define their model implicitly through simulated data.

The data are stored as a Vector{A}, where each element of the vector is associated with one parameter vector, and the subtype A depends on the multivariate structure of the data. Common formats include:

  • Unstructured data: A is typically an $n \times m$ matrix, where:
    • $n$ is the dimension of each replicate (e.g., $n=1$ for univariate data, $n=2$ for bivariate data).
    • $m$ is the number of independent replicates in each data set ($m$ is allowed to vary between data sets).
  • Data collected over a regular grid: A is typically an ($N + 2$)-dimensional array, where:
    • The first $N$ dimensions correspond to the dimensions of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids).
    • The penultimate dimension stores the so-called "channels" (e.g., a singleton dimension for univariate processes, two channels for bivariate processes).
    • The final dimension stores the $m$ independent replicates.
  • Spatial data collected over irregular locations: A is typically a GNNGraph, where independent replicates (possibly with differing spatial locations) are stored as subgraphs. See the helper function spatialgraph() for constructing these graphs from matrices of spatial locations and data.

While the formats above cover many applications, the package is flexible: the data structure simply needs to align with the chosen neural-network architecture.
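
For concreteness, the following sketch constructs dummy data sets in the first two formats (the values below are placeholders rather than simulations from a statistical model):

using NeuralEstimators, Flux

K = 10   # number of data sets (one per parameter vector)

# Unstructured bivariate data: each data set is a 2 × m matrix, with m possibly varying between data sets
Z_unstructured = [randn32(2, rand(25:50)) for _ in 1:K]

# Univariate data on a 16 × 16 grid: each data set is a 16 × 16 × 1 × m array
Z_grid = [randn32(16, 16, 1, 30) for _ in 1:K]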

Estimators

The package provides several classes of neural estimators, organised within a type hierarchy. At the top level of the hierarchy is NeuralEstimator, an abstract supertype for all neural estimators in the package.

Neural Bayes estimators are implemented as subtypes of the abstract supertype BayesEstimator. The simple type PointEstimator is used for constructing neural Bayes estimators under general, user-defined loss functions. Several specialised types cater for the estimation of posterior quantiles based on the quantile loss function: see IntervalEstimator and its generalisation QuantileEstimator for estimating posterior quantiles for a fixed set of probability levels.

The type PosteriorEstimator can be used to approximate the posterior distribution, and RatioEstimator can be used to approximate the likelihood-to-evidence ratio.

Several types serve as wrappers around the aforementioned estimators, enhancing their functionality. PiecewiseEstimator applies different estimators based on the sample size of the data (see the discussion on Variable sample sizes). Finally, Ensemble combines multiple estimators, aggregating their individual estimates to improve accuracy.

NeuralEstimators.PointEstimatorType
PointEstimator <: BayesEstimator
PointEstimator(network)

A neural point estimator, where the neural network is a mapping from the sample space to the parameter space.
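
Examples

A minimal sketch based on the univariate Gaussian model used elsewhere on this page (the architecture and hyperparameters are illustrative only):

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network mapping the sample space to the parameter space
w = 128   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = PointEstimator(network)

# Train the estimator under the default loss (mean absolute error)
estimator = train(estimator, sample, simulate, m = m)

# Inference with observed data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)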

source
NeuralEstimators.PosteriorEstimatorType
PosteriorEstimator <: NeuralEstimator
PosteriorEstimator(q::ApproximateDistribution, network)

A neural estimator that approximates the posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$, based on a neural network and an approximate distribution q (see the available in-built Approximate distributions).

The neural network is a mapping from the sample space to a space determined by the chosen approximate distribution q. Often, the output space is the space $\mathcal{K}$ of the approximate-distribution parameters $\boldsymbol{\kappa}$. However, for certain distributions (notably, NormalisingFlow), the neural network outputs summary statistics of suitable dimension (e.g., the dimension $d$ of the parameter vector), which are then transformed into parameters of the approximate distribution using conventional multilayer perceptrons (see NormalisingFlow).

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Distribution used to approximate the posterior 
q = NormalisingFlow(d, d) 

# Neural network (outputs d summary statistics)
w = 128   
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = PosteriorEstimator(q, network)

# Train the estimator
estimator = train(estimator, sample, simulate, m = m)

# Inference with observed data 
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
sampleposterior(estimator, Z) # posterior draws 
posteriormean(estimator, Z)   # point estimate
source
NeuralEstimators.RatioEstimatorType
RatioEstimator <: NeuralEstimator
RatioEstimator(network)

A neural estimator that estimates the likelihood-to-evidence ratio,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]

where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.

For numerical stability, training is done on the log-scale using the relation $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c^*(\boldsymbol{Z}, \boldsymbol{\theta}))$, where $c^*(\cdot, \cdot)$ denotes the Bayes classifier as described in the Methodology section. Hence, the neural network should be a mapping from $\mathcal{Z} \times \Theta$ to $\mathbb{R}$, where $\mathcal{Z}$ and $\Theta$ denote the sample and parameter spaces, respectively.

When the neural network is a DeepSet, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal $d$ plus the number of output neurons in the final layer of the inner network. Second, the number of output neurons in the final layer of the outer network must be one.

When applying the estimator to data Z, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true will yield class probability estimates). The estimated ratio can then be used in various Bayesian (e.g., Hermans et al., 2020) or frequentist (e.g., Walchessen et al., 2024) inferential algorithms.

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network
w = 128 
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + d, w, relu), Dense(w, w, relu), Dense(w, 1))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
r̂ = RatioEstimator(network)

# Train the estimator
r̂ = train(r̂, sample, simulate, m = m)

# Inference with "observed" data (grid-based optimisation and sampling)
θ = sample(1)
z = simulate(θ, m)[1]
θ_grid = f32(expandgrid(0:0.01:1, 0:0.01:1))'  # fine gridding of the parameter space
r̂(z, θ_grid)                                   # likelihood-to-evidence ratios over grid
mlestimate(r̂, z; θ_grid = θ_grid)              # maximum-likelihood estimate
posteriormode(r̂, z; θ_grid = θ_grid)           # posterior mode 
sampleposterior(r̂, z; θ_grid = θ_grid)         # posterior samples

# Inference with "observed" data (gradient-based optimisation using Optim.jl)
using Optim
θ₀ = [0.5, 0.5]                                # initial estimate
mlestimate(r̂, z; θ₀ = θ₀)                      # maximum-likelihood estimate
posteriormode(r̂, z; θ₀ = θ₀)                   # posterior mode 
source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator <: BayesEstimator
IntervalEstimator(u, v = u, c::Union{Function, Compress} = identity; probs = [0.025, 0.975], g = exp)
IntervalEstimator(u, c::Union{Function, Compress}; probs = [0.025, 0.975], g = exp)

A neural estimator that jointly estimates marginal posterior credible intervals based on the probability levels probs (by default, 95% central credible intervals).

The estimator employs a representation that prevents quantile crossing. Specifically, given data $\boldsymbol{Z}$, it constructs intervals for each parameter $\theta_i$, $i = 1, \dots, d,$ of the form,

\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]

where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_d(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_d(\cdot))'$ are neural networks that map from the sample space to $\mathbb{R}^d$; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.

The functions $c_i(⋅)$ may be collectively defined by a $d$-dimensional Compress object, which can constrain the interval estimator's output to the prior support. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted. If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.

The return value when applied to data using estimate() is a matrix with $2d$ rows, where the first and second $d$ rows correspond to the lower and upper bounds, respectively. The function interval() can be used to format this output in a readable $d$ × 2 matrix.

See also QuantileEstimator.

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 100   # number of independent replicates
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network
w = 128   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
u = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = IntervalEstimator(u)

# Train the estimator
estimator = train(estimator, sample, simulate, m = m)

# Inference with "observed" data 
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z) 
interval(estimator, Z)
source
NeuralEstimators.QuantileEstimatorType
QuantileEstimator <: BayesEstimator
QuantileEstimator(v; probs = [0.025, 0.5, 0.975], g = Flux.softplus, i = nothing)

A neural estimator that jointly estimates a fixed set of marginal posterior quantiles, with probability levels $\{\tau_1, \dots, \tau_T\}$ controlled by the keyword argument probs. This generalises IntervalEstimator to support an arbitrary number of probability levels.

Given data $\boldsymbol{Z}$, by default the estimator approximates quantiles of the distributions of

\[\theta_i \mid \boldsymbol{Z}, \quad i = 1, \dots, d, \]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates quantiles of the full conditional distribution of

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed.

The estimator employs a representation that prevents quantile crossing, namely,

\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]

where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$; $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are neural networks that map from the sample space to $\mathbb{R}^d$; and $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus) applied elementwise to its arguments. If g = nothing, the quantiles are estimated independently through the representation

\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]

When the neural networks are DeepSet objects, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal the number of neurons in the final layer of the inner network plus $\text{dim}(\boldsymbol{\theta}_{-i})$, where we define $\text{dim}(\boldsymbol{\theta}_{-i}) \equiv 0$ when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (the default behaviour). Second, the number of output neurons in the final layer of the outer network must equal $d - \text{dim}(\boldsymbol{\theta}_{-i})$.

The return value is a matrix with $\{d - \text{dim}(\boldsymbol{\theta}_{-i})\} \times T$ rows, where the first $T$ rows correspond to the estimated quantiles for the first parameter, the second $T$ rows correspond to the estimated quantiles for the second parameter, and so on.

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# ---- Quantiles of θᵢ ∣ 𝐙, i = 1, …, d ----

# Neural network
w = 64   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
v = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = QuantileEstimator(v)

# Train the estimator
estimator = train(estimator, sample, simulate, m = m)

# Inference with "observed" data 
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z) 

# ---- Quantiles of θᵢ ∣ 𝐙, θ₋ᵢ ----

# Neural network
w = 64  # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + 1, w, relu), Dense(w, d - 1))
v = DeepSet(ψ, ϕ)

# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
q₁ = QuantileEstimator(v; i = 1)
q₂ = QuantileEstimator(v; i = 2)

# Train the estimators
q₁ = train(q₁, sample, simulate, m = m)
q₂ = train(q₂, sample, simulate, m = m)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)

# Estimate quantiles of μ∣Z,σ with σ = 0.5 for a single data set
q₁(Z[1], θ₋ᵢ)
source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator <: NeuralEstimator
PiecewiseEstimator(estimators, changepoints)

Creates a piecewise estimator (Sainsbury-Dale et al., 2024, Sec. 2.2.2) from a collection of estimators and sample-size changepoints.

Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,

\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ \hat{\boldsymbol{\theta}}_2(\boldsymbol{Z}) & m_1 < m \leq m_2,\\ \quad \vdots \\ \hat{\boldsymbol{\theta}}_l(\boldsymbol{Z}) & m > m_{l-1}. \end{cases}\]

For example, given an estimator $\hat{\boldsymbol{\theta}}_1(\cdot)$ trained for small sample sizes (e.g., $m \leq 30$) and an estimator $\hat{\boldsymbol{\theta}}_2(\cdot)$ trained for moderate-to-large sample sizes (e.g., $m > 30$), one may construct a PiecewiseEstimator that dispatches $\hat{\boldsymbol{\theta}}_1(\cdot)$ if $m \leq 30$ and $\hat{\boldsymbol{\theta}}_2(\cdot)$ otherwise.

See also trainmultiple().

Examples

using NeuralEstimators, Flux

n = 2    # bivariate data
d = 3    # dimension of parameter vector 
w = 128  # width of each hidden layer

# Small-sample estimator
ψ₁ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ₁ = Chain(Dense(w, w, relu), Dense(w, d));
θ̂₁ = PointEstimator(DeepSet(ψ₁, ϕ₁))

# Large-sample estimator
ψ₂ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ₂ = Chain(Dense(w, w, relu), Dense(w, d));
θ̂₂ = PointEstimator(DeepSet(ψ₂, ϕ₂))

# Piecewise estimator with changepoint m=30
θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], 30)

# Apply the (untrained) piecewise estimator to data
Z = [rand(n, m) for m ∈ (10, 50)]
estimate(θ̂, Z)
source
NeuralEstimators.EnsembleType
Ensemble <: NeuralEstimator
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = median)

Defines an ensemble of estimators which, when applied to data Z, returns the median (or another summary defined by aggr) of the individual estimates (see, e.g., Sainsbury-Dale et al., 2025, Sec. S3).

The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified, both the ensemble and its component estimators will be saved.

Note that train() currently acts sequentially on the component estimators.

The ensemble components can be accessed by indexing the ensemble; the number of component estimators can be obtained using length().

See also Parallel, which can be used to mimic ensemble methods with an appropriately chosen connection.

Examples

using NeuralEstimators, Flux

# Data Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sampler(K) = randn32(d, K)
simulator(θ, m) = [μ .+ randn32(n, m) for μ ∈ eachcol(θ)]

# Neural-network architecture of each ensemble component
function architecture()
	ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))
	ϕ = Chain(Dense(64, 64, relu), Dense(64, d))
	network = DeepSet(ψ, ϕ)
	PointEstimator(network)
end

# Initialise ensemble with three component estimators 
ensemble = Ensemble(architecture, 3)
ensemble[1]      # access component estimators by indexing
ensemble[1:2]    # indexing with an iterable collection returns the corresponding ensemble 
length(ensemble) # number of component estimators

# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)

# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)

# Apply to data
ensemble(Z)
source

Training

The function train() is used to train a single neural estimator, while the wrapper function trainmultiple() is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.

NeuralEstimators.trainFunction
train(estimator, sampler::Function, simulator::Function; ...)
train(estimator, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}

Trains a neural estimator.

The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.

In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.

When the data are held fixed and the number of replicates in each element of Z_train is a multiple of the number of replicates in each element of Z_val, the training data will be recycled across epochs. For instance, if each element of Z_train contains 50 replicates and each element of Z_val contains 10 replicates, the first epoch will use the first 10 replicates of Z_train, the second epoch will use the next 10 replicates, and so on. Note that this recycling mechanism requires the data to be subsettable using subsetdata().

The estimator is returned on the CPU so that it can be easily saved after training.

Keyword arguments common to all methods:

  • loss = mae (applicable only to PointEstimator): loss function used to train the neural network. In addition to the standard loss functions provided by Flux (e.g., mae, mse), see Loss functions for further options.
  • epochs = 100: number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.
  • batchsize = 32: the batchsize to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.
  • optimiser = Flux.setup(Adam(), estimator): any Optimisers.jl optimisation rule for updating the neural-network parameters. When the training data and/or parameters are held fixed, one may wish to employ regularisation to prevent overfitting; for example, optimiser = Flux.setup(OptimiserChain(WeightDecay(1e-4), Adam()), estimator), which corresponds to L₂ regularisation with penalty coefficient λ=10⁻⁴.
  • savepath::Union{String, Nothing} = nothing: path to save the trained estimator and other information; if nothing (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
  • stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
  • use_gpu = true: flag indicating whether to use a GPU if one is available.
  • verbose = true: flag indicating whether information, including empirical risk values and timings, should be printed to the console during training.

Keyword arguments common to train(estimator, sampler, simulator) and train(estimator, θ_train, θ_val, simulator):

  • m = nothing: arguments to the simulator (typically the number of replicates in each data set as an Integer or an Integer collection). The simulator is called as simulator(θ, m) if m is given and as simulator(θ) otherwise.
  • epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
  • simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Keyword arguments unique to train(estimator, sampler, simulator):

  • K = 10000: number of parameter vectors in the training set.
  • K_val = K ÷ 5: number of parameter vectors in the validation set.
  • ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
  • epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.

Examples

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ N(0, 1) and σ ~ U(0, 1)
function sampler(K)
	μ = randn(K) # Gaussian prior
	σ = rand(K)  # Uniform prior
	θ = vcat(μ', σ')
	return θ
end
function simulator(θ, m)
	[ϑ[1] .+ ϑ[2] * randn(1, m) for ϑ ∈ eachcol(θ)]
end

# Neural network 
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
w = 128   # width of each hidden layer 
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)

# Initialise the estimator
estimator = PointEstimator(network)

# Number of independent replicates to use during training
m = 15

# Training: simulation on-the-fly
estimator  = train(estimator, sampler, simulator, m = m, epochs = 5)

# Training: simulation on-the-fly with fixed parameters
K = 10000
θ_train = sampler(K)
θ_val   = sampler(K)
estimator = train(estimator, θ_train, θ_val, simulator, m = m, epochs = 5)

# Training: fixed parameters and fixed data
Z_train   = simulator(θ_train, m)
Z_val     = simulator(θ_val, m)
estimator = train(estimator, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
NeuralEstimators.trainmultipleFunction
trainmultiple(estimator, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train() to construct multiple neural estimators for different sample sizes m.

The positional argument m specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size (see Sainsbury-Dale et al., 2024, Sec. 2.3.3). For example, if m = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train and Z_val subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ m. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train::V and Z_val::V and, hence, it does not need to invoke subsetdata(), which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m is inferred from the data.

The keyword arguments inherit from train(). The keyword arguments epochs, batchsize, stopping_epochs, and optimiser can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂].

The function returns a vector of neural estimators, each corresponding to a sample size in m.

See also PiecewiseEstimator.
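
Examples

A minimal sketch based on the Gaussian model used elsewhere on this page (the sample sizes, architecture, and changepoint are illustrative only):

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
sampler(K) = rand32(d, K)
simulator(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Neural network
w = 128   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
estimator = PointEstimator(DeepSet(ψ, ϕ))

# Train estimators for sample sizes m = 10 and m = 30, with pre-training
estimators = trainmultiple(estimator, sampler, simulator, [10, 30], epochs = 5)

# Combine into a piecewise estimator (changepoint chosen for illustration)
θ̂ = PiecewiseEstimator(estimators, 10)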

source

Assessment/calibration

NeuralEstimators.assessFunction
assess(estimator, θ, Z; ...)
assess(estimators::Vector, θ, Z; ...)

Assesses an estimator (or a collection of estimators) based on true parameters θ and corresponding simulated data Z.

The parameters θ should be given as a $d$ × $K$ matrix, where $d$ is the parameter dimension and $K$ is the number of sampled parameter vectors.

The data Z should be a Vector, with each element representing a single simulated data set. If length(Z) is greater than $K$, θ will be recycled via horizontal concatenation: θ = repeat(θ, outer = (1, J)), where J = length(Z) ÷ K is the number of simulated data sets per parameter vector. This allows assessment of the estimator's sampling distribution under fixed parameters.

The return value is of type Assessment.

Keyword arguments

  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided).
  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • use_gpu = true: Bool or collection of Bool objects with length equal to the number of estimators.
  • probs = nothing (applicable only to PointEstimator): probability levels taking values between 0 and 1. By default, no bootstrap uncertainty quantification is done; if probs is provided, it must be a two-element vector specifying the lower and upper probability levels for the non-parametric bootstrap intervals (note that parametric bootstrap is not currently supported with assess()).
  • B::Integer = 400 (applicable only to PointEstimator): number of bootstrap samples.
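
Examples

A minimal sketch based on the Gaussian model used elsewhere on this page (the estimator below is untrained; in practice it would first be trained with train()):

using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2     # dimension of the parameter vector θ
n = 1     # dimension of each independent replicate of Z
m = 30    # number of independent replicates in each data set
sampler(K) = rand32(d, K)
simulator(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]

# Point estimator (untrained here, for illustration)
ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, d))
estimator = PointEstimator(DeepSet(ψ, ϕ))

# Assess the estimator over 1000 parameter vectors
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(estimator, θ, Z)

# Summaries computed from the assessment
risk(assessment)
bias(assessment)
rmse(assessment)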
source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates) for the given data set
  • k: the index of the parameter vector
  • j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)

If the estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.

If the estimator is a QuantileEstimator, there will also be a column prob indicating the probability level of the corresponding quantile estimate.

Use merge() to combine assessments from multiple estimators of the same type or join() to combine assessments from a PointEstimator and an IntervalEstimator.

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's Bayes risk,

\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
  • average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
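
Examples

A brief sketch, assuming assessment is an Assessment object returned by assess() (as in the example above):

risk(assessment)                                   # Bayes risk under absolute-error loss (default)
risk(assessment; loss = (x, y) -> (x - y)^2)       # Bayes risk under squared-error loss
risk(assessment; average_over_parameters = true)   # averaged over all parameters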
source
NeuralEstimators.biasFunction
bias(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's bias,

\[{\textrm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.rmseFunction
rmse(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's root-mean-squared error,

\[{\textrm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\}^2},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.coverageFunction
coverage(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.

Keyword arguments

  • average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the coverage is averaged over all sample sizes $m$; otherwise, it is computed over each sample size separately.
source

Inference with observed data

The following functions facilitate the use of a trained neural estimator with observed data.

NeuralEstimators.estimateFunction
estimate(estimator, Z; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)

Applies estimator to Z in batches of size batchsize, which can prevent the memory issues that may occur with large data sets.

Batching is performed only when Z contains multiple data sets, which is inferred from Z being a vector, or a tuple whose first element is a vector.

source
NeuralEstimators.bootstrapFunction
bootstrap(estimator::PointEstimator, parameters::P, Z; use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, parameters::P, simulator, m::Integer; B = 400, use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, Z; B = 400, blocks = nothing, trim = true, use_gpu = true)

Generates B bootstrap estimates using estimator.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm generates resampled data sets by sampling blocks with replacement. If trim = true, the final block is trimmed as needed to ensure that the resampled data set matches the original size of Z.

The return type is a $d$ × B matrix, where $d$ is the dimension of the parameter vector.
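
Examples

A minimal sketch of non-parametric bootstrapping (the estimator below is untrained and the data are placeholders; in practice the estimator would first be trained with train()):

using NeuralEstimators, Flux

n = 2    # bivariate data
d = 3    # dimension of parameter vector
w = 64   # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
estimator = PointEstimator(DeepSet(ψ, ϕ))

# A single "observed" data set with m = 50 independent replicates
Z = randn32(n, 50)

# Non-parametric bootstrap: resample replicates and re-apply the estimator
θ̃ = bootstrap(estimator, Z)   # d × B matrix of bootstrap estimates (B = 400 by default)
interval(θ̃)                   # d × 2 matrix of bootstrap intervals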

source
NeuralEstimators.intervalFunction
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)

Computes a confidence/credible interval based either on a $d$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws), where $d$ denotes the number of parameters to make inference on, or from an IntervalEstimator and data Z.

When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.

The return type is a $d$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.

source
NeuralEstimators.sampleposteriorFunction
sampleposterior(estimator::PosteriorEstimator, Z, N::Integer = 1000)
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)

Samples from the approximate posterior distribution implied by estimator.

The positional argument N controls the size of the posterior sample.

When sampling based on a RatioEstimator, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid (or theta_grid). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion). The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior (by default, a uniform prior is used).

source
NeuralEstimators.posteriormeanFunction
posteriormean(θ::AbstractMatrix)	
posteriormean(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, N::Integer = 1000; kwargs...)

Computes the posterior mean based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().

See also posteriormedian(), posteriormode().

source
NeuralEstimators.posteriormedianFunction
posteriormedian(θ::AbstractMatrix)	
posteriormedian(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, N::Integer = 1000; kwargs...)

Computes the vector of marginal posterior medians based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().

See also posteriormean(), posteriorquantile().

source
NeuralEstimators.posteriorquantileFunction
posteriorquantile(θ::AbstractMatrix, probs)	
posteriorquantile(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, probs, N::Integer = 1000; kwargs...)

Computes the vector of marginal posterior quantiles with (a collection of) probability levels probs, based either on a $d$ × $N$ matrix θ of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior().

The return value is a $d$ × length(probs) matrix.

See also posteriormedian().
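
Examples

A brief sketch based on a matrix of (placeholder) posterior draws:

using NeuralEstimators

θ = rand(2, 1000)                           # d × N matrix of posterior draws (d = 2, N = 1000)
posteriorquantile(θ, [0.025, 0.5, 0.975])   # 2 × 3 matrix of marginal posterior quantiles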

source
NeuralEstimators.posteriormodeFunction
posteriormode(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)

Computes the (approximate) posterior mode (maximum a posteriori estimate) given data $\boldsymbol{Z}$,

\[\underset{\boldsymbol{\theta}}{\mathrm{arg\,max\;}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function implied by estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior. Note that this estimate can be viewed as an approximate maximum penalised likelihood estimate, with penalty term $\log p(\boldsymbol{\theta})$.

If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.

See also posteriormedian(), posteriormean().

source