Core
This page documents the classes and functions that are central to the workflow of NeuralEstimators
. Its organisation reflects the order in which these classes and functions appear in a standard implementation: from sampling parameters from the prior distribution, to making inference with observed data.
Sampling parameters
Parameters sampled from the prior distribution are stored as a $d \times K$ matrix, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors.
It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). The user-defined type should be a subtype of ParameterConfigurations
, whose only requirement is a field θ
that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.
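For concreteness, a minimal sketch of a prior sampler that returns such a matrix (here, $d = 2$ with independent uniform priors, mirroring the examples further below):
using NeuralEstimators, Flux
d = 2                     # dimension of the parameter vector
sample(K) = rand32(d, K)  # independent U(0, 1) priors; rand32 yields Float32 values
θ = sample(100)           # 2 × 100 matrix: one column per sampled parameter vector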
NeuralEstimators.ParameterConfigurations
— TypeParameterConfigurations
An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.
The user-defined type must have a field θ
that stores the $d$ × $K$ matrix of parameters, where $d$ is the dimension of the parameter vector to make inference on and $K$ is the number of sampled parameter vectors. There are no other requirements.
See subsetparameters()
for the generic function for subsetting these objects.
Examples
struct P <: ParameterConfigurations
θ
# other expensive intermediate objects...
end
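A more concrete sketch, for a hypothetical spatial model whose simulation reuses Cholesky factors (the type name, covariance function, and jitter term below are purely illustrative):
using NeuralEstimators, Flux, LinearAlgebra

struct GPParameters <: ParameterConfigurations
    θ    # 1 × K matrix of range parameters
    L    # Cholesky factors, reused when simulating data
end

function GPParameters(K::Integer, S::Matrix)
    # S: matrix of spatial locations (one location per row), assumed fixed and known
    θ = rand32(1, K)                                              # range parameters sampled from U(0, 1)
    D = [norm(s - t) for s ∈ eachrow(S), t ∈ eachrow(S)]          # pairwise distance matrix
    L = [cholesky(Symmetric(exp.(-D ./ ρ) + 1e-6I)).L for ρ ∈ θ]  # one factor per parameter vector
    GPParameters(θ, L)
end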
Simulating data
The package accommodates any model for which simulation is feasible by allowing users to define their model implicitly through simulated data.
The data are stored as a Vector{A}
, where each element of the vector is associated with one parameter vector, and the subtype A
depends on the multivariate structure of the data. Common formats include:
- Unstructured data: A is typically an $n \times m$ matrix, where:
  - $n$ is the dimension of each replicate (e.g., $n=1$ for univariate data, $n=2$ for bivariate data).
  - $m$ is the number of independent replicates in each data set ($m$ is allowed to vary between data sets).
- Data collected over a regular grid: A is typically an ($N + 2$)-dimensional array, where:
  - The first $N$ dimensions correspond to the dimensions of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids).
  - The penultimate dimension stores the so-called "channels" (e.g., singleton for univariate processes, two for bivariate processes).
  - The final dimension stores the $m$ independent replicates.
- Spatial data collected over irregular locations: A is typically a GNNGraph, where independent replicates (possibly with differing spatial locations) are stored as subgraphs. See the helper function spatialgraph() for constructing these graphs from matrices of spatial locations and data.
While the formats above cover many applications, the package is flexible: the data structure simply needs to align with the chosen neural-network architecture.
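To make the first two formats concrete, a brief sketch of simulated (placeholder) data stored in each format:
using NeuralEstimators, Flux

K = 5    # number of data sets (one per parameter vector)
m = 30   # number of independent replicates in each data set

# Unstructured data: a vector of n × m matrices (here, bivariate replicates, n = 2)
n = 2
Z_unstructured = [randn32(n, m) for k ∈ 1:K]

# Data on a 16 × 16 grid: a vector of (N + 2)-dimensional arrays with N = 2,
# a singleton channel dimension, and the m replicates stored in the final dimension
Z_grid = [randn32(16, 16, 1, m) for k ∈ 1:K]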
Estimators
The package provides several classes of neural estimators, organised within a type hierarchy. At the top of the hierarchy is NeuralEstimator
, an abstract supertype for all neural estimators in the package.
Neural Bayes estimators are implemented as subtypes of the abstract supertype BayesEstimator
. The simple type PointEstimator
is used for constructing neural Bayes estimators under general, user-defined loss functions. Several specialised types cater for the estimation of posterior quantiles based on the quantile loss function: see IntervalEstimator
and its generalisation QuantileEstimator
for estimating posterior quantiles for a fixed set of probability levels.
The type PosteriorEstimator
can be used to approximate the posterior distribution, and RatioEstimator
can be used to approximate the likelihood-to-evidence ratio.
Several types serve as wrappers around the aforementioned estimators, enhancing their functionality. PiecewiseEstimator
applies different estimators based on the sample size of the data (see the discussion on Variable sample sizes). Finally, Ensemble
combines multiple estimators, aggregating their individual estimates to improve accuracy.
NeuralEstimators.NeuralEstimator
— TypeNeuralEstimator
An abstract supertype for all neural estimators.
NeuralEstimators.BayesEstimator
— TypeBayesEstimator <: NeuralEstimator
An abstract supertype for neural Bayes estimators.
NeuralEstimators.PointEstimator
— TypePointEstimator <: BayesEstimator
PointEstimator(network)
A neural point estimator, where the neural network
is a mapping from the sample space to the parameter space.
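The estimator types below each include a worked example; a comparable sketch for PointEstimator, reusing the normal location-scale setup from those examples:
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2    # dimension of the parameter vector θ
n = 1    # dimension of each independent replicate of Z
m = 30   # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Neural network mapping the sample space to the parameter space
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)
# Initialise and train the estimator
estimator = PointEstimator(network)
estimator = train(estimator, sample, simulate, m = m)
# Point estimates from "observed" data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)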
NeuralEstimators.PosteriorEstimator
— TypePosteriorEstimator <: NeuralEstimator
PosteriorEstimator(q::ApproximateDistribution, network)
A neural estimator that approximates the posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$, based on a neural network
and an approximate distribution q
(see the available in-built Approximate distributions).
The neural network
is a mapping from the sample space to a space determined by the chosen approximate distribution q
. Often, the output space is the space $\mathcal{K}$ of the approximate-distribution parameters $\boldsymbol{\kappa}$. However, for certain distributions (notably, NormalisingFlow
), the neural network outputs summary statistics of suitable dimension (e.g., the dimension $d$ of the parameter vector), which are then transformed into parameters of the approximate distribution using conventional multilayer perceptrons (see NormalisingFlow
).
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Distribution used to approximate the posterior
q = NormalisingFlow(d, d)
# Neural network (outputs d summary statistics)
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = PosteriorEstimator(q, network)
# Train the estimator
estimator = train(estimator, sample, simulate, m = m)
# Inference with observed data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
sampleposterior(estimator, Z) # posterior draws
posteriormean(estimator, Z) # point estimate
NeuralEstimators.RatioEstimator
— TypeRatioEstimator <: NeuralEstimator
RatioEstimator(network)
A neural estimator that estimates the likelihood-to-evidence ratio,
\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]
where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.
For numerical stability, training is done on the log-scale using the relation $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c^*(\boldsymbol{Z}, \boldsymbol{\theta}))$, where $c^*(\cdot, \cdot)$ denotes the Bayes classifier as described in the Methodology section. Hence, the neural network
should be a mapping from $\mathcal{Z} \times \Theta$ to $\mathbb{R}$, where $\mathcal{Z}$ and $\Theta$ denote the sample and parameter spaces, respectively.
When the neural network is a DeepSet
, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal $d$ plus the number of output neurons in the final layer of the inner network. Second, the number of output neurons in the final layer of the outer network must be one.
When applying the estimator to data Z
, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true
will yield class probability estimates). The estimated ratio can then be used in various Bayesian (e.g., Hermans et al., 2020) or frequentist (e.g., Walchessen et al., 2024) inferential algorithms.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Neural network
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + d, w, relu), Dense(w, w, relu), Dense(w, 1))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
r̂ = RatioEstimator(network)
# Train the estimator
r̂ = train(r̂, sample, simulate, m = m)
# Inference with "observed" data (grid-based optimisation and sampling)
θ = sample(1)
z = simulate(θ, m)[1]
θ_grid = f32(expandgrid(0:0.01:1, 0:0.01:1))' # fine gridding of the parameter space
r̂(z, θ_grid) # likelihood-to-evidence ratios over grid
mlestimate(r̂, z; θ_grid = θ_grid) # maximum-likelihood estimate
posteriormode(r̂, z; θ_grid = θ_grid) # posterior mode
sampleposterior(r̂, z; θ_grid = θ_grid) # posterior samples
# Inference with "observed" data (gradient-based optimisation using Optim.jl)
using Optim
θ₀ = [0.5, 0.5] # initial estimate
mlestimate(r̂, z; θ₀ = θ₀) # maximum-likelihood estimate
posteriormode(r̂, z; θ₀ = θ₀) # posterior mode
NeuralEstimators.IntervalEstimator
— TypeIntervalEstimator <: BayesEstimator
IntervalEstimator(u, v = u, c::Union{Function, Compress} = identity; probs = [0.025, 0.975], g = exp)
IntervalEstimator(u, c::Union{Function, Compress}; probs = [0.025, 0.975], g = exp)
A neural estimator that jointly estimates marginal posterior credible intervals based on the probability levels probs
(by default, 95% central credible intervals).
The estimator employs a representation that prevents quantile crossing. Specifically, given data $\boldsymbol{Z}$, it constructs intervals for each parameter $\theta_i$, $i = 1, \dots, d,$ of the form,
\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z}) + g(v_i(\boldsymbol{Z})))],\]
where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_d(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_d(\cdot))'$ are neural networks that map from the sample space to $\mathbb{R}^d$; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.
The functions $c_i(⋅)$ may be collectively defined by a $d$-dimensional Compress
object, which can constrain the interval estimator's output to the prior support. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted. If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.
The return value when applied to data using estimate
() is a matrix with $2d$ rows, where the first and second $d$ rows correspond to the lower and upper bounds, respectively. The function interval()
can be used to format this output in a readable $d$ × 2 matrix.
See also QuantileEstimator
.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 100 # number of independent replicates
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# Neural network
w = 128 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
u = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = IntervalEstimator(u)
# Train the estimator
estimator = train(estimator, sample, simulate, m = m)
# Inference with "observed" data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)
interval(estimator, Z)
NeuralEstimators.QuantileEstimator
— TypeQuantileEstimator <: BayesEstimator
QuantileEstimator(v; probs = [0.025, 0.5, 0.975], g = Flux.softplus, i = nothing)
A neural estimator that jointly estimates a fixed set of marginal posterior quantiles, with probability levels $\{\tau_1, \dots, \tau_T\}$ controlled by the keyword argument probs
. This generalises IntervalEstimator
to support an arbitrary number of probability levels.
Given data $\boldsymbol{Z}$, by default the estimator approximates quantiles of the distributions of
\[\theta_i \mid \boldsymbol{Z}, \quad i = 1, \dots, d, \]
for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$. Alternatively, if initialised with i
set to a positive integer, the estimator approximates quantiles of the full conditional distribution of
\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]
where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed.
The estimator employs a representation that prevents quantile crossing, namely,
\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]
where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)'$; $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are neural networks that map from the sample space to $\mathbb{R}^d$; and $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus) applied elementwise to its arguments. If g = nothing
, the quantiles are estimated independently through the representation
\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]
When the neural networks are DeepSet
objects, two requirements must be met. First, the number of input neurons in the first layer of the outer network must equal the number of neurons in the final layer of the inner network plus $\text{dim}(\boldsymbol{\theta}_{-i})$, where we define $\text{dim}(\boldsymbol{\theta}_{-i}) \equiv 0$ when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (the default behaviour). Second, the number of output neurons in the final layer of the outer network must equal $d - \text{dim}(\boldsymbol{\theta}_{-i})$.
The return value is a matrix with $\{d - \text{dim}(\boldsymbol{\theta}_{-i})\} \times T$ rows, where the first $T$ rows correspond to the estimated quantiles for the first parameter, the second $T$ rows correspond to the estimated quantiles for the second parameter, and so on.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sample(K) = rand32(d, K)
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ in eachcol(θ)]
# ---- Quantiles of θᵢ ∣ 𝐙, i = 1, …, d ----
# Neural network
w = 64 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
v = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = QuantileEstimator(v)
# Train the estimator
estimator = train(estimator, sample, simulate, m = m)
# Inference with "observed" data
θ = [0.8f0; 0.1f0]
Z = simulate(θ, m)
estimate(estimator, Z)
# ---- Quantiles of θᵢ ∣ 𝐙, θ₋ᵢ ----
# Neural network
w = 64 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w + 1, w, relu), Dense(w, d - 1))
v = DeepSet(ψ, ϕ)
# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
q₁ = QuantileEstimator(v; i = 1)
q₂ = QuantileEstimator(v; i = 2)
# Train the estimators
q₁ = train(q₁, sample, simulate, m = m)
q₂ = train(q₂, sample, simulate, m = m)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 for a single data set
q₁(Z[1], θ₋ᵢ)
NeuralEstimators.PiecewiseEstimator
— TypePiecewiseEstimator <: NeuralEstimator
PiecewiseEstimator(estimators, changepoints)
Creates a piecewise estimator (Sainsbury-Dale et al., 2024, Sec. 2.2.2) from a collection of estimators
and sample-size changepoints
.
Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,
\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ \hat{\boldsymbol{\theta}}_2(\boldsymbol{Z}) & m_1 < m \leq m_2,\\ \quad \vdots \\ \hat{\boldsymbol{\theta}}_l(\boldsymbol{Z}) & m > m_{l-1}. \end{cases}\]
For example, given an estimator $\hat{\boldsymbol{\theta}}_1(\cdot)$ trained for small sample sizes (e.g., $m \leq 30$) and an estimator $\hat{\boldsymbol{\theta}}_2(\cdot)$ trained for moderate-to-large sample sizes (e.g., $m > 30$), one may construct a PiecewiseEstimator
that dispatches $\hat{\boldsymbol{\theta}}_1(\cdot)$ if $m \leq 30$ and $\hat{\boldsymbol{\theta}}_2(\cdot)$ otherwise.
See also trainmultiple()
.
Examples
using NeuralEstimators, Flux
n = 2 # bivariate data
d = 3 # dimension of parameter vector
w = 128 # width of each hidden layer
# Small-sample estimator
ψ₁ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ₁ = Chain(Dense(w, w, relu), Dense(w, d));
θ̂₁ = PointEstimator(DeepSet(ψ₁, ϕ₁))
# Large-sample estimator
ψ₂ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ₂ = Chain(Dense(w, w, relu), Dense(w, d));
θ̂₂ = PointEstimator(DeepSet(ψ₂, ϕ₂))
# Piecewise estimator with changepoint m=30
θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], 30)
# Apply the (untrained) piecewise estimator to data
Z = [rand(n, m) for m ∈ (10, 50)]
estimate(θ̂, Z)
NeuralEstimators.Ensemble
— TypeEnsemble <: NeuralEstimator
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = median)
Defines an ensemble of estimators
which, when applied to data Z
, returns the median (or another summary defined by aggr
) of the individual estimates (see, e.g., Sainsbury-Dale et al., 2025, Sec. S3).
The ensemble can be initialised with a collection of trained estimators
and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators
(or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train()
, and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath
is specified both the ensemble and component estimators will be saved.
Note that train()
currently acts sequentially on the component estimators.
The ensemble components can be accessed by indexing the ensemble; the number of component estimators can be obtained using length()
.
See also Parallel
, which can be used to mimic ensemble methods with an appropriately chosen connection
.
Examples
using NeuralEstimators, Flux
# Data Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
m = 30 # number of independent replicates in each data set
sampler(K) = randn32(d, K)
simulator(θ, m) = [μ .+ randn32(n, m) for μ ∈ eachcol(θ)]
# Neural-network architecture of each ensemble component
function architecture()
ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, d))
network = DeepSet(ψ, ϕ)
PointEstimator(network)
end
# Initialise ensemble with three component estimators
ensemble = Ensemble(architecture, 3)
ensemble[1] # access component estimators by indexing
ensemble[1:2] # indexing with an iterable collection returns the corresponding ensemble
length(ensemble) # number of component estimators
# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)
# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)
# Apply to data
ensemble(Z)
Training
The function train()
is used to train a single neural estimator, while the wrapper function trainmultiple()
is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.
NeuralEstimators.train
— Functiontrain(estimator, sampler::Function, simulator::Function; ...)
train(estimator, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(estimator, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}
Trains a neural estimator
.
The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler
can be provided to continuously sample new parameter vectors from the prior, and a simulator
can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train
and θ_val
) and/or data (Z_train
and Z_val
), they will be held fixed during training.
In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.
When the data are held fixed and the number of replicates in each element of Z_train
is a multiple of the number of replicates in each element of Z_val
, the training data will be recycled across epochs. For instance, if each element of Z_train
contains 50 replicates and each element of Z_val
contains 10 replicates, the first epoch will use the first 10 replicates of Z_train
, the second epoch will use the next 10 replicates, and so on. Note that this recycling mechanism requires the data to be subsettable using subsetdata()
.
The estimator is returned on the CPU so that it can be easily saved post training.
Keyword arguments common to all methods:
- loss = mae (applicable only to PointEstimator): loss function used to train the neural network. In addition to the standard loss functions provided by Flux (e.g., mae, mse), see Loss functions for further options.
- epochs = 100: number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.
- batchsize = 32: the batchsize to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.
- optimiser = Flux.setup(Adam(), estimator): any Optimisers.jl optimisation rule for updating the neural-network parameters. When the training data and/or parameters are held fixed, one may wish to employ regularisation to prevent overfitting; for example, optimiser = Flux.setup(OptimiserChain(WeightDecay(1e-4), Adam()), estimator), which corresponds to L₂ regularisation with penalty coefficient λ=10⁻⁴.
- savepath::Union{String, Nothing} = nothing: path to save the trained estimator and other information; if nothing (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
- stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
- use_gpu = true: flag indicating whether to use a GPU if one is available.
- verbose = true: flag indicating whether information, including empirical risk values and timings, should be printed to the console during training.
Keyword arguments common to train(estimator, sampler, simulator)
and train(estimator, θ_train, θ_val, simulator)
:
- m = nothing: arguments to the simulator (typically the number of replicates in each data set as an Integer or an Integer collection). The simulator is called as simulator(θ, m) if m is given and as simulator(θ) otherwise.
- epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
- simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.
Keyword arguments unique to train(estimator, sampler, simulator)
:
- K = 10000: number of parameter vectors in the training set.
- K_val = K ÷ 5: number of parameter vectors in the validation set.
- ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
- epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.
Examples
using NeuralEstimators, Flux
# Data Z|μ,σ ~ N(μ, σ²) with priors μ ~ N(0, 1) and σ ~ U(0, 1)
function sampler(K)
μ = randn(K) # Gaussian prior
σ = rand(K) # Uniform prior
θ = vcat(μ', σ')
return θ
end
function simulator(θ, m)
[ϑ[1] .+ ϑ[2] * randn(1, m) for ϑ ∈ eachcol(θ)]
end
# Neural network
d = 2 # dimension of the parameter vector θ
n = 1 # dimension of each independent replicate of Z
w = 128 # width of each hidden layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
network = DeepSet(ψ, ϕ)
# Initialise the estimator
estimator = PointEstimator(network)
# Number of independent replicates to use during training
m = 15
# Training: simulation on-the-fly
estimator = train(estimator, sampler, simulator, m = m, epochs = 5)
# Training: simulation on-the-fly with fixed parameters
K = 10000
θ_train = sampler(K)
θ_val = sampler(K)
estimator = train(estimator, θ_train, θ_val, simulator, m = m, epochs = 5)
# Training: fixed parameters and fixed data
Z_train = simulator(θ_train, m)
Z_val = simulator(θ_val, m)
estimator = train(estimator, θ_train, θ_val, Z_train, Z_val, epochs = 5)
NeuralEstimators.trainmultiple
— Functiontrainmultiple(estimator, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
trainmultiple(estimator, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}
A wrapper around train()
to construct multiple neural estimators for different sample sizes m
.
The positional argument m
specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size (see Sainsbury-Dale et al., 2024, Sec. 2.3.3). For example, if m = [m₁, m₂]
, the estimator for sample size m₂
is pre-trained with the estimator for sample size m₁
.
The method for Z_train
and Z_val
subsets the data using subsetdata(Z, 1:mᵢ)
for each mᵢ ∈ m
. The method for Z_train::V
and Z_val::V
trains an estimator for each element of Z_train::V
and Z_val::V
and, hence, it does not need to invoke subsetdata()
, which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m
is inferred from the data.
The keyword arguments inherit from train()
. The keyword arguments epochs
, batchsize
, stopping_epochs
, and optimiser
can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂]
.
The function returns a vector of neural estimators, each corresponding to a sample size in m
.
See also PiecewiseEstimator.
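A brief sketch of the on-the-fly variant, with a simplified uniform prior and the same architecture pattern used in the train() example below (the changepoint chosen at the end is illustrative):
using NeuralEstimators, Flux

# Data Z|μ,σ ~ N(μ, σ²) with (simplified) priors μ ~ U(0, 1) and σ ~ U(0, 1)
d = 2    # dimension of the parameter vector θ
n = 1    # dimension of each independent replicate of Z
sampler(K) = rand32(d, K)
simulator(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(n, m) for ϑ ∈ eachcol(θ)]

# Neural network and estimator
w = 128
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, d))
estimator = PointEstimator(DeepSet(ψ, ϕ))

# Train estimators for sample sizes m = 10 and m = 30, with pre-training
estimators = trainmultiple(estimator, sampler, simulator, [10, 30]; epochs = 5)

# Combine into a single estimator that is piecewise over the sample size
θ̂ = PiecewiseEstimator(estimators, 10)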
Assessment/calibration
NeuralEstimators.assess
— Functionassess(estimator, θ, Z; ...)
assess(estimators::Vector, θ, Z; ...)
Assesses an estimator
(or a collection of estimators
) based on true parameters θ
and corresponding simulated data Z
.
The parameters θ
should be given as a $d$ × $K$ matrix, where $d$ is the parameter dimension and $K$ is the number of sampled parameter vectors.
The data Z
should be a Vector
, with each element representing a single simulated data set. If length(Z)
is greater than $K$, θ
will be recycled via horizontal concatenation: θ = repeat(θ, outer = (1, J))
, where J = length(Z) ÷ K
is the number of simulated data sets per parameter vector. This allows assessment of the estimator's sampling distribution under fixed parameters.
The return value is of type Assessment
.
Keyword arguments
- parameter_names::Vector{String}: names of the parameters (sensible defaults provided).
- estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
- use_gpu = true: Bool or collection of Bool objects with length equal to the number of estimators.
- probs = nothing (applicable only to PointEstimator): probability levels taking values between 0 and 1. By default, no bootstrap uncertainty quantification is done; if probs is provided, it must be a two-element vector specifying the lower and upper probability levels for the non-parametric bootstrap intervals (note that parametric bootstrap is not currently supported with assess()).
- B::Integer = 400 (applicable only to PointEstimator): number of bootstrap samples.
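A brief usage sketch, assuming a trained point estimator and the sampler and simulator defined in the examples above:
# Assume `estimator`, `sample`, and `simulate` are defined as in the examples above
θ_test = sample(1000)
Z_test = simulate(θ_test, 30)
assessment = assess(estimator, θ_test, Z_test)

risk(assessment)   # Monte Carlo approximation of the Bayes risk
bias(assessment)   # bias of the estimator
rmse(assessment)   # root-mean-squared error of the estimator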
NeuralEstimators.Assessment
— TypeAssessment(df::DataFrame, runtime::DataFrame)
A type for storing the output of assess()
. The field runtime
contains the total time taken for each estimator. The field df
is a long-form DataFrame
with columns:
- estimator: the name of the estimator
- parameter: the name of the parameter
- truth: the true value of the parameter
- estimate: the estimated value of the parameter
- m: the sample size (number of iid replicates) for the given data set
- k: the index of the parameter vector
- j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)
If the estimator is an IntervalEstimator
, the column estimate
will be replaced by the columns lower
and upper
, containing the lower and upper bounds of the interval, respectively.
If the estimator is a QuantileEstimator
, there will also be a column prob
indicating the probability level of the corresponding quantile estimate.
Use merge()
to combine assessments from multiple estimators of the same type or join()
to combine assessments from a PointEstimator
and an IntervalEstimator
.
NeuralEstimators.risk
— Functionrisk(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's Bayes risk,
\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
Keyword arguments
- loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
- average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
- average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
NeuralEstimators.bias
— Functionbias(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's bias,
\[{\textrm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
This function inherits the keyword arguments of risk
(excluding the argument loss
).
NeuralEstimators.rmse
— Functionrmse(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's root-mean-squared error,
\[{\textrm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K \{\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)}\}^2},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
This function inherits the keyword arguments of risk
(excluding the argument loss
).
NeuralEstimators.coverage
— Functioncoverage(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.
Keyword arguments
- average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
- average_over_sample_sizes::Bool = true: if true (default), the coverage is averaged over all sample sizes $m$; otherwise, it is computed over each sample size separately.
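For instance, given an assessment produced by applying assess() to an IntervalEstimator, a minimal usage sketch is:
# Assume `assessment` was produced by assess() applied to an IntervalEstimator
coverage(assessment)                                  # per-parameter empirical coverage
coverage(assessment; average_over_parameters = true)  # coverage averaged over parameters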
Inference with observed data
The following functions facilitate the use of a trained neural estimator with observed data.
NeuralEstimators.estimate
— Functionestimate(estimator, Z; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)
Applies estimator
to batches of Z
of size batchsize
, which can prevent memory issues when working with large data sets.
Batching will only be done if there are multiple data sets in Z
, which will be inferred by Z
being a vector, or a tuple whose first element is a vector.
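A minimal usage sketch, assuming a trained estimator and a vector Z of data sets:
# Assume `estimator` is a trained estimator and `Z` is a vector of data sets
estimate(estimator, Z)                  # estimates for all data sets in Z
estimate(estimator, Z; batchsize = 16)  # smaller batches to reduce peak memory usage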
NeuralEstimators.bootstrap
— Functionbootstrap(estimator::PointEstimator, parameters::P, Z; use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, parameters::P, simulator, m::Integer; B = 400, use_gpu = true) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(estimator::PointEstimator, Z; B = 400, blocks = nothing, trim = true, use_gpu = true)
Generates B
bootstrap estimates using estimator
.
Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters
, and corresponding simulated data, Z
, whose length implicitly defines B
. Alternatively, one may provide a simulator
and the desired sample size, in which case the data will be simulated using simulator(parameters, m)
.
Non-parametric bootstrapping is facilitated by passing a single data set, Z
. The argument blocks
caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks
should be [1, 1, 2, 2, 2]
. The resampling algorithm generates resampled data sets by sampling blocks with replacement. If trim = true
, the final block is trimmed as needed to ensure that the resampled data set matches the original size of Z
.
The return value is a $d$ × B
matrix, where $d$ is the dimension of the parameter vector.
NeuralEstimators.interval
— Functioninterval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)
Computes a confidence/credible interval based either on a $d$ × $B$ matrix θ
of parameters (typically containing bootstrap estimates or posterior draws), where $d$ denotes the number of parameters to make inference on, or from an IntervalEstimator
and data Z
.
When given θ
, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs
.
The return value is a $d$ × 2 matrix, whose first and second columns contain the lower and upper bounds of the interval, respectively. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names
.
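A sketch combining non-parametric bootstrapping with interval(), assuming a trained PointEstimator and a single data set Z containing independent replicates:
# Assume `estimator` is a trained PointEstimator and `Z` is a single data set
θ_boot = bootstrap(estimator, Z; B = 400)   # d × 400 matrix of bootstrap estimates
interval(θ_boot; probs = [0.025, 0.975])    # d × 2 matrix of 95% interval bounds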
NeuralEstimators.sampleposterior
— Functionsampleposterior(estimator::PosteriorEstimator, Z, N::Integer = 1000)
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)
Samples from the approximate posterior distribution implied by estimator
.
The positional argument N
controls the size of the posterior sample.
When sampling based on a RatioEstimator
, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid
(or theta_grid
). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion). The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior
(by default, a uniform prior is used).
NeuralEstimators.posteriormean
— Functionposteriormean(θ::AbstractMatrix)
posteriormean(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, N::Integer = 1000; kwargs...)
Computes the posterior mean based either on a $d$ × $N$ matrix θ
of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior()
.
See also posteriormedian()
, posteriormode()
.
NeuralEstimators.posteriormedian
— Functionposteriormedian(θ::AbstractMatrix)
posteriormedian(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, N::Integer = 1000; kwargs...)
Computes the vector of marginal posterior medians based either on a $d$ × $N$ matrix θ
of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior()
.
See also posteriormean()
, posteriorquantile()
.
NeuralEstimators.posteriorquantile
— Functionposteriorquantile(θ::AbstractMatrix, probs)
posteriorquantile(estimator::Union{PosteriorEstimator, RatioEstimator}, Z, probs, N::Integer = 1000; kwargs...)
Computes the vector of marginal posterior quantiles with (a collection of) probability levels probs
, based either on a $d$ × $N$ matrix θ
of posterior draws, where $d$ denotes the number of parameters to make inference on, or directly from an estimator that allows for posterior sampling via sampleposterior()
.
The return value is a $d$ × length(probs)
matrix.
See also posteriormedian()
.
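These summaries can also be computed directly from a matrix of posterior draws; a brief sketch assuming a trained PosteriorEstimator and observed data Z:
# Assume `estimator` is a trained PosteriorEstimator and `Z` is an observed data set
θ_draws = sampleposterior(estimator, Z, 1000)   # d × 1000 matrix of posterior draws
posteriormean(θ_draws)                          # marginal posterior means
posteriormedian(θ_draws)                        # marginal posterior medians
posteriorquantile(θ_draws, [0.025, 0.975])      # marginal 95% credible bounds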
NeuralEstimators.posteriormode
— Functionposteriormode(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)
Computes the (approximate) posterior mode (maximum a posteriori estimate) given data $\boldsymbol{Z}$,
\[\underset{\boldsymbol{\theta}}{\mathrm{arg\,max\;}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]
where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function implied by estimator
, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior
. Note that this estimate can be viewed as an approximate maximum penalised likelihood estimate, with penalty term $\log p(\boldsymbol{\theta})$.
If a vector θ₀
of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl
to be loaded). Otherwise, if a matrix of parameters θ_grid
is given, the approximate posterior density is maximised by grid search.
See also posteriormedian()
, posteriormean()
.