Core
This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation: from sampling parameters from the prior distribution, to using a neural Bayes estimator to make inferences from observed data sets.
Sampling parameters
Parameters sampled from the prior distribution are stored as a $p \times K$ matrix, where $p$ is the number of parameters in the statistical model and $K$ is the number of parameter vectors sampled from the prior distribution.
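For example, a prior sampler for a hypothetical model with $p = 2$ parameters, returning the sampled parameters as a $2 \times K$ matrix, might be defined as follows (a minimal sketch; the prior distributions and the name sampler are arbitrary choices):
using Distributions
function sampler(K)
	θ₁ = rand(Uniform(0, 1), K)  # first parameter
	θ₂ = rand(Gamma(2, 1), K)    # second parameter
	θ  = vcat(θ₁', θ₂')          # 2 × K matrix
	return Float32.(θ)
end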
It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.
NeuralEstimators.ParameterConfigurations — Type
ParameterConfigurations
An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.
The user-defined type must have a field θ
that stores the $p$ × $K$ matrix of parameters, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.
See subsetparameters
for the generic function for subsetting these objects.
Examples
struct P <: ParameterConfigurations
θ
# other expensive intermediate objects...
end
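As a more concrete sketch, suppose that data simulation requires the Cholesky factor of a covariance matrix constructed from a range parameter and a matrix of spatial locations S; the type and field names below, other than the required field θ, are hypothetical:
using NeuralEstimators, Distributions, LinearAlgebra

struct GPParameters <: ParameterConfigurations
	θ                 # 1 × K matrix of range parameters
	cholesky_factors  # Cholesky factors, one for each parameter vector
end

function GPParameters(K::Integer, S)
	# Sample the range parameter from its prior
	θ = rand(Uniform(0.1, 1.0), 1, K)

	# Compute the (expensive) Cholesky factors of exponential covariance matrices
	D = [norm(s - t) for s ∈ eachcol(S), t ∈ eachcol(S)]
	cholesky_factors = [cholesky(Symmetric(exp.(-D ./ ρ))).L for ρ ∈ θ]

	GPParameters(Float32.(θ), cholesky_factors)
end
The stored Cholesky factors can then be reused by the data simulator for each parameter vector, rather than being recomputed every time data are simulated.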
Simulating data
NeuralEstimators
facilitates neural estimation for arbitrary statistical models by having the user implicitly define their model via simulated data, either as fixed instances or via a function that simulates data from the statistical model.
The data are always stored as a Vector{A}
, where each element of the vector corresponds to a data set of $m$ independent replicates associated with one parameter vector (note that $m$ is arbitrary), and where the type A
depends on the multivariate structure of the data:
- For univariate and unstructured multivariate data, A is a $d \times m$ matrix, where $d$ is the dimension of each replicate (e.g., $d=1$ for univariate data).
- For data collected over a regular grid, A is an ($N + 2$)-dimensional array, where $N$ is the dimension of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids, etc.). The first $N$ dimensions of the array correspond to the dimensions of the grid; the penultimate dimension stores the so-called "channels" (this dimension is singleton for univariate processes, two for bivariate processes, and so on); and the final dimension stores the independent replicates. For example, to store 50 independent replicates of a bivariate spatial process measured over a 10 × 15 grid, one would construct an array of dimension 10 × 15 × 2 × 50.
- For spatial data collected over irregular spatial locations, A is a GNNGraph, with independent replicates (possibly with differing spatial locations) stored as subgraphs using the function batch.
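For example (a minimal sketch with toy data), the first two storage formats can be constructed as follows for K = 5 parameter vectors and m = 50 independent replicates per data set:
K = 5   # number of parameter vectors
m = 50  # number of independent replicates in each data set

# Unstructured bivariate data: each element of Z is a 2 × m matrix
Z = [rand(2, m) for _ in 1:K]

# Bivariate spatial process on a 10 × 15 grid: each element is a 10 × 15 × 2 × m array
Z = [rand(10, 15, 2, m) for _ in 1:K]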
Estimators
Several classes of neural estimators are available in the package.
The simplest class is PointEstimator
, used for constructing arbitrary mappings from the sample space to the parameter space. When constructing a generic point estimator, the user defines the loss function and therefore the Bayes estimator that will be targeted.
Several classes cater for the estimation of marginal posterior quantiles, based on the quantile loss function (see quantileloss()
); in particular, see IntervalEstimator
and QuantileEstimatorDiscrete
for estimating marginal posterior quantiles for a fixed set of probability levels, and QuantileEstimatorContinuous
for estimating marginal posterior quantiles with the probability level as an input to the neural network.
In addition to point estimation, the package also provides the class RatioEstimator
for approximating the so-called likelihood-to-evidence ratio. The binary classification problem at the heart of this approach is solved using the binary cross-entropy loss.
Users are free to choose the neural-network architecture of these estimators as they see fit (subject to some class-specific requirements), but the package also provides the convenience constructor initialise_estimator()
.
NeuralEstimators.NeuralEstimator — Type
NeuralEstimator
An abstract supertype for neural estimators.
NeuralEstimators.PointEstimator — Type
PointEstimator(deepset::DeepSet)
A neural point estimator, a mapping from the sample space to the parameter space.
The estimator leverages the DeepSet
architecture. The only requirement is that the number of output neurons in the final layer of the inference network (i.e., the outer network) is equal to the number of parameters in the statistical model.
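For example (a sketch; the estimator below is untrained), a point estimator for bivariate replicates and a model with three parameters can be constructed and applied as follows:
using NeuralEstimators, Flux

d = 2   # dimension of each independent replicate
p = 3   # number of parameters in the statistical model
w = 32  # width of each hidden layer

ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))  # final layer has p output neurons, as required
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Apply the (untrained) point estimator to a single data set of m = 100 replicates
Z = rand32(d, 100)
θ̂(Z)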
NeuralEstimators.IntervalEstimator — Type
IntervalEstimator(u::DeepSet, v::DeepSet = u; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)
IntervalEstimator(u::DeepSet, v::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)
A neural interval estimator which, given data $Z$, jointly estimates marginal posterior credible intervals based on the probability levels probs
.
The estimator employs a representation that prevents quantile crossing, namely, it constructs marginal posterior credible intervals for each parameter $\theta_i$, $i = 1, \dots, p,$ of the form,
\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z})) + g(v_i(\boldsymbol{Z}))],\]
where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_p(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_p(\cdot))'$ are neural networks that transform data into $p$-dimensional vectors; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.
The functions $c_i(⋅)$ may be defined by a $p$-dimensional object of type Compress
. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted.
If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.
The return value when applied to data is a matrix with $2p$ rows, where the first and second $p$ rows correspond to the lower and upper bounds, respectively.
See also QuantileEstimatorDiscrete
and QuantileEstimatorContinuous
.
Examples
using NeuralEstimators, Flux
# Generate some toy data
n = 2 # bivariate data
m = 100 # number of independent replicates
Z = rand(n, m)
# prior
p = 3 # number of parameters in the statistical model
min_supp = [25, 0.5, -pi/2]
max_supp = [500, 2.5, 0]
g = Compress(min_supp, max_supp)
# Create an architecture
w = 8 # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
u = DeepSet(ψ, ϕ)
# Initialise the interval estimator
estimator = IntervalEstimator(u, g)
# Apply the (untrained) interval estimator
estimator(Z)
interval(estimator, Z)
NeuralEstimators.QuantileEstimatorDiscrete — Type
QuantileEstimatorDiscrete(v::DeepSet; probs = [0.05, 0.25, 0.5, 0.75, 0.95], g = Flux.softplus, i = nothing)
(estimator::QuantileEstimatorDiscrete)(Z)
(estimator::QuantileEstimatorDiscrete)(Z, θ₋ᵢ)
A neural estimator that jointly estimates a fixed set of marginal posterior quantiles with probability levels $\{\tau_1, \dots, \tau_T\}$, controlled by the keyword argument probs
.
By default, the estimator approximates the marginal quantiles for all parameters in the model, that is, the quantiles of
\[\theta_i \mid \boldsymbol{Z}\]
for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i
set to a positive integer, the estimator approximates the quantiles of the full conditional distribution
\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]
where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.
The estimator leverages the DeepSet
architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $\text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$. The estimator employs a representation that prevents quantile crossing, namely,
\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ \boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}) + \sum_{j=2}^t g(\boldsymbol{v}^{(\tau_j)}(\boldsymbol{Z})), \quad t = 2, \dots, T, \end{aligned}\]
where $\boldsymbol{q}^{(\tau)}(\boldsymbol{Z})$ denotes the vector of $\tau$-quantiles for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$, and $\boldsymbol{v}^{(\tau_t)}(\cdot)$, $t = 1, \dots, T$, are unconstrained neural networks that transform data into $p$-dimensional vectors, and $g(\cdot)$ is a non-negative function (e.g., exponential or softplus) applied elementwise to its arguments. If g=nothing
, the quantiles are estimated independently through the representation,
\[\boldsymbol{q}^{(\tau_t)}(\boldsymbol{Z}) = \boldsymbol{v}^{(\tau_t)}(\boldsymbol{Z}), \quad t = 1, \dots, T.\]
The return value is a matrix with $(p - \text{dim}(\boldsymbol{\theta}_{-i})) \times T$ rows, where the first set of $T$ rows corresponds to the estimated quantiles for the first parameter, the second set of $T$ rows corresponds to the estimated quantiles for the second parameter, and so on.
See also IntervalEstimator
and QuantileEstimatorContinuous
.
Examples
using NeuralEstimators, Flux, Distributions
using AlgebraOfGraphics, CairoMakie
# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1 # dimension of each independent replicate
p = 1 # number of unknown parameters in the statistical model
m = 30 # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulate(θ, m) = [μ .+ randn32(1, m) for μ ∈ eachcol(θ)]
# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
v = DeepSet(ψ, ϕ)
# Initialise the estimator
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q̂ = QuantileEstimatorDiscrete(v; probs = τ)
# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)
# Assess the estimator
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)
# Estimate posterior quantiles
q̂(Z)
# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------
# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1 # dimension of each independent replicate
p = 2 # number of unknown parameters in the statistical model
m = 30 # number of independent replicates in each data set
function prior(K)
μ = randn(1, K)
σ = rand(InverseGamma(3, 1), 1, K)
θ = Float32.(vcat(μ, σ))
end
simulate(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]
# Architecture
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64 + 1, 64, relu), Dense(64, 1))
v = DeepSet(ψ, ϕ)
# Initialise estimators respectively targeting quantiles of μ∣Z,σ and σ∣Z,μ
τ = [0.05, 0.25, 0.5, 0.75, 0.95]
q₁ = QuantileEstimatorDiscrete(v; probs = τ, i = 1)
q₂ = QuantileEstimatorDiscrete(v; probs = τ, i = 2)
# Train the estimators
q₁ = train(q₁, prior, simulate, m = m)
q₂ = train(q₂, prior, simulate, m = m)
# Assess the estimators
θ = prior(1000)
Z = simulate(θ, m)
assessment = assess([q₁, q₂], θ, Z, parameter_names = ["μ", "σ"])
plot(assessment)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
θ₋ᵢ = 0.5f0
q₁(Z, θ₋ᵢ)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 for only a single data set
q₁(Z[1], θ₋ᵢ)
NeuralEstimators.QuantileEstimatorContinuous — Type
QuantileEstimatorContinuous(deepset::DeepSet; i = nothing, num_training_probs::Integer = 1)
(estimator::QuantileEstimatorContinuous)(Z, τ)
(estimator::QuantileEstimatorContinuous)(Z, θ₋ᵢ, τ)
A neural estimator targeting posterior quantiles.
Given as input data $\boldsymbol{Z}$ and the desired probability level $\tau ∈ (0, 1)$, by default the estimator approximates the $\tau$-quantile of
\[\theta_i \mid \boldsymbol{Z}\]
for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i
set to a positive integer, the estimator approximates the $\tau$-quantile of the full conditional distribution
\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]
where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.
The estimator leverages the DeepSet
architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $1 + \text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$.
Although not a requirement, one may employ a (partially) monotonic neural network to prevent quantile crossing (i.e., to ensure that the $\tau_1$-quantile does not exceed the $\tau_2$-quantile for any $\tau_2 > \tau_1$). There are several ways to construct such a neural network: one simple yet effective approach is to ensure that all weights associated with $\tau$ are strictly positive (see, e.g., Cannon, 2018), and this can be done using the DensePositive
layer as illustrated in the examples below.
The return value is a matrix with $p - \text{dim}(\boldsymbol{\theta}_{-i})$ rows, corresponding to the estimated quantile for each parameter not in $\boldsymbol{\theta}_{-i}$.
See also QuantileEstimatorDiscrete
.
Examples
using NeuralEstimators, Flux, Distributions, InvertedIndices, Statistics
using AlgebraOfGraphics, CairoMakie
# Model: Z|θ ~ N(θ, 1) with θ ~ N(0, 1)
d = 1 # dimension of each independent replicate
p = 1 # number of unknown parameters in the statistical model
m = 30 # number of independent replicates in each data set
prior(K) = randn32(p, K)
simulateZ(θ, m) = [ϑ .+ randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(K) = [rand32(10) for k in 1:K]
simulate(θ, m) = simulateZ(θ, m), simulateτ(size(θ, 2))
# Architecture: partially monotonic network to preclude quantile crossing
w = 64 # width of each hidden layer
ψ = Chain(
Dense(d, w, relu),
Dense(w, w, relu),
Dense(w, w, relu)
)
ϕ = Chain(
DensePositive(Dense(w + 1, w, relu); last_only = true),
DensePositive(Dense(w, w, relu)),
DensePositive(Dense(w, p))
)
deepset = DeepSet(ψ, ϕ)
# Initialise the estimator
q̂ = QuantileEstimatorContinuous(deepset)
# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)
# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)
# Estimate 0.1-quantile for many data sets
τ = 0.1f0
q̂(Z, τ)
# Estimate several quantiles for a single data set
# (note that τ is given as a row vector)
z = Z[1]
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])'
q̂(z, τ)
# -------------------------------------------------------------
# --------------------- Full conditionals ---------------------
# -------------------------------------------------------------
# Model: Z|μ,σ ~ N(μ, σ²) with μ ~ N(0, 1), σ ∼ IG(3,1)
d = 1 # dimension of each independent replicate
p = 2 # number of unknown parameters in the statistical model
m = 30 # number of independent replicates in each data set
function prior(K)
μ = randn(1, K)
σ = rand(InverseGamma(3, 1), 1, K)
θ = vcat(μ, σ)
θ = Float32.(θ)
return θ
end
simulateZ(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(1, m) for ϑ ∈ eachcol(θ)]
simulateτ(θ) = [rand32(10) for k in 1:size(θ, 2)]
simulate(θ, m) = simulateZ(θ, m), simulateτ(θ)
# Architecture: partially monotonic network to preclude quantile crossing
w = 64 # width of each hidden layer
ψ = Chain(
Dense(d, w, relu),
Dense(w, w, relu),
Dense(w, w, relu)
)
ϕ = Chain(
DensePositive(Dense(w + 2, w, relu); last_only = true),
DensePositive(Dense(w, w, relu)),
DensePositive(Dense(w, 1))
)
deepset = DeepSet(ψ, ϕ)
# Initialise the estimator for the first parameter, targeting μ∣Z,σ
i = 1
q̂ = QuantileEstimatorContinuous(deepset; i = i)
# Train the estimator
q̂ = train(q̂, prior, simulate, m = m)
# Assess the estimator
θ = prior(1000)
Z = simulateZ(θ, m)
assessment = assess(q̂, θ, Z)
plot(assessment)
# Estimate quantiles of μ∣Z,σ with σ = 0.5 and for many data sets
# (use θ[Not(i), :] to determine the order in which the conditioned parameters should be given)
θ = prior(1000)
Z = simulateZ(θ, m)
θ₋ᵢ = 0.5f0
τ = Float32.([0.1, 0.25, 0.5, 0.75, 0.9])
q̂(Z, θ₋ᵢ, τ)
# Estimate quantiles for a single data set
q̂(Z[1], θ₋ᵢ, τ)
NeuralEstimators.RatioEstimator — Type
RatioEstimator(deepset::DeepSet)
A neural estimator that estimates the likelihood-to-evidence ratio,
\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]
where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.
The estimator leverages the DeepSet
architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must equal the number of output neurons in the final layer of the summary network plus the number of parameters in the statistical model. Second, the number of output neurons in the final layer of the inference network must be equal to one.
The ratio estimator is trained by solving a relatively straightforward binary classification problem. Specifically, consider the problem of distinguishing dependent parameter–data pairs ${(\boldsymbol{\theta}', \boldsymbol{Z}')' \sim p(\boldsymbol{Z}, \boldsymbol{\theta})}$ with class labels $Y=1$ from independent parameter–data pairs ${(\tilde{\boldsymbol{\theta}}', \tilde{\boldsymbol{Z}}')' \sim p(\boldsymbol{\theta})p(\boldsymbol{Z})}$ with class labels $Y=0$, and where the classes are balanced. Then the Bayes classifier under binary cross-entropy loss is given by
\[c(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})},\]
and hence,
\[r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c(\boldsymbol{Z}, \boldsymbol{\theta})}.\]
For numerical stability, training is done on the log-scale using $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c(\boldsymbol{Z}, \boldsymbol{\theta}))$.
When applying the estimator to data, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true
will yield class probability estimates). The estimated ratio can then be used in various downstream Bayesian (e.g., Hermans et al., 2020) or Frequentist (e.g., Walchessen et al., 2023) inferential algorithms.
See also mlestimate
and mapestimate
for obtaining approximate maximum-likelihood and maximum-a-posteriori estimates, and sampleposterior
for obtaining approximate posterior samples.
Examples
using NeuralEstimators, Flux, Statistics, Optim
# Generate data from Z|μ,σ ~ N(μ, σ²) with μ, σ ~ U(0, 1)
p = 2 # number of unknown parameters in the statistical model
d = 1 # dimension of each independent replicate
m = 100 # number of independent replicates
prior(K) = rand32(p, K)
simulate(θ, m) = θ[1] .+ θ[2] .* randn32(d, m)
simulate(θ::AbstractMatrix, m) = simulate.(eachcol(θ), m)
# Architecture
w = 64 # width of each hidden layer
ψ = Chain(
Dense(d, w, relu),
Dense(w, w, relu),
Dense(w, w, relu)
)
ϕ = Chain(
Dense(w + p, w, relu),
Dense(w, w, relu),
Dense(w, 1)
)
deepset = DeepSet(ψ, ϕ)
# Initialise the estimator
r̂ = RatioEstimator(deepset)
# Train the estimator
r̂ = train(r̂, prior, simulate, m = m)
# Inference with "observed" data set
θ = prior(1)
z = simulate(θ, m)[1]
θ₀ = [0.5, 0.5] # initial estimate
mlestimate(r̂, z; θ₀ = θ₀) # maximum-likelihood estimate (requires Optim.jl to be loaded)
mapestimate(r̂, z; θ₀ = θ₀) # maximum-a-posteriori estimate (requires Optim.jl to be loaded)
θ_grid = expandgrid(0:0.01:1, 0:0.01:1)' # fine gridding of the parameter space
θ_grid = Float32.(θ_grid)
r̂(z, θ_grid) # likelihood-to-evidence ratios over grid
mlestimate(r̂, z; θ_grid = θ_grid) # maximum-likelihood estimate
mapestimate(r̂, z; θ_grid = θ_grid) # maximum-a-posteriori estimate
sampleposterior(r̂, z; θ_grid = θ_grid) # posterior samples
NeuralEstimators.PiecewiseEstimator — Type
PiecewiseEstimator(estimators, changepoints)
Creates a piecewise estimator (Sainsbury-Dale et al., 2024, sec. 2.2.2) from a collection of estimators
and sample-size changepoints
.
Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,
\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ \hat{\boldsymbol{\theta}}_2(\boldsymbol{Z}) & m_1 < m \leq m_2,\\ \quad \vdots \\ \hat{\boldsymbol{\theta}}_l(\boldsymbol{Z}) & m > m_{l-1}. \end{cases}\]
For example, given an estimator $\hat{\boldsymbol{\theta}}_1(\cdot)$ trained for small sample sizes (e.g., m ≤ 30) and an estimator $\hat{\boldsymbol{\theta}}_2(\cdot)$ trained for moderate-to-large sample sizes (e.g., m > 30), we may construct a PiecewiseEstimator
that dispatches $\hat{\boldsymbol{\theta}}_1(\cdot)$ if m ≤ 30 and $\hat{\boldsymbol{\theta}}_2(\cdot)$ otherwise.
See also trainx()
for training estimators for a range of sample sizes.
Examples
using NeuralEstimators, Flux
d = 2 # bivariate data
p = 3 # number of parameters in the statistical model
w = 8 # width of each hidden layer
# Small-sample estimator
ψ₁ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₁ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₁ = PointEstimator(DeepSet(ψ₁, ϕ₁))
# Large-sample estimator
ψ₂ = Chain(Dense(d, w, relu), Dense(w, w, relu));
ϕ₂ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂₂ = PointEstimator(DeepSet(ψ₂, ϕ₂))
# Piecewise estimator with changepoint m=30
θ̂ = PiecewiseEstimator([θ̂₁, θ̂₂], 30)
# Apply the (untrained) piecewise estimator to data
Z = [rand(d, 1, m) for m ∈ (10, 50)]
θ̂(Z)
NeuralEstimators.Ensemble — Type
Ensemble(estimators)
Ensemble(architecture::Function, J::Integer)
(ensemble::Ensemble)(Z; aggr = median)
Defines an ensemble based on a collection of estimators
which, when applied to data Z
, returns the median (or another summary defined by aggr
) of the estimates.
The ensemble can be initialised with a collection of trained estimators
and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators
(or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train()
, and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath
is specified, both the ensemble and its component estimators will be saved.
Note that train()
currently acts sequentially on the component estimators.
The ensemble components can be accessed by indexing the ensemble directly; the number of component estimators can be obtained using length()
.
Examples
using NeuralEstimators, Flux
# Define the model, Z|θ ~ N(θ, 1), θ ~ N(0, 1)
d = 1 # dimension of each replicate
p = 1 # number of unknown parameters in the statistical model
m = 30 # number of independent replicates in each data set
sampler(K) = randn32(p, K)
simulator(θ, m) = [μ .+ randn32(d, m) for μ ∈ eachcol(θ)]
# Architecture of each ensemble component
function architecture()
ψ = Chain(Dense(d, 64, relu), Dense(64, 64, relu))
ϕ = Chain(Dense(64, 64, relu), Dense(64, p))
deepset = DeepSet(ψ, ϕ)
PointEstimator(deepset)
end
# Initialise ensemble with three components
ensemble = Ensemble(architecture, 3)
ensemble[1] # access component estimators by indexing
length(ensemble) # number of component estimators
# Training
ensemble = train(ensemble, sampler, simulator, m = m, epochs = 5)
# Assessment
θ = sampler(1000)
Z = simulator(θ, m)
assessment = assess(ensemble, θ, Z)
rmse(assessment)
# Apply to data
ensemble(Z)
Training
The function train
is used to train a single neural estimator, while the wrapper function trainx
is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.
NeuralEstimators.train — Function
train(θ̂, sampler::Function, simulator::Function; ...)
train(θ̂, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
train(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}
Train a neural estimator θ̂
.
The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler
can be provided to continuously sample new parameter vectors from the prior, and a simulator
can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train
and θ_val
) and/or data (Z_train
and Z_val
), they will be held fixed during training.
In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.
Keyword arguments common to all methods:
- loss = mae: loss function to evaluate performance. For some classes of estimators (e.g., QuantileEstimator and RatioEstimator), the loss function does not need to be specified.
- epochs = 100: number of epochs to train the neural network. An epoch is one complete pass through the entire training data set when doing stochastic gradient descent.
- batchsize = 32: the batchsize to use when performing stochastic gradient descent, that is, the number of training samples processed between each update of the neural-network parameters.
- optimiser = ADAM(): optimisation algorithm used to update the neural-network parameters.
- savepath::String = "": path to save the trained estimator and other information; if an empty string (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
- stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
- use_gpu = true: flag indicating whether to use a GPU if one is available.
- verbose = true: flag indicating whether information, including empirical risk values and timings, should be printed to the console during training.
Keyword arguments common to train(θ̂, sampler, simulator) and train(θ̂, θ_train, θ_val, simulator):
- m: sample sizes (either an Integer or a collection of Integers). The simulator is called as simulator(θ, m).
- epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
- simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.
Keyword arguments unique to train(θ̂, sampler, simulator):
- K = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.
- ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
- epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.
Examples
using NeuralEstimators, Flux
function sampler(K)
μ = randn(K) # Gaussian prior
σ = rand(K) # Uniform prior
θ = hcat(μ, σ)'
return θ
end
function simulator(θ_matrix, m)
[θ[1] .+ θ[2] * randn(1, m) for θ ∈ eachcol(θ_matrix)]
end
# architecture
d = 1 # dimension of each replicate
p = 2 # number of parameters in the statistical model
ψ = Chain(Dense(1, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = DeepSet(ψ, ϕ)
# number of independent replicates to use during training
m = 15
# training: full simulation on-the-fly
θ̂ = train(θ̂, sampler, simulator, m = m, epochs = 5)
# training: simulation on-the-fly with fixed parameters
K = 10000
θ_train = sampler(K)
θ_val = sampler(K ÷ 5)
θ̂ = train(θ̂, θ_train, θ_val, simulator, m = m, epochs = 5)
# training: fixed parameters and fixed data
Z_train = simulator(θ_train, m)
Z_val = simulator(θ_val, m)
θ̂ = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
NeuralEstimators.trainx — Function
trainx(θ̂, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}
A wrapper around train()
to construct neural estimators for different sample sizes.
The positional argument m
specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if m = [m₁, m₂]
, the estimator for sample size m₂
is pre-trained with the estimator for sample size m₁
.
The method for Z_train
and Z_val
subsets the data using subsetdata(Z, 1:mᵢ)
for each mᵢ ∈ m
. The method for Z_train::V
and Z_val::V
trains an estimator for each element of Z_train::V
and Z_val::V
and, hence, it does not need to invoke subsetdata()
, which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m
is inferred from the data.
The keyword arguments inherit from train()
. The keyword arguments epochs
, batchsize
, stopping_epochs
, and optimiser
can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂]
.
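As an illustrative sketch (using an untrained toy model, and assuming that trainx returns the trained estimators in the order of the given sample sizes), estimators trained for sample sizes 30 and 100 can subsequently be combined into a PiecewiseEstimator:
using NeuralEstimators, Flux

d = 1  # dimension of each independent replicate
p = 2  # number of parameters in the statistical model

sampler(K) = rand32(p, K)
simulator(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(d, m) for ϑ ∈ eachcol(θ)]

ψ = Chain(Dense(d, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# Train an estimator for each sample size, with pre-training
estimators = trainx(θ̂, sampler, simulator, [30, 100], epochs = 5)

# Combine into a piecewise estimator with changepoint m = 30
θ̂_piecewise = PiecewiseEstimator(estimators, 30)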
Assessment/calibration
NeuralEstimators.assess — Function
assess(estimator, θ, Z)
Using an estimator
(or a collection of estimators), computes estimates from data Z
simulated based on true parameter vectors stored in θ
.
The data Z
should be a Vector
, with each element corresponding to a single simulated data set. If Z
contains more data sets than parameter vectors, the parameter matrix θ
will be recycled by horizontal concatenation via the call θ = repeat(θ, outer = (1, J))
where J = length(Z) ÷ K
is the number of simulated data sets per parameter vector and K = size(θ, 2)
is the number of parameter vectors.
The output is of type Assessment
; see ?Assessment
for details.
Keyword arguments
- estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
- parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
- ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices). Can also be provided as xi.
- use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true. Can also be provided as use_xi.
- use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
- probs = range(0.01, stop=0.99, length=100): (relevant only for estimator::QuantileEstimatorContinuous) a collection of probability levels in (0, 1).
Examples
using NeuralEstimators, Flux
n = 10 # number of observations in each realisation
p = 4 # number of parameters in the statistical model
# Construct the neural estimator
w = 32 # width of each layer
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu));
ϕ = Chain(Dense(w, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)
# Generate testing parameters
K = 100
θ = rand32(p, K)
# Data for a single sample size
m = 30
Z = [rand32(n, m) for _ ∈ 1:K];
assessment = assess(θ̂, θ, Z);
risk(assessment)
# Multiple data sets for each parameter vector
J = 5
Z = repeat(Z, J);
assessment = assess(θ̂, θ, Z);
risk(assessment)
# With set-level information
qₓ = 2
ϕ = Chain(Dense(w + qₓ, w, relu), Dense(w, p));
θ̂ = DeepSet(ψ, ϕ)
x = [rand(qₓ) for _ ∈ eachindex(Z)]
assessment = assess(θ̂, θ, (Z, x));
risk(assessment)
NeuralEstimators.Assessment — Type
Assessment(df::DataFrame, runtime::DataFrame)
A type for storing the output of assess()
. The field runtime
contains the total time taken for each estimator. The field df
is a long-form DataFrame
with columns:
- estimator: the name of the estimator
- parameter: the name of the parameter
- truth: the true value of the parameter
- estimate: the estimated value of the parameter
- m: the sample size (number of iid replicates) for the given data set
- k: the index of the parameter vector
- j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)
If estimator
is an IntervalEstimator
, the column estimate
will be replaced by the columns lower
and upper
, containing the lower and upper bounds of the interval, respectively.
If estimator
is a QuantileEstimator
, the df
will also contain a column prob
indicating the probability level of the corresponding quantile estimate.
Multiple Assessment
objects can be combined with merge()
(used for combining assessments from multiple point estimators) or join()
(used for combining assessments from a point estimator and an interval estimator).
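For example (a sketch with untrained estimators), assessments from two point estimators can be combined as follows:
using NeuralEstimators, Flux

d = 1; p = 2; K = 50; m = 10
θ = rand32(p, K)
Z = [rand32(d, m) for _ in 1:K]

# Two (untrained) point estimators
architecture() = DeepSet(Chain(Dense(d, 32, relu)), Chain(Dense(32, p)))
θ̂₁ = PointEstimator(architecture())
θ̂₂ = PointEstimator(architecture())

# Assess each estimator separately and combine the assessments
assessment₁ = assess(θ̂₁, θ, Z, estimator_names = ["estimator1"])
assessment₂ = assess(θ̂₂, θ, Z, estimator_names = ["estimator2"])
assessment  = merge(assessment₁, assessment₂)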
NeuralEstimators.risk — Function
risk(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's Bayes risk,
\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
Keyword arguments
- loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
- average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
- average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
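For instance, continuing from any of the assess() examples above (where an Assessment object named assessment has been created), the risk under squared-error loss, averaged over all parameters, may be computed with:
risk(assessment; loss = (x, y) -> (x - y)^2, average_over_parameters = true)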
NeuralEstimators.bias — Function
bias(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's bias,
\[{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \frac{1}{K} \sum_{k=1}^K \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
This function inherits the keyword arguments of risk
(excluding the argument loss
).
NeuralEstimators.rmse — Function
rmse(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an estimator's root-mean-squared error,
\[{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx \sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},\]
where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.
This function inherits the keyword arguments of risk
(excluding the argument loss
).
NeuralEstimators.coverage — Function
coverage(assessment::Assessment; ...)
Computes a Monte Carlo approximation of an interval estimator's expected coverage, as defined in Hermans et al. (2022, Definition 2.1), and the proportion of parameters below and above the lower and upper bounds, respectively.
Keyword arguments
- average_over_parameters::Bool = false: if true, the coverage is averaged over all parameters; otherwise (default), it is computed over each parameter separately.
- average_over_sample_sizes::Bool = true: if true (default), the coverage is averaged over all sample sizes $m$; otherwise, it is computed over each sample size separately.
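For example (a sketch; the interval estimator below is untrained, so its empirical coverage will be poor), coverage can be computed from an assessment of an IntervalEstimator:
using NeuralEstimators, Flux

n = 2; p = 3; K = 100; m = 30
θ = rand32(p, K)
Z = [rand32(n, m) for _ in 1:K]

# (Untrained) interval estimator
w = 32
ψ = Chain(Dense(n, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
estimator = IntervalEstimator(DeepSet(ψ, ϕ))

assessment = assess(estimator, θ, Z)
coverage(assessment)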
Inference with observed data
Inference using point estimators
Inference with a neural Bayes (point) estimator proceeds simply by applying the estimator θ̂
to the observed data Z
(possibly containing multiple data sets) in a call of the form θ̂(Z)
. To leverage a GPU, simply move the estimator and the data to the GPU using gpu()
; see also estimateinbatches()
to apply the estimator over batches of data, which can alleviate memory issues when working with a large number of data sets.
Uncertainty quantification often proceeds through the bootstrap distribution, which is essentially available "for free" when bootstrap data sets can be quickly generated; this is facilitated by bootstrap()
and interval()
. Alternatively, one may approximate a set of low and high marginal posterior quantiles using a specially constructed neural Bayes estimator, which can then be used to construct credible intervals: see IntervalEstimator
, QuantileEstimatorDiscrete
, and QuantileEstimatorContinuous
.
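The following sketch illustrates this workflow with an untrained point estimator and simulated data standing in for real observations:
using NeuralEstimators, Flux

d = 1; p = 2; m = 100

# (Untrained) point estimator
ψ = Chain(Dense(d, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

# "Observed" data: 1000 data sets, each with m independent replicates
Z = [rand32(d, m) for _ in 1:1000]

# Point estimates for all data sets, applied in batches to limit memory usage
θ̃ = estimateinbatches(θ̂, Z)

# Non-parametric bootstrap and 95% intervals for the first data set
bs = bootstrap(θ̂, Z[1])
interval(bs; probs = [0.025, 0.975])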
NeuralEstimators.bootstrap — Function
bootstrap(θ̂, parameters::P, Z) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, Z; B = 400, blocks = nothing)
Generates B
bootstrap estimates from an estimator θ̂
.
Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters
, and corresponding simulated data, Z
, whose length implicitly defines B
. Alternatively, one may provide a simulator
and the desired sample size, in which case the data will be simulated using simulator(parameters, m)
.
Non-parametric bootstrapping is facilitated by passing a single data set, Z
. The argument blocks
caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks
should be [1, 1, 2, 2, 2]
. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z
, but this can only be achieved exactly if all blocks are equal in length.
The keyword argument use_gpu
is a flag determining whether to use the GPU, if it is available (default true
).
The return type is a p × B
matrix, where p is the number of parameters in the model.
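For example (a sketch with an untrained estimator), parametric bootstrap estimates can be generated from a single parameter configuration as follows:
using NeuralEstimators, Flux

d = 1; p = 2; m = 100

ψ = Chain(Dense(d, 32, relu), Dense(32, 32, relu))
ϕ = Chain(Dense(32, 32, relu), Dense(32, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

simulator(θ, m) = [ϑ[1] .+ ϑ[2] .* randn32(d, m) for ϑ ∈ eachcol(θ)]

# A single parameter configuration (e.g., a point estimate from observed data)
θ = rand32(p, 1)

# Parametric bootstrap: simulate B data sets from θ and re-estimate
bs = bootstrap(θ̂, θ, simulator, m; B = 400)  # p × B matrix
interval(bs)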
NeuralEstimators.interval — Function
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)
Compute a confidence interval based either on a $p$ × $B$ matrix θ
of parameters (typically containing bootstrap estimates or posterior draws) with $p$ the number of parameters in the model, or from an IntervalEstimator
and data Z
.
When given θ
, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs
.
The return type is a $p$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names
.
Examples
using NeuralEstimators
p = 3
B = 50
θ = rand(p, B)
interval(θ)
Inference using likelihood and likelihood-to-evidence-ratio estimators
NeuralEstimators.mlestimate — Function
mlestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, penalty::Function = θ -> 1, use_gpu = true)
Computes the (approximate) maximum likelihood estimate given data $\boldsymbol{Z}$,
\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z})\]
where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator
.
If a vector θ₀
of initial parameter estimates is given, the approximate likelihood is maximised by gradient descent (requires Optim.jl
to be loaded). Otherwise, if a matrix of parameters θ_grid
is given, the approximate likelihood is maximised by grid search.
A maximum penalised likelihood estimate,
\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]
can be obtained by specifying the keyword argument penalty
that defines the penalty term $p(\boldsymbol{\theta})$.
See also mapestimate()
for computing (approximate) maximum a posteriori estimates.
NeuralEstimators.mapestimate — Function
mapestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)
Computes the (approximate) maximum a posteriori estimate given data $\boldsymbol{Z}$,
\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta})\]
where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator
, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior
(by default, a uniform prior is used).
If a vector θ₀
of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl
to be loaded). Otherwise, if a matrix of parameters θ_grid
is given, the approximate posterior density is maximised by grid search.
See also mlestimate()
for computing (approximate) maximum likelihood estimates.
NeuralEstimators.sampleposterior — Function
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)
Samples from the approximate posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$ implied by estimator
.
The positional argument N
controls the size of the posterior sample.
Currently, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid
(or theta_grid
). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion).
The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior
(by default, a uniform prior is used).
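Continuing from the RatioEstimator example above (where r̂, z, and θ_grid were defined), and assuming that the samples are returned as a $p$ × $N$ matrix, the posterior sample may be summarised with interval():
samples = sampleposterior(r̂, z; θ_grid = θ_grid)  # draw N = 1000 posterior samples
interval(samples; probs = [0.025, 0.975])         # 95% credible intervals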