Loss functions
When training an estimator of type PointEstimator, a loss function must be specified that determines the Bayes estimator that will be approximated. In addition to the standard loss functions provided by Flux (e.g., mae and mse, which allow for the approximation of posterior medians and means, respectively), the following loss functions are provided with the package.
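For orientation, the sketch below (with purely illustrative values) evaluates Flux's built-in losses on a d × K matrix of estimates, the format in which these losses are typically applied during training:

```julia
using Flux

θ̂ = [0.9 1.1; 2.2 1.8]  # estimates: d × K matrix (d parameters, K parameter vectors)
θ  = [1.0 1.0; 2.0 2.0]  # corresponding true parameter values

Flux.mae(θ̂, θ)  # mean absolute error: targets the (marginal) posterior medians
Flux.mse(θ̂, θ)  # mean squared error: targets the posterior means
```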
NeuralEstimators.tanhloss — Function

tanhloss(θ̂, θ, κ; agg = mean)
For κ > 0, computes the loss function given in Sainsbury-Dale et al. (2025; Eqn. 14), namely,

\[L(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \tanh\big(\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1/\kappa\big),\]

which yields the 0–1 loss function in the limit κ → 0.
Compared with kpowerloss(), which may also be used as a continuous approximation of the 0–1 loss function, the gradient of this loss is bounded as $\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1 \to 0$, which can improve numerical stability during training.

See also kpowerloss().
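As a minimal illustration (the matrices and κ values below are arbitrary), the surrogate sharpens towards the 0–1 loss as κ decreases:

```julia
using NeuralEstimators

θ̂ = [0.9 1.1; 2.2 1.8]  # estimates (d × K)
θ  = [1.0 1.0; 2.0 2.0]  # true values

tanhloss(θ̂, θ, 1.0)   # smooth surrogate
tanhloss(θ̂, θ, 0.1)   # closer to the 0–1 loss

# To obtain the two-argument form (θ̂, θ) expected by a typical training loop,
# one can fix κ in a closure (an illustrative pattern, not package-specific API):
myloss(θ̂, θ) = tanhloss(θ̂, θ, 0.1)
```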
NeuralEstimators.kpowerloss — Function

kpowerloss(θ̂, θ, κ; agg = mean, safeorigin = true, ϵ = 0.1)
For κ > 0, the κ-th power absolute-distance loss function,

\[L(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1^\kappa,\]

contains the squared-error (κ = 2), absolute-error (κ = 1), and 0–1 (κ → 0) loss functions as special cases. It is Lipschitz continuous if κ = 1, convex if κ ≥ 1, and strictly convex if κ > 1. It is quasiconvex for all κ > 0.
If safeorigin = true, the loss function is modified to be continuous and piecewise linear in the ϵ-interval surrounding the origin, to avoid pathologies there.
See also tanhloss().
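For illustration (again with arbitrary values), the special cases noted above can be recovered by varying κ:

```julia
using NeuralEstimators

θ̂ = [0.9 1.1; 2.2 1.8]  # estimates (d × K)
θ  = [1.0 1.0; 2.0 2.0]  # true values

kpowerloss(θ̂, θ, 2.0)   # squared-error loss
kpowerloss(θ̂, θ, 1.0)   # absolute-error loss
kpowerloss(θ̂, θ, 0.1)   # approximates the 0–1 loss; safeorigin = true by default
kpowerloss(θ̂, θ, 0.1; safeorigin = false)   # unmodified power loss
```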
NeuralEstimators.quantileloss — Function

quantileloss(θ̂, θ, τ; agg = mean)
quantileloss(θ̂, θ, τ::Vector; agg = mean)
The asymmetric quantile loss function,

\[L(θ̂, θ; τ) = (θ̂ - θ)(𝕀(θ̂ - θ > 0) - τ),\]

where τ ∈ (0, 1) is a probability level and 𝕀(⋅) is the indicator function.
The method that takes τ as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be $dr$, where $d$ is the number of parameters and $r$ is the number of probability levels in τ (i.e., the length of τ).
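A small sketch of the two methods, with simulated placeholder values; note the $dr$ rows required by the vector method:

```julia
using NeuralEstimators

d, K = 2, 100            # number of parameters and number of parameter vectors
θ = rand(d, K)           # true values

# Single probability level: θ̂ has d rows
θ̂ = rand(d, K)
quantileloss(θ̂, θ, 0.975)

# Several levels jointly: θ̂ has dr rows (here r = 3)
τ = [0.025, 0.5, 0.975]
θ̂ = rand(d * length(τ), K)
quantileloss(θ̂, θ, τ)
```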
NeuralEstimators.intervalscore — Function

intervalscore(l, u, θ, α; agg = mean)
intervalscore(θ̂, θ, α; agg = mean)
intervalscore(assessment::Assessment; average_over_parameters::Bool = false, average_over_sample_sizes::Bool = true)
Given an interval [l, u] with nominal coverage 100×(1-α)% and true value θ, the interval score (Gneiting and Raftery, 2007) is defined as

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.
The method that takes a single matrix θ̂ assumes that θ̂ has $2d$ rows, where $d$ is the dimension of the parameter vector of interest. The first and second sets of $d$ rows are used as l and u, respectively.
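A sketch of the first two methods with placeholder intervals (the Assessment method requires an assessment object produced elsewhere in the package, so it is omitted here):

```julia
using NeuralEstimators

d, K = 2, 100
θ = rand(d, K)           # true values
α = 0.05                 # nominal coverage 100×(1-α)% = 95%

# Explicit lower and upper bounds
l = θ .- 0.1
u = θ .+ 0.1
intervalscore(l, u, θ, α)

# Single-matrix form: rows 1:d are the lower bounds, rows (d+1):2d the upper bounds
θ̂ = vcat(l, u)
intervalscore(θ̂, θ, α)
```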