Loss functions

When training an estimator of type PointEstimator, a loss function must be specified that determines the Bayes estimator that will be approximated. In addition to the standard loss functions provided by Flux (e.g., mae, mse, which allow for the approximation of posterior medians and means, respectively), the following loss functions are provided with the package.
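As a rough illustration of how the loss determines the target of estimation, the sketch below minimises the average absolute-error and squared-error losses over a grid of candidate constants. The helpers `mae_sketch` and `mse_sketch` are plain-Julia stand-ins written for this example (they are not Flux's functions), and the vector `θ` is a made-up set of values standing in for posterior draws.

```julia
using Statistics: mean, median

# Hypothetical stand-ins for illustration only (not Flux.mae / Flux.mse).
mae_sketch(c, θ) = mean(abs.(c .- θ))
mse_sketch(c, θ) = mean(abs2.(c .- θ))

# Made-up values standing in for draws from a skewed posterior.
θ = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate point estimates.
grid = 0.0:0.5:100.0

# The minimiser of the average loss is the estimate that each loss targets.
best_mae = grid[argmin([mae_sketch(c, θ) for c in grid])]
best_mse = grid[argmin([mse_sketch(c, θ) for c in grid])]

println("absolute-error minimiser: ", best_mae, " (median = ", median(θ), ")")
println("squared-error minimiser:  ", best_mse, " (mean = ", mean(θ), ")")
```

In this toy setting the two minimisers differ markedly because the values are skewed, which is why the choice of loss matters.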

tanhloss(θ̂, θ, κ; agg = mean)

For κ > 0, computes the loss function given in Sainsbury-Dale et al. (2025; Eqn. 14), namely,

\[L(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \tanh\big(\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1/\kappa\big),\]

which yields the 0–1 loss function in the limit κ → 0.

Compared with kpowerloss(), which may also be used as a continuous approximation of the 0–1 loss function, the gradient of this loss is bounded as $\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1 \to 0$, which can improve numerical stability during training.
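The formula above can be sketched in a few lines of plain Julia. This is a minimal illustration, not the package's implementation: the name `tanhloss_sketch` is hypothetical, and the layout (parameters in rows, samples in columns) is an assumption made for the example.

```julia
using Statistics: mean

# Hypothetical sketch of the tanh loss (not the package's implementation).
# Assumes θ̂ and θ are d × K matrices: d parameters in rows, K samples in columns.
function tanhloss_sketch(θ̂, θ, κ; agg = mean)
    d = sum(abs.(θ̂ .- θ), dims = 1)   # L1 distance for each column (1 × K)
    return agg(tanh.(d ./ κ))          # aggregate the per-sample losses
end

θ = zeros(2, 1)

# An exact estimate incurs zero loss.
near = tanhloss_sketch(θ, θ, 0.1)

# A distant estimate incurs a loss close to one, as with the 0–1 loss.
far = tanhloss_sketch(θ .+ 5.0, θ, 0.1)
```

Decreasing κ makes the loss saturate at one for ever smaller errors, which is how the 0–1 loss arises in the limit.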

See also kpowerloss().

kpowerloss(θ̂, θ, κ; agg = mean, safeorigin = true, ϵ = 0.1)

For κ > 0, the κ-th power absolute-distance loss function,

\[L(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = \|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|_1^\kappa,\]

contains the squared-error (κ = 2), absolute-error (κ = 1), and 0–1 (κ → 0) loss functions as special cases. It is Lipschitz continuous if κ = 1, convex if κ ≥ 1, and strictly convex if κ > 1. It is quasiconvex for all κ > 0.

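If safeorigin = true, the loss function is replaced by a linear function within the ϵ-interval surrounding the origin, giving a piecewise, continuous loss that avoids pathologies around the origin (e.g., the unbounded gradient of the power function at the origin when κ < 1).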
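A minimal sketch of this loss follows, written against the formula above rather than the package's source. The name `kpowerloss_sketch` is hypothetical, the column-wise layout is an assumption, and the linear piece used near the origin (slope ϵ^(κ-1), chosen so that the two pieces meet at distance ϵ) is one possible construction, not necessarily the one the package uses.

```julia
using Statistics: mean

# Hypothetical sketch of the κ-th power loss (not the package's implementation).
# Assumes θ̂ and θ are d × K matrices: d parameters in rows, K samples in columns.
function kpowerloss_sketch(θ̂, θ, κ; agg = mean, safeorigin = true, ϵ = 0.1)
    d = vec(sum(abs.(θ̂ .- θ), dims = 1))   # L1 distance for each column
    if safeorigin
        # Assumed construction: linear inside the ϵ-interval, matching d^κ at d = ϵ.
        L = map(x -> x > ϵ ? x^κ : ϵ^(κ - 1) * x, d)
    else
        L = d .^ κ
    end
    return agg(L)
end

θ  = [0.0 0.0 0.0]
θ̂ = [0.05 1.0 2.0]

# With a single parameter, κ = 1 and κ = 2 recover the mean absolute and
# mean squared errors, respectively.
abs_err = kpowerloss_sketch(θ̂, θ, 1; safeorigin = false)
sq_err  = kpowerloss_sketch(θ̂, θ, 2; safeorigin = false)
```

With κ < 1 and `safeorigin = true`, the linear piece keeps the slope finite for distances below ϵ.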

See also tanhloss().

quantileloss(θ̂, θ, τ; agg = mean)
quantileloss(θ̂, θ, τ::Vector; agg = mean)

The asymmetric quantile loss function,

\[L(\hat{\theta}, \theta; \tau) = (\hat{\theta} - \theta)\big(\mathbb{I}(\hat{\theta} - \theta > 0) - \tau\big),\]

where τ ∈ (0, 1) is a probability level and 𝕀(⋅) is the indicator function.

The method that takes τ as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be $dr$, where $d$ is the number of parameters and $r$ is the number of probability levels in τ (i.e., the length of τ).
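To see why this loss targets a quantile, the sketch below evaluates the scalar-τ formula for a grid of candidate estimates and picks the minimiser. The name `quantileloss_sketch` is hypothetical and covers only the scalar-τ case, and the vector `θ` is a made-up set of values standing in for posterior draws.

```julia
using Statistics: mean

# Hypothetical sketch of the scalar-τ quantile loss (not the package's implementation).
quantileloss_sketch(θ̂, θ, τ; agg = mean) =
    agg((θ̂ .- θ) .* ((θ̂ .- θ .> 0) .- τ))

# Made-up values standing in for posterior draws: 1, 2, ..., 100.
θ = collect(1.0:100.0)
τ = 0.9

# Evaluate the average loss of each candidate constant estimate.
grid = 0.0:0.5:101.0
losses = [quantileloss_sketch(c, θ, τ) for c in grid]

# For these values, the minimiser lies in the interval [90, 91],
# that is, at the 0.9 quantile of the draws.
best = grid[argmin(losses)]
```

Overestimates are weighted by 1 - τ and underestimates by τ, so a large τ pushes the minimiser towards the upper tail.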

intervalscore(l, u, θ, α; agg = mean)
intervalscore(θ̂, θ, α; agg = mean)
intervalscore(assessment::Assessment; average_over_parameters::Bool = false, average_over_sample_sizes::Bool = true)

Given an interval [l, u] with nominal coverage 100×(1-α)% and true value θ, the interval score (Gneiting and Raftery, 2007) is defined as

\[S(l, u, \theta; \alpha) = (u - l) + 2\alpha^{-1}(l - \theta)\mathbb{I}(\theta < l) + 2\alpha^{-1}(\theta - u)\mathbb{I}(\theta > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes a single argument θ̂ in place of l and u assumes that θ̂ is a matrix with $2d$ rows, where $d$ is the dimension of the parameter vector being estimated. The first and second sets of $d$ rows will be used as l and u, respectively.
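The definition and the $2d$-row convention can be sketched as follows. The name `intervalscore_sketch` is hypothetical, and the code follows the formula above rather than the package's source; samples are assumed to be stored in columns.

```julia
using Statistics: mean

# Hypothetical sketch of the interval score (not the package's implementation).
function intervalscore_sketch(l, u, θ, α; agg = mean)
    b = 2 / α
    # Width of the interval plus penalties for missing θ from below or above.
    S = (u .- l) .+ b .* (l .- θ) .* (θ .< l) .+ b .* (θ .- u) .* (θ .> u)
    return agg(S)
end

# Method for a stacked estimate: the first d rows are l, the last d rows are u.
function intervalscore_sketch(θ̂, θ, α; agg = mean)
    d = size(θ, 1)
    @assert size(θ̂, 1) == 2d "θ̂ must have 2d rows"
    return intervalscore_sketch(θ̂[1:d, :], θ̂[(d + 1):end, :], θ, α; agg = agg)
end

# One parameter (d = 1) and two samples, each with the interval [0, 1].
θ̂ = [0.0 0.0;
      1.0 1.0]
θ  = [0.5 1.5]

# First sample: θ is covered, so the score is the width, 1.
# Second sample: θ exceeds u by 0.5, so the score is 1 + (2 / 0.1) * 0.5 = 11.
score = intervalscore_sketch(θ̂, θ, 0.1)   # ≈ 6.0, the mean of 1 and 11
```

Narrow intervals are rewarded through the width term, while the penalty terms, scaled by 2/α, discourage intervals that miss the true value.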
