Loss functions
In addition to the standard loss functions provided by Flux (e.g., mae and mse), NeuralEstimators provides the following loss functions.
NeuralEstimators.tanhloss — Function

tanhloss(θ̂, θ, k; agg = mean, joint = true)
For k > 0, computes the loss function

\[L(θ̂, θ) = tanh(|θ̂ - θ|/k),\]

which approximates the 0-1 loss as k → 0. Compared with the kpowerloss, which may also be used as a continuous surrogate for the 0-1 loss, the gradient of the tanh loss is bounded as |θ̂ - θ| → 0, which can improve numerical stability during training.
If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.
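A minimal usage sketch (the p × K array shapes mirror the quantileloss examples below and are an assumption for this function; only the signature documented above is used):

using NeuralEstimators

p = 3    # number of parameters
K = 10   # number of parameter vectors in the batch
θ = rand(p, K)    # true parameters
θ̂ = rand(p, K)    # estimates

tanhloss(θ̂, θ, 0.1)                  # small k: sharp surrogate for the 0-1 loss
tanhloss(θ̂, θ, 0.1; joint = false)   # target the marginal posterior modes instead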
See also kpowerloss.
NeuralEstimators.kpowerloss — Function

kpowerloss(θ̂, θ, k; agg = mean, joint = true, safeorigin = true, ϵ = 0.1)
For k > 0, the k-th power absolute-distance loss function,

\[L(θ̂, θ) = |θ̂ - θ|ᵏ,\]

contains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0). It is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1; it is quasiconvex for all k > 0.
If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.
If safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.
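A minimal usage sketch (shapes are illustrative assumptions; the calls use only the signature documented above):

using NeuralEstimators

p, K = 3, 10
θ = rand(p, K)    # true parameters
θ̂ = rand(p, K)    # estimates

kpowerloss(θ̂, θ, 2.0)   # k = 2: squared-error behaviour
kpowerloss(θ̂, θ, 1.0)   # k = 1: absolute-error loss
kpowerloss(θ̂, θ, 0.1)   # small k: continuous surrogate for the 0-1 loss
kpowerloss(θ̂, θ, 0.1; safeorigin = false)   # disable the modification near the origin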
See also tanhloss.
NeuralEstimators.quantileloss — Function

quantileloss(θ̂, θ, τ; agg = mean)
quantileloss(θ̂, θ, τ::Vector; agg = mean)
The asymmetric quantile loss function,

\[L(θ̂, θ; τ) = (θ̂ - θ)(𝕀(θ̂ - θ > 0) - τ),\]

where τ ∈ (0, 1) is a probability level and 𝕀(⋅) is the indicator function.
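For example, at τ = 0.5 the loss reduces to half the absolute-error loss,

\[L(θ̂, θ; 0.5) = |θ̂ - θ|/2,\]

and, more generally, the Bayes estimator under this loss is the τ-quantile of the posterior distribution (the median when τ = 0.5).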
The method that takes τ as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be $pr$, where $p$ is the number of parameters and $r$ is the number of probability levels in τ (i.e., the length of τ).
Examples

using NeuralEstimators

p = 1    # one parameter
K = 10   # number of parameter vectors
θ = rand(p, K)    # true parameters
θ̂ = rand(p, K)    # estimates
quantileloss(θ̂, θ, 0.1)

θ̂ = rand(3p, K)   # three quantile estimates per parameter
quantileloss(θ̂, θ, [0.1, 0.5, 0.9])

p = 2    # two parameters
θ = rand(p, K)
θ̂ = rand(p, K)
quantileloss(θ̂, θ, 0.1)

θ̂ = rand(3p, K)
quantileloss(θ̂, θ, [0.1, 0.5, 0.9])
NeuralEstimators.intervalscore — Function

intervalscore(l, u, θ, α; agg = mean)
intervalscore(θ̂, θ, α; agg = mean)
intervalscore(assessment::Assessment; average_over_parameters::Bool = false, average_over_sample_sizes::Bool = true)
Given an interval [l, u] with nominal coverage 100×(1-α)% and true value θ, the interval score is defined by

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.
The method that takes a single estimate matrix θ̂ (rather than separate bounds l and u) assumes that θ̂ has $2p$ rows, where $p$ is the number of parameters in the statistical model. The first and second sets of $p$ rows are then used as l and u, respectively.
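A minimal usage sketch (shapes are illustrative assumptions; both documented methods are shown):

using NeuralEstimators

p, K = 2, 10
θ = rand(p, K)         # true parameters
l = θ .- rand(p, K)    # lower bounds
u = θ .+ rand(p, K)    # upper bounds
α = 0.05               # nominal coverage 100×(1-α)% = 95%

intervalscore(l, u, θ, α)

# Equivalent call with the bounds stacked into a single matrix with 2p rows
intervalscore(vcat(l, u), θ, α)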
For further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. (2007), "Strictly proper scoring rules, prediction, and estimation", Journal of the American Statistical Association, 102, 359–378.