Loss functions

In addition to the standard loss functions provided by Flux (e.g., mae and mse), NeuralEstimators provides the following loss functions.

NeuralEstimators.tanhloss (Function)
tanhloss(θ̂, θ, k; agg = mean, joint = true)

For k > 0, computes the loss function,

\[L(θ̂, θ) = tanh(|θ̂ - θ|/k),\]

which approximates the 0-1 loss as k → 0. Compared with kpowerloss, which may also be used as a continuous surrogate for the 0-1 loss, the gradient of the tanh loss is bounded as |θ̂ - θ| → 0, which can improve numerical stability during training.

If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.

See also kpowerloss.
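
Examples

A brief usage sketch with placeholder values; only the signature and keyword arguments documented above are assumed:

using NeuralEstimators

p = 2      # number of parameters
K = 10     # number of sampled parameter vectors
θ = rand(p, K)
θ̂ = rand(p, K)
tanhloss(θ̂, θ, 0.1)                 # small k: approximates the 0-1 loss
tanhloss(θ̂, θ, 0.1; joint = false)  # targets the marginal posterior modes as k → 0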

NeuralEstimators.kpowerloss (Function)
kpowerloss(θ̂, θ, k; agg = mean, joint = true, safeorigin = true, ϵ = 0.1)

For k > 0, the k-th power absolute-distance loss function,

\[L(θ̂, θ) = |θ̂ - θ|ᵏ,\]

contains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0). It is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1. It is quasiconvex for all k > 0.

If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.

If safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.

See also tanhloss.
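
Examples

A brief sketch illustrating the special cases and keyword arguments described above, with placeholder values:

using NeuralEstimators

p = 2
K = 10
θ = rand(p, K)
θ̂ = rand(p, K)
kpowerloss(θ̂, θ, 1)     # absolute-error special case
kpowerloss(θ̂, θ, 2)     # squared-error special case (modified near the origin while safeorigin = true)
kpowerloss(θ̂, θ, 0.1)   # small k: approximates the 0-1 loss
kpowerloss(θ̂, θ, 0.1; joint = false, safeorigin = false)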

NeuralEstimators.quantileloss (Function)
quantileloss(θ̂, θ, τ; agg = mean)
quantileloss(θ̂, θ, τ::Vector; agg = mean)

The asymmetric quantile loss function,

\[ L(θ̂, θ; τ) = (θ̂ - θ)(𝕀(θ̂ - θ > 0) - τ),\]

where τ ∈ (0, 1) is a probability level and 𝕀(⋅) is the indicator function.

The method that takes τ as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be $pr$, where $p$ is the number of parameters and $r$ is the number of probability levels in τ (i.e., the length of τ).

Examples

using NeuralEstimators

# Single parameter (p = 1)
p = 1
K = 10
θ = rand(p, K)
θ̂ = rand(p, K)
quantileloss(θ̂, θ, 0.1)

# Jointly approximate the 0.1, 0.5, and 0.9 quantiles: θ̂ has pr = 3p rows
θ̂ = rand(3p, K)
quantileloss(θ̂, θ, [0.1, 0.5, 0.9])

# Multiple parameters (p = 2)
p = 2
θ = rand(p, K)
θ̂ = rand(p, K)
quantileloss(θ̂, θ, 0.1)

θ̂ = rand(3p, K)
quantileloss(θ̂, θ, [0.1, 0.5, 0.9])
NeuralEstimators.intervalscore (Function)
intervalscore(l, u, θ, α; agg = mean)
intervalscore(θ̂, θ, α; agg = mean)
intervalscore(assessment::Assessment; average_over_parameters::Bool = false, average_over_sample_sizes::Bool = true)

Given an interval [l, u] with nominal coverage 100×(1-α)% and true value θ, the interval score is defined by

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes a single estimate θ̂ assumes that θ̂ is a matrix with $2p$ rows, where $p$ is the number of parameters in the statistical model; the first and second sets of $p$ rows are used as l and u, respectively.

For further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. (2007), "Strictly proper scoring rules, prediction, and estimation", Journal of the American Statistical Association, 102, 359–378.
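
Examples

A brief sketch using placeholder values and only the signatures documented above:

using NeuralEstimators

p = 2
K = 10
θ = rand(p, K)
l = θ .- rand(p, K)   # lower bounds
u = θ .+ rand(p, K)   # upper bounds
intervalscore(l, u, θ, 0.05)

# Equivalent single-matrix form: the first p rows are l, the next p rows are u
intervalscore(vcat(l, u), θ, 0.05)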
