Model-specific functions

Data simulators

The philosophy of NeuralEstimators is to cater for arbitrary statistical models by having the user define their statistical model implicitly through simulated data. However, the following functions have been included because they may be helpful to others, and because their source code illustrates how a user could write a simulator for their own model.

See also Distributions.jl for a large range of distributions implemented in Julia, and the package RCall for calling R functions within Julia.
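
For instance, a simulator for a simple user-defined model can be written directly with Distributions.jl. The following is a minimal sketch (the model, prior, and function names sampleparameters and simulate are illustrative only, not part of the package API):

using Distributions

# Prior sampler for θ = (μ, σ): K independent draws from uniform priors
function sampleparameters(K)
	μ = rand(Uniform(-1, 1), K)
	σ = rand(Uniform(0.1, 1), K)
	vcat(μ', σ')                 # 2 x K matrix, one column per parameter vector
end

# Data simulator: m i.i.d. realisations of a N(μ, σ²) variable for each parameter vector
simulate(θ, m) = [rand(Normal(ϑ[1], ϑ[2]), 1, m) for ϑ ∈ eachcol(θ)]

θ = sampleparameters(100)
Z = simulate(θ, 50)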

NeuralEstimators.simulategaussian (Function)
simulategaussian(L::AbstractMatrix, m = 1)

Simulates m independent and identically distributed (i.i.d.) realisations from a mean-zero multivariate Gaussian random variable with associated lower Cholesky factor L.

If m is not specified, the simulated data are returned as a vector with length equal to the number of spatial locations, $n$; otherwise, the data are returned as an $n \times m$ matrix.

Examples

using NeuralEstimators, Distances, LinearAlgebra

n = 500
ρ = 0.6
ν = 1.0
S = rand(n, 2)
D = pairwise(Euclidean(), S, dims = 1)
Σ = Symmetric(matern.(D, ρ, ν))
L = cholesky(Σ).L
simulategaussian(L)
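
Specifying m returns an $n \times m$ matrix; for example, continuing the above:

simulategaussian(L, 5)   # 500 x 5 matrix of five i.i.d. realisations
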
NeuralEstimators.simulatepotts (Function)
simulatepotts(grid::Matrix{Int}, β)
simulatepotts(grid::Matrix{Union{Int, Nothing}}, β)
simulatepotts(nrows::Int, ncols::Int, num_states::Int, β)

Chequerboard Gibbs sampling from the 2D Potts model with parameter β > 0.

Approximately independent simulations can be obtained by setting nsims > 1 or num_iterations > burn. The degree to which the resulting simulations can be considered independent depends on the thinning factor (thin) and the burn-in (burn).

Keyword arguments

  • nsims = 1: number of approximately independent replicates.
  • num_iterations = 2000: number of MCMC iterations.
  • burn = num_iterations: burn-in.
  • thin = 10: thinning factor.

Examples

using NeuralEstimators 

## Marginal simulation 
β = 0.8
simulatepotts(10, 10, 5, β)

## Marginal simulation: approximately independent samples 
simulatepotts(10, 10, 5, β; nsims = 100, thin = 10)

## Conditional simulation 
β = 0.8
complete_grid   = simulatepotts(50, 50, 2, β)        # simulate marginally from the Ising model 
incomplete_grid = removedata(complete_grid, 0.1)     # remove 10% of the pixels at random  
imputed_grid    = simulatepotts(incomplete_grid, β)  # conditionally simulate over missing pixels

## Multiple conditional simulations 
imputed_grids   = simulatepotts(incomplete_grid, β; num_iterations = 2000, burn = 1000, thin = 10)

## Recreate Fig. 8.8 of Marin & Robert (2007) “Bayesian Core”
using Plots 
grids = [simulatepotts(100, 100, 2, β) for β ∈ 0.3:0.1:1.2]
heatmaps = heatmap.(grids, legend = false, aspect_ratio=1)
Plots.plot(heatmaps...)
NeuralEstimators.simulateschlather (Function)
simulateschlather(L::Matrix, m = 1; C = 3.5, Gumbel::Bool = false)

Simulates m independent and identically distributed (i.i.d.) realisations from Schlather's max-stable model using the algorithm for approximate simulation given by Schlather (2002).

Requires the lower Cholesky factor L associated with the covariance matrix of the underlying Gaussian process.

If m is not specified, the simulated data are returned as a vector with length equal to the number of spatial locations, $n$; otherwise, the data are returned as an $n \times m$ matrix.

Keyword arguments

  • C = 3.5: a tuning parameter that controls the accuracy of the algorithm: small C favours computational efficiency, while large C favours accuracy. Schlather (2002) recommends the use of C = 3.
  • Gumbel = false: flag indicating whether the data should be log-transformed from the unit Fréchet scale to the Gumbel scale.

Examples

using NeuralEstimators, Distances, LinearAlgebra

n = 500
ρ = 0.6
ν = 1.0
S = rand(n, 2)
D = pairwise(Euclidean(), S, dims = 1)
Σ = Symmetric(matern.(D, ρ, ν))
L = cholesky(Σ).L
simulateschlather(L)
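
The keyword arguments can be supplied in the same way; for example, continuing the above (with illustrative values), the following simulates five replicates on the Gumbel scale using a larger tuning constant:

simulateschlather(L, 5; C = 10.0, Gumbel = true)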

Spatial point processes

NeuralEstimators.maternclusterprocess (Function)
maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1, unit_bounding_box=false)

Simulates a Matérn cluster process with intensity λ for the parent Poisson point process, mean number of daughter points μ per cluster, and cluster-disk radius r, over the simulation window defined by xmin, xmax, ymin, and ymax.

If unit_bounding_box is true, then the simulated points will be scaled so that the longest side of their bounding box is equal to one (this may change the simulation window).

See also the R package spatstat, which provides functions for simulating from a range of point processes and which can be interfaced from Julia using RCall.

Examples

using NeuralEstimators

# Simulate a realisation from a Matérn cluster process
S = maternclusterprocess()

# Visualise realisation (requires UnicodePlots)
using UnicodePlots
scatterplot(S[:, 1], S[:, 2])

# Visualise realisations from the cluster process with varying parameters
n = 250
λ = [10, 25, 50, 90]
μ = n ./ λ
plots = map(eachindex(λ)) do i
	S = maternclusterprocess(λ = λ[i], μ = μ[i])
	scatterplot(S[:, 1], S[:, 2])
end
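
The unit_bounding_box rescaling can be checked similarly (parameter values arbitrary):

# Points rescaled so that the longest side of their bounding box equals one
S = maternclusterprocess(λ = 25, μ = 10, unit_bounding_box = true)
extrema(S[:, 1]), extrema(S[:, 2])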

Covariance functions

The following covariance functions may be used to construct covariance matrices for the Gaussian-process-based simulators above, or for other user-defined models.

NeuralEstimators.matern (Function)
matern(h, ρ, ν, σ² = 1)

Given distance $\|\boldsymbol{h}\|$ (h), computes the Matérn covariance function,

\[C(\|\boldsymbol{h}\|) = \sigma^2 \frac{2^{1 - \nu}}{\Gamma(\nu)} \left(\frac{\|\boldsymbol{h}\|}{\rho}\right)^\nu K_\nu \left(\frac{\|\boldsymbol{h}\|}{\rho}\right),\]

where ρ is a range parameter, ν is a smoothness parameter, σ² is the marginal variance, $\Gamma(\cdot)$ is the gamma function, and $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.
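
As a quick numerical check, the formula can be evaluated directly using SpecialFunctions.jl and compared against matern (a minimal sketch with arbitrary parameter values, assuming SpecialFunctions is installed):

using NeuralEstimators, SpecialFunctions

h, ρ, ν, σ² = 0.3, 0.6, 1.0, 1.0
C = σ² * 2^(1 - ν) / gamma(ν) * (h / ρ)^ν * besselk(ν, h / ρ)
C ≈ matern(h, ρ, ν, σ²)   # expected to hold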

NeuralEstimators.paciorek (Function)
paciorek(s, r, ω₁, ω₂, ρ, β)

Given spatial locations s and r, computes the nonstationary covariance function,

\[C(\boldsymbol{s}, \boldsymbol{r}) = |\boldsymbol{\Sigma}(\boldsymbol{s})|^{1/4} |\boldsymbol{\Sigma}(\boldsymbol{r})|^{1/4} \left|\frac{\boldsymbol{\Sigma}(\boldsymbol{s}) + \boldsymbol{\Sigma}(\boldsymbol{r})}{2}\right|^{-1/2} C^0\big(\sqrt{Q(\boldsymbol{s}, \boldsymbol{r})}\big), \]

where $C^0(h) = \exp\{-(h/\rho)^{3/2}\}$ for range parameter $\rho > 0$, the matrix $\boldsymbol{\Sigma}(\boldsymbol{s}) = \exp(\beta\|\boldsymbol{s} - \boldsymbol{\omega}\|)\boldsymbol{I}$ is a kernel matrix (Paciorek and Schervish, 2006) with scale parameter $\beta > 0$ and $\boldsymbol{\omega} \equiv (\omega_1, \omega_2)' \in \mathcal{D}$, and

\[Q(\boldsymbol{s}, \boldsymbol{r}) = (\boldsymbol{s} - \boldsymbol{r})' \left(\frac{\boldsymbol{\Sigma}(\boldsymbol{s}) + \boldsymbol{\Sigma}(\boldsymbol{r})}{2}\right)^{-1} (\boldsymbol{s} - \boldsymbol{r})\]

is the squared Mahalanobis distance between $\boldsymbol{s}$ and $\boldsymbol{r}$.
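
A minimal usage sketch, assuming s and r are supplied as length-two coordinate vectors (parameter values arbitrary):

using NeuralEstimators

s = [0.2, 0.4]        # spatial location s
r = [0.7, 0.8]        # spatial location r
ω₁, ω₂ = 0.5, 0.5     # coordinates of ω
ρ, β = 0.3, 0.5       # range and scale parameters
paciorek(s, r, ω₁, ω₂, ρ, β)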


Density functions

Density functions are not needed in the workflow of NeuralEstimators. However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in various papers, we have developed the following functions for evaluating the density of several popular distributions. We include them in NeuralEstimators in case they are of use in future comparison studies.

NeuralEstimators.gaussiandensity (Function)
gaussiandensity(y::V, L::LT) where {V <: AbstractVector, LT <: LowerTriangular}
gaussiandensity(y::A, L::LT) where {A <: AbstractArray, LT <: LowerTriangular}
gaussiandensity(y::A, Σ::M) where {A <: AbstractArray, M <: AbstractMatrix}

Efficiently computes the density function for y ~ 𝑁(0, Σ), where Σ is the covariance matrix and L is the lower Cholesky factor of Σ.

The method gaussiandensity(y::A, L::LT) assumes that the last dimension of y contains independent and identically distributed (iid) replicates.

The log-density is returned if the keyword argument logdensity is true (default).
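
For example, reusing the Matérn covariance setup from the simulators above (a minimal sketch):

using NeuralEstimators, Distances, LinearAlgebra

n = 500
S = rand(n, 2)
D = pairwise(Euclidean(), S, dims = 1)
Σ = Symmetric(matern.(D, 0.6, 1.0))
L = cholesky(Σ).L
y = simulategaussian(L, 5)                  # n x 5 matrix of i.i.d. replicates
gaussiandensity(y, L)                       # log-density (default)
gaussiandensity(y, Σ; logdensity = false)   # density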
