Miscellaneous

Core

These functions can appear during the core workflow, and may need to be overloaded in some applications.

NeuralEstimators.numberreplicatesFunction
numberreplicates(Z)

Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.
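
Examples

A minimal sketch for array data, assuming (as in the default method) that replicates are stored along the final dimension:

using NeuralEstimators
Z = rand(16, 16, 1, 5)  # a single data set with m = 5 replicates
numberreplicates(Z)     # 5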

source
NeuralEstimators.subsetdataFunction
subsetdata(Z::V, i) where {V <: AbstractArray{A}} where {A <: Any}
subsetdata(Z::A, i) where {A <: AbstractArray{T, N}} where {T, N}
subsetdata(Z::G, i) where {G <: AbstractGraph}

Return replicate(s) i from each data set in Z.

If the user is working with data that are not covered by the default methods, simply overload the function with the appropriate type for Z.

For graphical data, this function calls getgraph(), with the replicates assumed to be stored as batched graphs. Since this can be slow, consider using a method of train() that does not require the data to be subsetted when working with graphical data (use numberreplicates() to check that the training and validation data sets contain an equal number of replicates, which avoids the need for subsetting).

Examples

using NeuralEstimators
using GraphNeuralNetworks
using Flux: batch

d = 1  # dimension of the response variable
n = 4  # number of observations in each realisation
m = 6  # number of replicates in each data set
K = 2  # number of data sets

# Array data
Z = [rand(n, d, m) for k ∈ 1:K]
subsetdata(Z, 2)   # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set

# Graphical data
e = 8 # number of edges
Z = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]
subsetdata(Z, 2)   # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
source
NeuralEstimators.subsetparametersFunction
subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}

Subset parameters using a collection of indices.

Arrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.
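
Examples

A minimal sketch for matrix-valued parameters, where the columns are assumed to index the K parameter configurations:

using NeuralEstimators
θ = rand(3, 100)             # 3 parameters, K = 100 configurations
subsetparameters(θ, 1:10)    # first 10 configurations
subsetparameters(θ, [1, 5])  # configurations 1 and 5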

source

Downstream-inference algorithms

NeuralEstimators.EMType
EM(simulateconditional::Function, MAP::Union{Function, NeuralEstimator}, θ₀ = nothing)

Implements the (Bayesian) Monte Carlo expectation-maximisation (EM) algorithm, with $l$th iteration

\[\boldsymbol{\theta}^{(l)} = \argmax_{\boldsymbol{\theta}} \sum_{h = 1}^H \ell(\boldsymbol{\theta}; \boldsymbol{Z}_1, \boldsymbol{Z}_2^{(lh)}) + H\log \pi(\boldsymbol{\theta})\]

where $\ell(\cdot)$ is the complete-data log-likelihood function, $\boldsymbol{Z} \equiv (\boldsymbol{Z}_1', \boldsymbol{Z}_2')'$ denotes the complete data with $\boldsymbol{Z}_1$ and $\boldsymbol{Z}_2$ the observed and missing components, respectively, $\boldsymbol{Z}_2^{(lh)}$, $h = 1, \dots, H$, is simulated from the distribution of $\boldsymbol{Z}_2 \mid \boldsymbol{Z}_1, \boldsymbol{\theta}^{(l-1)}$, and $\pi(\boldsymbol{\theta})$ denotes the prior density.

Fields

The function simulateconditional should have a signature of the form,

simulateconditional(Z::A, θ; nsims = 1) where {A <: AbstractArray{Union{Missing, T}}} where T

The output of simulateconditional should be the completed-data Z, and it should be returned in whatever form is appropriate to be passed to the MAP estimator as MAP(Z). For example, if the data are gridded and the MAP is a neural MAP estimator based on a CNN architecture, then Z should be returned as a four-dimensional array.

The field MAP can be a function (to facilitate the conventional Monte Carlo EM algorithm) or a NeuralEstimator (to facilitate the so-called neural EM algorithm).

The starting values θ₀ may be provided during initialisation (as a vector), or when applying the EM object to data (see below). The starting values given in a function call take precedence over those stored in the object.

Methods

Once constructed, objects of type EM can be applied to data via the methods,

(em::EM)(Z::A, θ₀::Union{Nothing, Vector} = nothing; ...) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
(em::EM)(Z::V, θ₀::Union{Nothing, Vector, Matrix} = nothing; ...) where {V <: AbstractVector{A}} where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}

where Z is the complete data, containing both the observed data and Missing values. Note that the second method caters for the case of multiple data sets. The keyword arguments are:

  • nsims = 1: the number $H$ of conditional simulations in each iteration.
  • niterations = 50: the maximum number of iterations.
  • nconsecutive = 3: the number of consecutive iterations for which the convergence criterion must be met.
  • ϵ = 0.01: tolerance used to assess convergence; the algorithm halts if the relative change in parameter values in successive iterations is less than ϵ.
  • return_iterates::Bool: if true, the estimate at each iteration of the algorithm is returned; otherwise, only the final estimate is returned.
  • ξ = nothing: model information needed for conditional simulation (e.g., distance matrices) or in the MAP estimator.
  • use_ξ_in_simulateconditional::Bool = false: if set to true, the conditional simulator is called as simulateconditional(Z, θ, ξ; nsims = nsims).
  • use_ξ_in_MAP::Bool = false: if set to true, the MAP estimator is called as MAP(Z, ξ).
  • use_gpu::Bool = true
  • verbose::Bool = false

Examples

# See the "Missing data" section in "Advanced usage"
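
# A minimal illustrative sketch (hypothetical names; the conditional simulator
# and MAP estimator are necessarily model specific):
#
#   simulateconditional(Z, θ; nsims = 1) = ...  # model-specific conditional simulator
#   MAP = ...                                   # e.g., a neural MAP estimator
#   em  = EM(simulateconditional, MAP)
#   θ̂   = em(Z, θ₀; nsims = 10)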
source

Utility functions

NeuralEstimators.adjacencymatrixFunction
adjacencymatrix(S::Matrix, k::Integer; maxmin = false, combined = false)
adjacencymatrix(S::Matrix, r::AbstractFloat)
adjacencymatrix(S::Matrix, r::AbstractFloat, k::Integer; random = true)
adjacencymatrix(M::Matrix; k, r, kwargs...)

Computes a spatially weighted adjacency matrix from spatial locations S based on either the k-nearest neighbours of each location; all nodes within a disc of fixed radius r; or, if both r and k are provided, a subset of k neighbours within a disc of fixed radius r.

Several subsampling strategies are possible when choosing a subset of k neighbours within a disc of fixed radius r. If random=true (default), the neighbours are randomly selected from within the disc (note that this also approximately preserves the distribution of distances within the neighbourhood set). If random=false, a deterministic algorithm is used that aims to preserve the distribution of distances within the neighbourhood set, by choosing those nodes with distances to the central node corresponding to the $\{0, \frac{1}{k}, \frac{2}{k}, \dots, \frac{k-1}{k}, 1\}$ quantiles of the empirical distribution function of distances within the disc. (This algorithm in fact yields $k+1$ neighbours, since both the closest and furthest nodes are always included.)

If maxmin=false (default) the k-nearest neighbours are chosen based on all points in the graph. If maxmin=true, a so-called maxmin ordering is applied, whereby an initial point is selected, and each subsequent point is selected to maximise the minimum distance to those points that have already been selected. Then, the neighbours of each point are defined as the k-nearest neighbours amongst the points that have already appeared in the ordering. If combined=true, the neighbours are defined to be the union of the k-nearest neighbours and the k-nearest neighbours subject to a maxmin ordering.

If S is a square matrix, it is treated as a distance matrix; otherwise, it should be an $n \times d$ matrix, where $n$ is the number of spatial locations and $d$ is the spatial dimension (typically $d = 2$). In the latter case, the distance metric is taken to be the Euclidean distance. Note that use of a maxmin ordering currently requires a matrix of spatial locations (not a distance matrix).

Following the convention of GraphNeuralNetworks.jl, which is based on directed graphs, the neighbours of location i are stored in the column A[:, i], where A is the returned adjacency matrix. Therefore, the number of neighbours of each location is given by collect(mapslices(nnz, A; dims = 1)), and the number of times each node is a neighbour of another node is given by collect(mapslices(nnz, A; dims = 2)).

By convention, we do not consider a location to neighbour itself (i.e., the diagonal elements of the adjacency matrix are zero).

Examples

using NeuralEstimators, Distances, SparseArrays

n = 250
d = 2
S = rand(Float32, n, d)
k = 10
r = 0.10

# Memory efficient constructors
adjacencymatrix(S, k)
adjacencymatrix(S, k; maxmin = true)
adjacencymatrix(S, k; maxmin = true, combined = true)
adjacencymatrix(S, r)
adjacencymatrix(S, r, k)
adjacencymatrix(S, r, k; random = false)

# Construct from full distance matrix D
D = pairwise(Euclidean(), S, dims = 1)
adjacencymatrix(D, k)
adjacencymatrix(D, r)
adjacencymatrix(D, r, k)
adjacencymatrix(D, r, k; random = false)
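
# Inspect the number of neighbours of each location (stored column-wise, as described above)
A = adjacencymatrix(S, k)
collect(mapslices(nnz, A; dims = 1))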
source
NeuralEstimators.containertypeFunction
containertype(A::Type)
containertype(::Type{A}) where A <: SubArray
containertype(a::A) where A

Returns the container type of its argument.

If given a SubArray, returns the container type of the parent array.

Examples

using NeuralEstimators

a = rand(3, 4)
containertype(a)
containertype(typeof(a))
[containertype(x) for x ∈ eachcol(a)]
source
NeuralEstimators.encodedataFunction
encodedata(Z::A; c::T = zero(T)) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}

For data Z with missing entries, returns an encoded data set (U, W) where W encodes the missingness pattern as an indicator vector and U is the original data Z with missing entries replaced by a fixed constant c.

The indicator vector W is stored in the second-to-last dimension of Z, which should be singleton. If the second-to-last dimension is not singleton, then two singleton dimensions will be added to the array, and W will be stored in the new second-to-last dimension.

Examples

using NeuralEstimators

# Generate some missing data
Z = rand(16, 16, 1, 1)
Z = removedata(Z, 0.25)	 # remove 25% of the data

# Encode the data
UW = encodedata(Z)
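size(UW)  # the second-to-last dimension now stores the indicator W (here, expected to be (16, 16, 2, 1))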
source
NeuralEstimators.estimateinbatchesFunction
estimateinbatches(θ̂, z, t = nothing; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)

Apply the estimator θ̂ to minibatches of z (and optionally other set-level information t) of size batchsize.

This can prevent memory issues that can occur with large data sets, particularly on the GPU.

Minibatching will only be done if there are multiple data sets in z; this will be inferred by z being a vector, or a tuple whose first element is a vector.
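
Examples

A minimal sketch using an untrained estimator constructed with initialise_estimator() (documented below), applied to 100 data sets of trivariate data:

using NeuralEstimators
θ̂ = initialise_estimator(2, architecture = "MLP", d = 3)
Z = [rand(Float32, 3, 30) for _ ∈ 1:100]  # 100 data sets, each with 30 replicates
estimateinbatches(θ̂, Z; batchsize = 16)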

source
NeuralEstimators.IndicatorWeightsType
IndicatorWeights(h_max, n_bins::Integer)
(w::IndicatorWeights)(h::Matrix)

For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as

\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\mathbb{I}(h \in B_k) : k = 1, \dots, K)',\]

where $\mathbb{I}(\cdot)$ denotes the indicator function, $h \equiv \|\boldsymbol{s} - \boldsymbol{u} \|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and $\{B_k : k = 1, \dots, K\}$ is a set of $K =$n_bins equally-sized distance bins covering the spatial distances between 0 and h_max.

Examples

using NeuralEstimators 

h_max = 1
n_bins = 10
w = IndicatorWeights(h_max, n_bins)
h = rand(1, 30) # distances between 30 pairs of spatial locations 
w(h)
source
NeuralEstimators.initialise_estimatorFunction
initialise_estimator(p::Integer; ...)

Initialise a neural estimator for a statistical model with p unknown parameters.

The estimator is couched in the DeepSets framework (see DeepSet) so that it can be applied to data sets containing an arbitrary number of independent replicates (including the special case of a single replicate).

Note also that the user is free to initialise their neural estimator however they see fit using arbitrary Flux code; see the Flux documentation for its API reference.

Finally, the method with positional argument data_type is a wrapper that allows one to specify the type of their data (either "unstructured", "gridded", or "irregular_spatial").

Keyword arguments

  • architecture::String: for unstructured multivariate data, one may use a fully-connected multilayer perceptron ("MLP"); for data collected over a grid, a convolutional neural network ("CNN"); and for graphical or irregular spatial data, a graph neural network ("GNN").
  • d::Integer = 1: for unstructured multivariate data (i.e., when architecture = "MLP"), the dimension of the data (e.g., d = 3 for trivariate data); otherwise, if architecture ∈ ["CNN", "GNN"], the argument d controls the number of input channels (e.g., d = 1 for univariate spatial processes).
  • estimator_type::String = "point": the type of estimator; either "point" or "interval".
  • depth = 3: the number of hidden layers; either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) network of the DeepSets framework.
  • width = 32: a single integer or an integer vector of length sum(depth) specifying the width (or number of convolutional filters/channels) in each hidden layer.
  • activation::Function = relu: the (non-linear) activation function of each hidden layer.
  • activation_output::Function = identity: the activation function of the output layer.
  • variance_stabiliser::Union{Nothing, Function} = nothing: a function that will be applied directly to the input, usually to stabilise the variance.
  • kernel_size = nothing: (applicable only to CNNs) a vector of length depth[1] containing integer tuples of length D, where D is the dimension of the convolution (e.g., D = 2 for two-dimensional convolution).
  • weight_by_distance::Bool = true: (applicable only to GNNs) flag indicating whether the estimator will weight by spatial distance; if true, a SpatialGraphConv layer is used in the propagation module; otherwise, a regular GraphConv layer is used.
  • probs = [0.025, 0.975]: (applicable only if estimator_type = "interval") probability levels defining the lower and upper endpoints of the posterior credible interval.

Examples

## MLP, GNN, 1D CNN, and 2D CNN for a statistical model with two parameters:
p = 2
initialise_estimator(p, architecture = "MLP")
initialise_estimator(p, architecture = "GNN")
initialise_estimator(p, architecture = "CNN", kernel_size = [10, 5, 3])
initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
source
NeuralEstimators.materncholsFunction
maternchols(D, ρ, ν, σ² = 1; stack = true)

Given a matrix D of distances, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².

Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.

If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).

Examples

using NeuralEstimators
using LinearAlgebra: norm
n  = 10
S  = rand(n, 2)
D  = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S), sⱼ ∈ eachrow(S)]
ρ  = [0.6, 0.5]
ν  = [0.7, 1.2]
σ² = [0.2, 0.4]
maternchols(D, ρ, ν)
maternchols([D], ρ, ν)
maternchols(D, ρ, ν, σ²; stack = false)

S̃  = rand(n, 2)
D̃  = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²)
maternchols([D, D̃], ρ, ν, σ²; stack = false)

S̃  = rand(2n, 2)
D̃  = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²; stack = false)
source
NeuralEstimators.removedataFunction
removedata(Z::Array, Iᵤ::Vector{Integer})
removedata(Z::Array, p::Union{Float, Vector{Float}}; prevent_complete_missing = true)
removedata(Z::Array, n::Integer; fixed_pattern = false, contiguous_pattern = false, variable_proportion = false)

Replaces elements of Z with missing.

The simplest method accepts a vector of integers Iᵤ that give the specific indices of the data to be removed.

Alternatively, there are two methods available to generate data that are missing completely at random (MCAR).

First, a vector p may be given that specifies the proportion of missingness for each element in the response vector. Hence, p should have length equal to the dimension of the response vector. If a single proportion is given, it will be replicated accordingly. If prevent_complete_missing = true, no replicates will contain 100% missingness (note that this can slightly alter the effective values of p).

Second, if an integer n is provided, all replicates will contain n observations after the data are removed. If fixed_pattern = true, the missingness pattern is fixed for all replicates. If contiguous_pattern = true, the data will be removed in a contiguous block. If variable_proportion = true, the proportion of missingness will vary across replicates, with each replicate containing between 1 and n observations after data removal, sampled uniformly (note that variable_proportion overrides fixed_pattern).

The return type is Array{Union{T, Missing}}.

Examples

using NeuralEstimators

d = 5           # dimension of each replicate
m = 2000        # number of replicates
Z = rand(d, m)  # simulated data

# Passing a desired proportion of missingness
p = rand(d)
removedata(Z, p)

# Passing a desired final sample size
n = 3  # number of observed elements of each replicate: must have n <= d
removedata(Z, n)
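
# Passing specific indices of the data to be removed (here, assumed linear indices)
removedata(Z, [1, 4, 5])

# Removing data in a contiguous block
removedata(Z, n; contiguous_pattern = true)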
source
NeuralEstimators.spatialgraphFunction
spatialgraph(S)
spatialgraph(S, Z)
spatialgraph(g::GNNGraph, Z)

Given spatial data Z measured at spatial locations S, constructs a GNNGraph ready for use in a graph neural network that employs SpatialGraphConv layers.

When $m$ independent replicates are collected over the same set of $n$ spatial locations,

\[\{\boldsymbol{s}_1, \dots, \boldsymbol{s}_n\} \subset \mathcal{D},\]

where $\mathcal{D} \subset \mathbb{R}^d$ denotes the spatial domain of interest, Z should be given as an $n \times m$ matrix and S should be given as an $n \times d$ matrix. Otherwise, when $m$ independent replicates are collected over differing sets of spatial locations,

\[\{\boldsymbol{s}_{i1}, \dots, \boldsymbol{s}_{in_i}\} \subset \mathcal{D}, \quad i = 1, \dots, m,\]

Z should be given as an $m$-vector of $n_i$-vectors, and S should be given as an $m$-vector of $n_i \times d$ matrices.

The spatial information between neighbours is stored as an edge feature, with the specific information controlled by the keyword arguments stationary and isotropic. Specifically, the edge feature between node $j$ and node $j'$ stores the spatial distance $\|\boldsymbol{s}_{j'} - \boldsymbol{s}_j\|$ (if isotropic), the spatial displacement $\boldsymbol{s}_{j'} - \boldsymbol{s}_j$ (if stationary), or the matrix of locations $(\boldsymbol{s}_{j'}, \boldsymbol{s}_j)$ (if !stationary).

Additional keyword arguments inherit from adjacencymatrix() to determine the neighbourhood of each node, with the default being a randomly selected set of k=30 neighbours within a disc of radius r=0.15 units.

Examples

using NeuralEstimators

# Number of replicates and spatial dimension
m = 5  
d = 2  

# Spatial locations fixed for all replicates
n = 100
S = rand(n, d)
Z = rand(n, m)
g = spatialgraph(S, Z)

# Spatial locations varying between replicates
n = rand(50:100, m)
S = rand.(n, d)
Z = rand.(n)
g = spatialgraph(S, Z)
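
# Keyword arguments are passed to adjacencymatrix() to control the neighbourhood
# (a sketch: here, k = 10 neighbours within a disc of radius r = 0.3)
g = spatialgraph(S, Z; k = 10, r = 0.3)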
source
NeuralEstimators.stackarraysFunction
stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}

Stack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.

The arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.

Examples

using NeuralEstimators

# Vector containing arrays of the same size:
Z = [rand(2, 3, m) for m ∈ (1, 1)];
stackarrays(Z)
stackarrays(Z, merge = false)

# Vector containing arrays with differing final dimension size:
Z = [rand(2, 3, m) for m ∈ (1, 2)];
stackarrays(Z)
source
NeuralEstimators.vectotrilFunction
vectotril(v; strict = false)
vectotriu(v; strict = false)

Converts a vector v of length $d(d+1)÷2$ (a triangular number) into a $d × d$ lower or upper triangular matrix.

If strict = true, the matrix will be strictly lower or upper triangular, that is, a $(d+1) × (d+1)$ triangular matrix with zero diagonal.

Note that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return type is not a Triangular matrix (i.e., the zeros are materialised), since Triangular matrices are not always compatible with other GPU operations.

Examples

using NeuralEstimators

d = 4
n = d*(d+1)÷2
v = collect(range(1, n))
vectotril(v)
vectotriu(v)
vectotril(v; strict = true)
vectotriu(v; strict = true)
source