Miscellaneous
NeuralEstimators.EM
NeuralEstimators.IndicatorWeights
NeuralEstimators.KernelWeights
NeuralEstimators.adjacencymatrix
NeuralEstimators.containertype
NeuralEstimators.encodedata
NeuralEstimators.estimateinbatches
NeuralEstimators.expandgrid
NeuralEstimators.initialise_estimator
NeuralEstimators.loadbestweights
NeuralEstimators.maternchols
NeuralEstimators.numberreplicates
NeuralEstimators.removedata
NeuralEstimators.rowwisenorm
NeuralEstimators.spatialgraph
NeuralEstimators.stackarrays
NeuralEstimators.subsetdata
NeuralEstimators.subsetparameters
NeuralEstimators.vectotril
Core
These functions can appear during the core workflow, and may need to be overloaded in some applications.
NeuralEstimators.numberreplicates
— Functionnumberreplicates(Z)
Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array
or as a GNNGraph
.
NeuralEstimators.subsetdata
— Functionsubsetdata(Z::V, i) where {V <: AbstractArray{A}} where {A <: Any}
subsetdata(Z::A, i) where {A <: AbstractArray{T, N}} where {T, N}
subsetdata(Z::G, i) where {G <: AbstractGraph}
Return replicate(s) i
from each data set in Z
.
If the user is working with data that are not covered by the default methods, simply overload the function with the appropriate type for Z
.
For graphical data, calls getgraph()
, where the replicates are assumed be to stored as batched graphs. Since this can be slow, one should consider using a method of train()
that does not require the data to be subsetted when working with graphical data (use numberreplicates()
to check that the training and validation data sets are equally replicated, which prevents subsetting).
Examples
using NeuralEstimators
using GraphNeuralNetworks
using Flux: batch
d = 1 # dimension of the response variable
n = 4 # number of observations in each realisation
m = 6 # number of replicates in each data set
K = 2 # number of data sets
# Array data
Z = [rand(n, d, m) for k ∈ 1:K]
subsetdata(Z, 2) # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
# Graphical data
e = 8 # number of edges
Z = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]
subsetdata(Z, 2) # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
NeuralEstimators.subsetparameters
— Functionsubsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}
Subset parameters
using a collection of indices
.
Arrays in parameters::P
with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices
. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters
.
Downstream-inference algorithms
NeuralEstimators.EM
— TypeEM(simulateconditional::Function, MAP::Union{Function, NeuralEstimator}, θ₀ = nothing)
Implements the (Bayesian) Monte Carlo expectation-maximisation (EM) algorithm, with $l$th iteration
\[\boldsymbol{\theta}^{(l)} = \argmax_{\boldsymbol{\theta}} \sum_{h = 1}^H \ell(\boldsymbol{\theta}; \boldsymbol{Z}_1, \boldsymbol{Z}_2^{(lh)}) + H\log \pi(\boldsymbol{\theta})\]
where $\ell(\cdot)$ is the complete-data log-likelihood function, $\boldsymbol{Z} \equiv (\boldsymbol{Z}_1', \boldsymbol{Z}_2')'$ denotes the complete data with $\boldsymbol{Z}_1$ and $\boldsymbol{Z}_2$ the observed and missing components, respectively, $\boldsymbol{Z}_2^{(lh)}$, $h = 1, \dots, H$, is simulated from the distribution of $\boldsymbol{Z}_2 \mid \boldsymbol{Z}_1, \boldsymbol{\theta}^{(l-1)}$, and $\pi(\boldsymbol{\theta})$ denotes the prior density.
Fields
The function simulateconditional
should have a signature of the form,
simulateconditional(Z::A, θ; nsims = 1) where {A <: AbstractArray{Union{Missing, T}}} where T
The output of simulateconditional
should be the completed-data Z
, and it should be returned in whatever form is appropriate to be passed to the MAP estimator as MAP(Z)
. For example, if the data are gridded and the MAP
is a neural MAP estimator based on a CNN architecture, then Z
should be returned as a four-dimensional array.
The field MAP
can be a function (to facilitate the conventional Monte Carlo EM algorithm) or a NeuralEstimator
(to facilitate the so-called neural EM algorithm).
The starting values θ₀
may be provided during initialisation (as a vector), or when applying the EM
object to data (see below). The starting values given in a function call take precedence over those stored in the object.
Methods
Once constructed, obects of type EM
can be applied to data via the methods,
(em::EM)(Z::A, θ₀::Union{Nothing, Vector} = nothing; ...) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
(em::EM)(Z::V, θ₀::Union{Nothing, Vector, Matrix} = nothing; ...) where {V <: AbstractVector{A}} where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
where Z
is the complete data containing the observed data and Missing
values. Note that the second method caters for the case that one has multiple data sets. The keyword arguments are:
nsims = 1
: the number $H$ of conditional simulations in each iteration.niterations = 50
: the maximum number of iterations.nconsecutive = 3
: the number of consecutive iterations for which the convergence criterion must be met.ϵ = 0.01
: tolerance used to assess convergence; the algorithm halts if the relative change in parameter values in successive iterations is less thanϵ
.return_iterates::Bool
: iftrue
, the estimate at each iteration of the algorithm is returned; otherwise, only the final estimate is returned.ξ = nothing
: model information needed for conditional simulation (e.g., distance matrices) or in the MAP estimator.use_ξ_in_simulateconditional::Bool = false
: if set totrue
, the conditional simulator is called assimulateconditional(Z, θ, ξ; nsims = nsims)
.use_ξ_in_MAP::Bool = false
: if set totrue
, the MAP estimator is called asMAP(Z, ξ)
.use_gpu::Bool = true
verbose::Bool = false
Examples
# See the "Missing data" section in "Advanced usage"
Utility functions
NeuralEstimators.adjacencymatrix
— Functionadjacencymatrix(S::Matrix, k::Integer; maxmin = false, combined = false)
adjacencymatrix(S::Matrix, r::AbstractFloat)
adjacencymatrix(S::Matrix, r::AbstractFloat, k::Integer; random = true)
adjacencymatrix(M::Matrix; k, r, kwargs...)
Computes a spatially weighted adjacency matrix from spatial locations S
based on either the k
-nearest neighbours of each location; all nodes within a disc of fixed radius r
; or, if both r
and k
are provided, a subset of k
neighbours within a disc of fixed radius r
.
If S
is a square matrix, it is treated as a distance matrix; otherwise, it should be an $n$ x $d$ matrix, where $n$ is the number of spatial locations and $d$ is the spatial dimension (typically $d$ = 2). In the latter case, the distance metric is taken to be the Euclidean distance. Note that use of a maxmin ordering currently requires a matrix of spatial locations (not a distance matrix).
When using the k
nearest neighbours, if maxmin=false
(default) the neighbours are chosen based on all points in the graph. If maxmin=true
, a so-called maxmin ordering is applied, whereby an initial point is selected, and each subsequent point is selected to maximise the minimum distance to those points that have already been selected. Then, the neighbours of each point are defined as the k
-nearest neighbours amongst the points that have already appeared in the ordering. If combined=true
, the neighbours are defined to be the union of the k
-nearest neighbours and the k
-nearest neighbours subject to a maxmin ordering.
Two subsampling strategies are implemented when choosing a subset of k
neighbours within a disc of fixed radius r
. If random=true
(default), the neighbours are randomly selected from within the disc. If random=false
, a deterministic algorithm is used that aims to preserve the distribution of distances within the neighbourhood set, by choosing those nodes with distances to the central node corresponding to the $\{0, \frac{1}{k}, \frac{2}{k}, \dots, \frac{k-1}{k}, 1\}$ quantiles of the empirical distribution function of distances within the disc (this in fact yields up to $k+1$ neighbours, since both the closest and furthest nodes are always included).
By convention with the functionality in GraphNeuralNetworks.jl
which is based on directed graphs, the neighbours of location i
are stored in the column A[:, i]
where A
is the returned adjacency matrix. Therefore, the number of neighbours for each location is given by collect(mapslices(nnz, A; dims = 1))
, and the number of times each node is a neighbour of another node is given by collect(mapslices(nnz, A; dims = 2))
.
By convention, we do not consider a location to neighbour itself (i.e., the diagonal elements of the adjacency matrix are zero).
Examples
using NeuralEstimators, Distances, SparseArrays
n = 250
d = 2
S = rand(Float32, n, d)
k = 10
r = 0.10
# Memory efficient constructors
adjacencymatrix(S, k)
adjacencymatrix(S, k; maxmin = true)
adjacencymatrix(S, k; maxmin = true, combined = true)
adjacencymatrix(S, r)
adjacencymatrix(S, r, k)
adjacencymatrix(S, r, k; random = false)
# Construct from full distance matrix D
D = pairwise(Euclidean(), S, dims = 1)
adjacencymatrix(D, k)
adjacencymatrix(D, r)
adjacencymatrix(D, r, k)
adjacencymatrix(D, r, k; random = false)
NeuralEstimators.containertype
— Functioncontainertype(A::Type)
containertype(::Type{A}) where A <: SubArray
containertype(a::A) where A
Returns the container type of its argument.
If given a SubArray
, returns the container type of the parent array.
Examples
a = rand(3, 4)
containertype(a)
containertype(typeof(a))
[containertype(x) for x ∈ eachcol(a)]
NeuralEstimators.encodedata
— Functionencodedata(Z::A; c::T = zero(T)) where {A <: AbstractArray{Union{Missing, T}, N}} where T, N
For data Z
with missing entries, returns an encoded data set (U, W) where W encodes the missingness pattern as an indicator vector and U is the original data Z with missing entries replaced by a fixed constant c
.
The indicator vector W is stored in the second-to-last dimension of Z
, which should be singleton. If the second-to-last dimension is not singleton, then two singleton dimensions will be added to the array, and W will be stored in the new second-to-last dimension.
Examples
using NeuralEstimators
# Generate some missing data
Z = rand(16, 16, 1, 1)
Z = removedata(Z, 0.25) # remove 25% of the data
# Encode the data
UW = encodedata(Z)
NeuralEstimators.estimateinbatches
— Functionestimateinbatches(θ̂, z, t = nothing; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)
Apply the estimator θ̂
on minibatches of z
(and optionally other set-level information t
) of size batchsize
.
This can prevent memory issues that can occur with large data sets, particularly on the GPU.
Minibatching will only be done if there are multiple data sets in z
; this will be inferred by z
being a vector, or a tuple whose first element is a vector.
NeuralEstimators.expandgrid
— Functionexpandgrid(xs, ys)
Same as expand.grid()
in R
, but currently caters for two dimensions only.
NeuralEstimators.IndicatorWeights
— TypeIndicatorWeights(h_max, n_bins::Integer)
(w::IndicatorWeights)(h::Matrix)
For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as
\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\mathbb{I}(h \in B_k) : k = 1, \dots, K)',\]
where $\mathbb{I}(\cdot)$ denotes the indicator function, $h \equiv \|\boldsymbol{s} - \boldsymbol{u} \|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and $\{B_k : k = 1, \dots, K\}$ is a set of $K =$n_bins
equally-sized distance bins covering the spatial distances between 0 and h_max
.
Examples
using NeuralEstimators
h_max = 1
n_bins = 10
w = IndicatorWeights(h_max, n_bins)
h = rand(1, 30) # distances between 30 pairs of spatial locations
w(h)
NeuralEstimators.KernelWeights
— TypeKernelWeights(h_max, n_bins::Integer)
(w::KernelWeights)(h::Matrix)
For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as
\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\exp(-(h - \mu_k)^2 / (2\sigma_k^2)) : k = 1, \dots, K)',\]
where $h \equiv \|\boldsymbol{s} - \boldsymbol{u}\|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and ${\mu_k : k = 1, \dots, K}$ and ${\sigma_k : k = 1, \dots, K}$ are the means and standard deviations of the Gaussian kernels for each bin, covering the spatial distances between 0 and h_max.
Examples
using NeuralEstimators
h_max = 1
n_bins = 10
w = KernelWeights(h_max, n_bins)
h = rand(1, 30) # distances between 30 pairs of spatial locations
w(h)
NeuralEstimators.initialise_estimator
— Functioninitialise_estimator(p::Integer; ...)
Initialise a neural estimator for a statistical model with p
unknown parameters.
The estimator is couched in the DeepSets framework (see DeepSet
) so that it can be applied to data sets containing an arbitrary number of independent replicates (including the special case of a single replicate).
Note also that the user is free to initialise their neural estimator however they see fit using arbitrary Flux
code; see here for Flux
's API reference.
Finally, the method with positional argument data_type
is a wrapper that allows one to specify the type of their data (either "unstructured", "gridded", or "irregular_spatial").
Keyword arguments
architecture::String
: for unstructured multivariate data, one may use a fully-connected multilayer perceptron ("MLP"
); for data collected over a grid, a convolutional neural network ("CNN"
); and for graphical or irregular spatial data, a graphical neural network ("GNN"
).d::Integer = 1
: for unstructured multivariate data (i.e., whenarchitecture = "MLP"
), the dimension of the data (e.g.,d = 3
for trivariate data); otherwise, ifarchitecture ∈ ["CNN", "GNN"]
, the argumentd
controls the number of input channels (e.g.,d = 1
for univariate spatial processes).estimator_type::String = "point"
: the type of estimator; either"point"
or"interval"
.depth = 3
: the number of hidden layers; either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) network of the DeepSets framework.width = 32
: a single integer or an integer vector of lengthsum(depth)
specifying the width (or number of convolutional filters/channels) in each hidden layer.activation::Function = relu
: the (non-linear) activation function of each hidden layer.activation_output::Function = identity
: the activation function of the output layer.variance_stabiliser::Union{Nothing, Function} = nothing
: a function that will be applied directly to the input, usually to stabilise the variance.kernel_size = nothing
: (applicable only to CNNs) a vector of lengthdepth[1]
containing integer tuples of lengthD
, whereD
is the dimension of the convolution (e.g.,D = 2
for two-dimensional convolution).weight_by_distance::Bool = true
: (applicable only to GNNs) flag indicating whether the estimator will weight by spatial distance; if true, aSpatialGraphConv
layer is used in the propagation module; otherwise, a regularGraphConv
layer is used.probs = [0.025, 0.975]
: (applicable only ifestimator_type = "interval"
) probability levels defining the lower and upper endpoints of the posterior credible interval.
Examples
## MLP, GNN, 1D CNN, and 2D CNN for a statistical model with two parameters:
p = 2
initialise_estimator(p, architecture = "MLP")
initialise_estimator(p, architecture = "GNN")
initialise_estimator(p, architecture = "CNN", kernel_size = [10, 5, 3])
initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
NeuralEstimators.loadbestweights
— Functionloadbestweights(path::String)
Returns the weights of the neural network saved as 'best_network.bson' in the given path
.
NeuralEstimators.maternchols
— Functionmaternchols(D, ρ, ν, σ² = 1; stack = true)
Given a matrix D
of distances, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ
, smoothness parameter ν
, and marginal variance σ²
.
Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D
may also be provided.
If stack = true
, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D
are the same size).
Examples
using NeuralEstimators
using LinearAlgebra: norm
n = 10
S = rand(n, 2)
D = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S), sⱼ ∈ eachrow(S)]
ρ = [0.6, 0.5]
ν = [0.7, 1.2]
σ² = [0.2, 0.4]
maternchols(D, ρ, ν)
maternchols([D], ρ, ν)
maternchols(D, ρ, ν, σ²; stack = false)
S̃ = rand(n, 2)
D̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²)
maternchols([D, D̃], ρ, ν, σ²; stack = false)
S̃ = rand(2n, 2)
D̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²; stack = false)
NeuralEstimators.removedata
— Functionremovedata(Z::Array, Iᵤ::Vector{Integer})
removedata(Z::Array, p::Union{Float, Vector{Float}}; prevent_complete_missing = true)
removedata(Z::Array, n::Integer; fixed_pattern = false, contiguous_pattern = false, variable_proportion = false)
Replaces elements of Z
with missing
.
The simplest method accepts a vector of integers Iᵤ
that give the specific indices of the data to be removed.
Alternatively, there are two methods available to generate data that are missing completely at random (MCAR).
First, a vector p
may be given that specifies the proportion of missingness for each element in the response vector. Hence, p
should have length equal to the dimension of the response vector. If a single proportion is given, it will be replicated accordingly. If prevent_complete_missing = true
, no replicates will contain 100% missingness (note that this can slightly alter the effective values of p
).
Second, if an integer n
is provided, all replicates will contain n
observations after the data are removed. If fixed_pattern = true
, the missingness pattern is fixed for all replicates. If contiguous_pattern = true
, the data will be removed in a contiguous block. If variable_proportion = true
, the proportion of missingness will vary across replicates, with each replicate containing between 1 and n
observations after data removal, sampled uniformly (note that variable_proportion
overrides fixed_pattern
).
The return type is Array{Union{T, Missing}}
.
Examples
d = 5 # dimension of each replicate
m = 2000 # number of replicates
Z = rand(d, m) # simulated data
# Passing a desired proportion of missingness
p = rand(d)
removedata(Z, p)
# Passing a desired final sample size
n = 3 # number of observed elements of each replicate: must have n <= d
removedata(Z, n)
NeuralEstimators.rowwisenorm
— Functionrowwisenorm(A)
Computes the row-wise norm of a matrix A
.
NeuralEstimators.spatialgraph
— Functionspatialgraph(S)
spatialgraph(S, Z)
spatialgraph(g::GNNGraph, Z)
Given spatial data Z
measured at spatial locations S
, constructs a GNNGraph
ready for use in a graph neural network that employs SpatialGraphConv
layers.
When $m$ independent replicates are collected over the same set of $n$ spatial locations,
\[\{\boldsymbol{s}_1, \dots, \boldsymbol{s}_n\} \subset \mathcal{D},\]
where $\mathcal{D} \subset \mathbb{R}^d$ denotes the spatial domain of interest, Z
should be given as an $n \times m$ matrix and S
should be given as an $n \times d$ matrix. Otherwise, when $m$ independent replicates are collected over differing sets of spatial locations,
\[\{\boldsymbol{s}_{ij}, \dots, \boldsymbol{s}_{in_i}\} \subset \mathcal{D}, \quad i = 1, \dots, m,\]
Z
should be given as an $m$-vector of $n_i$-vectors, and S
should be given as an $m$-vector of $n_i \times d$ matrices.
The spatial information between neighbours is stored as an edge feature, with the specific information controlled by the keyword arguments stationary
and isotropic
. Specifically, the edge feature between node $j$ and node $j'$ stores the spatial distance $\|\boldsymbol{s}_{j'} - \boldsymbol{s}_j\|$ (if isotropic
), the spatial displacement $\boldsymbol{s}_{j'} - \boldsymbol{s}_j$ (if stationary
), or the matrix of locations $(\boldsymbol{s}_{j'}, \boldsymbol{s}_j)$ (if !stationary
).
Additional keyword arguments inherit from adjacencymatrix()
to determine the neighbourhood of each node, with the default being a randomly selected set of k=30
neighbours within a disc of radius r=0.15
units.
Examples
using NeuralEstimators
# Number of replicates and spatial dimension
m = 5
d = 2
# Spatial locations fixed for all replicates
n = 100
S = rand(n, d)
Z = rand(n, m)
g = spatialgraph(S, Z)
# Spatial locations varying between replicates
n = rand(50:100, m)
S = rand.(n, d)
Z = rand.(n)
g = spatialgraph(S, Z)
NeuralEstimators.stackarrays
— Functionstackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}
Stack a vector of arrays v
along the last dimension of each array, optionally merging the final dimension of the stacked array.
The arrays must be of the same size for the first N-1
dimensions. However, if merge = true
, the size of the final dimension can vary.
Examples
# Vector containing arrays of the same size:
Z = [rand(2, 3, m) for m ∈ (1, 1)];
stackarrays(Z)
stackarrays(Z, merge = false)
# Vector containing arrays with differing final dimension size:
Z = [rand(2, 3, m) for m ∈ (1, 2)];
stackarrays(Z)
NeuralEstimators.vectotril
— Functionvectotril(v; strict = false)
vectotriu(v; strict = false)
Converts a vector v
of length $d(d+1)÷2$ (a triangular number) into a $d × d$ lower or upper triangular matrix.
If strict = true
, the matrix will be strictly lower or upper triangular, that is, a $(d+1) × (d+1)$ triangular matrix with zero diagonal.
Note that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v
is a GPU array. Note also that the return type is not of type Triangular
matrix (i.e., the zeros are materialised) since Traingular
matrices are not always compatible with other GPU operations.
Examples
using NeuralEstimators
d = 4
n = d*(d+1)÷2
v = collect(range(1, n))
vectotril(v)
vectotriu(v)
vectotril(v; strict = true)
vectotriu(v; strict = true)