Miscellaneous
Core
These functions can appear during the core workflow, and may need to be overloaded in some applications.
NeuralEstimators.numberreplicates — Function
numberreplicates(Z)
Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.
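A minimal usage sketch (an illustration only; it assumes the default Array method counts replicates along the final dimension, consistent with the data conventions used throughout the package):
using NeuralEstimators
Z = rand(2, 3, 6)   # data set with replicates stored along the final dimension
numberreplicates(Z) # 6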
NeuralEstimators.subsetdata — Function
subsetdata(Z::V, i) where {V <: AbstractArray{A}} where {A <: Any}
subsetdata(Z::A, i) where {A <: AbstractArray{T, N}} where {T, N}
subsetdata(Z::G, i) where {G <: AbstractGraph}
Return replicate(s) i from each data set in Z.
If working with data that are not covered by the default methods, overload the function with the appropriate type for Z.
For graphical data, calls getgraph(), where the replicates are assumed to be stored as batched graphs. Since this can be slow, consider using a method of train() that does not require the data to be subsetted when working with graphical data (use numberreplicates() to check that the training and validation data sets are equally replicated, which avoids the need for subsetting).
Examples
using NeuralEstimators
using GraphNeuralNetworks
using Flux: batch
d = 1 # dimension of the response variable
n = 4 # number of observations in each realisation
m = 6 # number of replicates in each data set
K = 2 # number of data sets
# Array data
Z = [rand(n, d, m) for k ∈ 1:K]
subsetdata(Z, 2) # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
# Graphical data
e = 8 # number of edges
Z = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]
subsetdata(Z, 2) # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
NeuralEstimators.subsetparameters — Function
subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}
Subset parameters using a collection of indices.
Arrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.
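A brief illustration for the matrix method (a sketch only; parameter configurations are assumed to be stored as columns, per the description above):
using NeuralEstimators
θ = rand(3, 100)          # 3 parameters, K = 100 configurations stored as columns
subsetparameters(θ, 1:10) # 3×10 matrix containing the first 10 configurations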
Downstream-inference algorithms
NeuralEstimators.EM — Type
EM(simulateconditional::Function, MAP::Union{Function, NeuralEstimator}, θ₀ = nothing)
Implements a (Bayesian) Monte Carlo expectation-maximization (EM) algorithm for parameter estimation with missing data. The algorithm iteratively simulates missing data conditional on current parameter estimates, then updates parameters using a (neural) maximum a posteriori (MAP) estimator.
The $l$th iteration is given by:
\[\boldsymbol{\theta}^{(l)} = \underset{\boldsymbol{\theta}}{\mathrm{arg\,max}} \sum_{h = 1}^H \ell(\boldsymbol{\theta}; \boldsymbol{Z}_1, \boldsymbol{Z}_2^{(l, h)}) + H\log \pi(\boldsymbol{\theta})\]
where $\ell(\boldsymbol{\theta}; \boldsymbol{Z}_1, \boldsymbol{Z}_2)$ denotes the complete-data log-likelihood function, $\boldsymbol{Z} \equiv (\boldsymbol{Z}_1', \boldsymbol{Z}_2')'$ denotes the complete data with $\boldsymbol{Z}_1$ and $\boldsymbol{Z}_2$ the observed and missing components, respectively, $\boldsymbol{Z}_2^{(l, h)}$, $h = 1, \dots, H$, are simulated from the distribution of $\boldsymbol{Z}_2 \mid \boldsymbol{Z}_1, \boldsymbol{\theta}^{(l-1)}$, and $\pi(\boldsymbol{\theta})$ is the prior density (which can be viewed as a penalty function).
The algorithm monitors convergence by computing:
\[\max_i \left| \frac{\bar{\theta}_i^{(l+1)} - \bar{\theta}_i^{(l)}}{|\bar{\theta}_i^{(l)}| + \epsilon} \right|\]
where $\bar{\theta}^{(l)}$ is the average of the parameter estimates from iteration burnin+1 to iteration $l$, and $\epsilon$ is machine precision. Convergence is declared when this quantity remains below tol for nconsecutive consecutive iterations (see the keyword arguments below).
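As an illustration, this criterion could be computed as follows (a sketch only, not the package's internal implementation; the helper names are hypothetical, and the iterates are assumed to be stored as a matrix with one column per iteration, matching the iterates field described below):
using Statistics: mean
# Post-burnin running average of the iterates up to and including iteration j
postburnin_avg(iterates, j, burnin) = vec(mean(iterates[:, (burnin+1):j]; dims = 2))
# Maximum relative change between the averaged estimates at iterations l and l+1
function relative_change(iterates, l, burnin; ϵ = eps())
    θ̄_new = postburnin_avg(iterates, l + 1, burnin)
    θ̄_old = postburnin_avg(iterates, l, burnin)
    maximum(abs.((θ̄_new .- θ̄_old) ./ (abs.(θ̄_old) .+ ϵ)))
end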
Fields
- simulateconditional::Function: Function for simulating missing data conditional on the observed data and the current parameter estimates. Must have signature simulateconditional(Z::AbstractArray{Union{Missing, T}}, θ; nsims = 1, kwargs...) and return the completed data in the format expected by MAP (e.g., a 4D array for CNNs).
- MAP::Union{Function, NeuralEstimator}: MAP estimator applied to the completed data.
- θ₀: Optional initial parameter values (vector). Can also be provided when calling the EM object.
Methods
Once constructed, objects of type EM can be applied to data via the following methods (corresponding to single or multiple data sets, respectively):
(em::EM)(Z::A, θ₀::Union{Nothing, Vector} = nothing; ...) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
(em::EM)(Z::V, θ₀::Union{Nothing, Vector, Matrix} = nothing; ...) where {V <: AbstractVector{A}} where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
where Z is the complete data containing the observed data and Missing values.
For multiple datasets, θ₀ can be a vector (same initial values for all) or a matrix with one column per dataset.
Keyword Arguments
- niterations::Integer = 50: Maximum number of EM iterations.
- nsims::Union{Integer, Vector{<:Integer}} = 1: Number of conditional simulations per iteration. Can be fixed (scalar) or varying (vector of length niterations).
- burnin::Integer = 1: Number of initial iterations to discard before averaging parameter estimates for convergence assessment.
- nconsecutive::Integer = 3: Number of consecutive iterations meeting the convergence criterion required to halt.
- tol = 0.01: Convergence tolerance. The algorithm stops if the relative change in the post-burnin averaged parameters is less than tol for nconsecutive iterations.
- use_gpu::Bool = true: Whether to use a GPU (if available) for MAP estimation.
- verbose::Bool = false: Whether to print iteration details.
- kwargs...: Additional keyword arguments passed to simulateconditional.
Returns
For a single data set, returns a named tuple containing:
- estimate: Final parameter estimate (post-burnin average).
- iterates: Matrix of all parameter estimates across iterations (one column per iteration).
- burnin: The burnin value used.
For multiple data sets, returns a matrix with one column per data set.
Notes
- If Z contains no missing values, the MAP estimator is applied directly (after passing through simulateconditional to ensure the correct format).
- When using a GPU, data are moved to the GPU for MAP estimation and back to the CPU for conditional simulation.
Examples
# Below we give a pseudocode example; see the "Missing data" section in "Advanced usage" for a concrete example.
# Define conditional simulation function
function sim_conditional(Z, θ; nsims = 1)
# Your implementation here
# Returns completed data in format suitable for MAP estimator
end
# Define or load MAP estimator
MAP_estimator = ... # Neural MAP estimator
# Create EM object
em = EM(sim_conditional, MAP_estimator, θ₀ = [1.0, 2.0])
# Apply to data with missing values
Z = ... # Array with Missing entries
result = em(Z, niterations = 100, nsims = 5, tol = 0.001, verbose = true)
# Access results
θ_final = result.estimate
θ_sequence = result.iterates
# Multiple datasets
Z_list = [Z1, Z2, Z3]
estimates = em(Z_list, θ₀ = [1.0, 2.0]) # Matrix with 3 columns
Utility functions
NeuralEstimators.adjacencymatrix — Function
adjacencymatrix(S::Matrix, k::Integer; maxmin = false, combined = false)
adjacencymatrix(S::Matrix, r::AbstractFloat)
adjacencymatrix(S::Matrix, r::AbstractFloat, k::Integer; random = true)
adjacencymatrix(M::Matrix; k, r, kwargs...)
Computes a spatially weighted adjacency matrix from spatial locations S based on either the k-nearest neighbours of each location; all nodes within a disc of fixed radius r; or, if both r and k are provided, a subset of k neighbours within a disc of fixed radius r.
If S is a square matrix, it is treated as a distance matrix; otherwise, it should be an $n \times d$ matrix, where $n$ is the number of spatial locations and $d$ is the spatial dimension (typically $d = 2$). In the latter case, the distance metric is taken to be the Euclidean distance. Note that use of a maxmin ordering currently requires a matrix of spatial locations (not a distance matrix).
When using the k nearest neighbours, if maxmin=false (default) the neighbours are chosen based on all points in the graph. If maxmin=true, a so-called maxmin ordering is applied, whereby an initial point is selected, and each subsequent point is selected to maximise the minimum distance to those points that have already been selected. Then, the neighbours of each point are defined as the k-nearest neighbours amongst the points that have already appeared in the ordering. If combined=true, the neighbours are defined to be the union of the k-nearest neighbours and the k-nearest neighbours subject to a maxmin ordering.
Two subsampling strategies are implemented when choosing a subset of k neighbours within a disc of fixed radius r. If random=true (default), the neighbours are randomly selected from within the disc. If random=false, a deterministic algorithm is used that aims to preserve the distribution of distances within the neighbourhood set, by choosing those nodes with distances to the central node corresponding to the $\{0, \frac{1}{k}, \frac{2}{k}, \dots, \frac{k-1}{k}, 1\}$ quantiles of the empirical distribution function of distances within the disc (this in fact yields up to $k+1$ neighbours, since both the closest and furthest nodes are always included).
In keeping with the convention in GraphNeuralNetworks.jl, which is based on directed graphs, the neighbours of location i are stored in the column A[:, i], where A is the returned adjacency matrix. Therefore, the number of neighbours of each location is given by collect(mapslices(nnz, A; dims = 1)), and the number of times each node is a neighbour of another node is given by collect(mapslices(nnz, A; dims = 2)).
By convention, we do not consider a location to neighbour itself (i.e., the diagonal elements of the adjacency matrix are zero).
Examples
using NeuralEstimators, Distances, SparseArrays
n = 250
d = 2
S = rand(Float32, n, d)
k = 10
r = 0.10
# Memory efficient constructors
adjacencymatrix(S, k)
adjacencymatrix(S, k; maxmin = true)
adjacencymatrix(S, k; maxmin = true, combined = true)
adjacencymatrix(S, r)
adjacencymatrix(S, r, k)
adjacencymatrix(S, r, k; random = false)
# Construct from full distance matrix D
D = pairwise(Euclidean(), S, dims = 1)
adjacencymatrix(D, k)
adjacencymatrix(D, r)
adjacencymatrix(D, r, k)
adjacencymatrix(D, r, k; random = false)
NeuralEstimators.containertype — Function
containertype(A::Type)
containertype(::Type{A}) where A <: SubArray
containertype(a::A) where A
Returns the container type of its argument.
If given a SubArray, returns the container type of the parent array.
Examples
using NeuralEstimators
a = rand(3, 4)
containertype(a)
containertype(typeof(a))
[containertype(x) for x ∈ eachcol(a)]
NeuralEstimators.encodedata — Function
encodedata(Z::A; c::T = zero(T)) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
For data Z with missing entries, returns an encoded data set (U, W), where U is the original data Z with missing entries replaced by a fixed constant c, and W encodes the missingness pattern as an indicator array equal to one if the corresponding element of Z is observed and zero otherwise.
The behavior depends on the dimensionality of Z. If Z has 1 or 2 dimensions, the indicator array W is concatenated along the first dimension of Z. If Z has more than 2 dimensions, W is concatenated along the second-to-last dimension of Z.
Examples
using NeuralEstimators
Z = rand(16, 16, 1, 1)
Z = removedata(Z, 0.25) # remove 25% of the data at random
UW = encodedata(Z) # 16×16×2×1 array encoding the data and the missingness pattern
NeuralEstimators.expandgrid — Function
expandgrid(xs, ys)
Generates a grid of all possible combinations of the elements from two input vectors, xs and ys.
Same as expand.grid() in R, but currently caters for two dimensions only.
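A brief usage sketch (an illustration only; the output layout is assumed to mirror R's expand.grid(), with one combination per row):
using NeuralEstimators
expandgrid(1:3, 1:2) # assumed: a 6×2 matrix, one combination per row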
NeuralEstimators.IndicatorWeights — Type
IndicatorWeights(h_max, n_bins::Integer)
(w::IndicatorWeights)(h::Matrix)
For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as
\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\mathbb{I}(h \in B_k) : k = 1, \dots, K)',\]
where $\mathbb{I}(\cdot)$ denotes the indicator function, $h \equiv \|\boldsymbol{s} - \boldsymbol{u} \|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and $\{B_k : k = 1, \dots, K\}$ is a set of $K =$n_bins equally-sized distance bins covering the spatial distances between 0 and h_max.
Examples
using NeuralEstimators
h_max = 1
n_bins = 10
w = IndicatorWeights(h_max, n_bins)
h = rand(1, 30) # distances between 30 pairs of spatial locations
w(h)
NeuralEstimators.KernelWeights — Type
KernelWeights(h_max, n_bins::Integer)
(w::KernelWeights)(h::Matrix)
For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as
\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\exp(-(h - \mu_k)^2 / (2\sigma_k^2)) : k = 1, \dots, K)',\]
where $h \equiv \|\boldsymbol{s} - \boldsymbol{u}\|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and $\{\mu_k : k = 1, \dots, K\}$ and $\{\sigma_k : k = 1, \dots, K\}$ are the means and standard deviations of the Gaussian kernels for each bin, covering the spatial distances between 0 and h_max.
Examples
using NeuralEstimators
h_max = 1
n_bins = 10
w = KernelWeights(h_max, n_bins)
h = rand(1, 30) # distances between 30 pairs of spatial locations
w(h)
NeuralEstimators.maternchols — Function
maternchols(D, ρ, ν, σ² = 1; stack = true)
Given a matrix D of distances, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².
Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.
If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).
Examples
using NeuralEstimators
using LinearAlgebra: norm
n = 10
S = rand(n, 2)
D = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S), sⱼ ∈ eachrow(S)]
ρ = [0.6, 0.5]
ν = [0.7, 1.2]
σ² = [0.2, 0.4]
maternchols(D, ρ, ν)
maternchols([D], ρ, ν)
maternchols(D, ρ, ν, σ²; stack = false)
S̃ = rand(n, 2)
D̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²)
maternchols([D, D̃], ρ, ν, σ²; stack = false)
S̃ = rand(2n, 2)
D̃ = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²; stack = false)
NeuralEstimators.removedata — Function
removedata(Z::Array, Iᵤ::Vector{T}) where T <: Union{Integer, CartesianIndex}
removedata(Z::Array, p::Union{Float, Vector{Float}}; prevent_complete_missing = true)
removedata(Z::Array, n::Integer; fixed_pattern = false, contiguous_pattern = false, variable_proportion = false)
Replaces elements of Z with missing.
The simplest method accepts a vector Iᵤ that specifies the indices of the data to be removed.
Alternatively, there are two methods available to randomly generate missing data.
First, a vector p may be given that specifies the proportion of missingness for each element in the response vector. Hence, p should have length equal to the dimension of the response vector. If a single proportion is given, it will be replicated accordingly. If prevent_complete_missing = true, no replicates will contain 100% missingness (note that this can slightly alter the effective values of p).
Second, if an integer n is provided, all replicates will contain n observations after the data are removed. If fixed_pattern = true, the missingness pattern is fixed for all replicates. If contiguous_pattern = true, the data will be removed in a contiguous block based on a randomly selected starting index. If variable_proportion = true, the proportion of missingness will vary across replicates, with each replicate containing between 1 and n observations after data removal, sampled uniformly (note that variable_proportion overrides fixed_pattern).
The return type is Array{Union{T, Missing}}.
Examples
using NeuralEstimators
d = 5 # dimension of each replicate
m = 2000 # number of replicates
Z = rand(d, m) # simulated data
# Passing a desired proportion of missingness
p = rand(d)
removedata(Z, p)
# Passing a desired final sample size
n = 3 # number of observed elements of each replicate: must have n <= d
removedata(Z, n)
NeuralEstimators.rowwisenorm — Function
rowwisenorm(A)
Computes the row-wise norm of a matrix A.
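A brief usage sketch (an illustration only; the Euclidean norm is assumed to be applied to each row):
using NeuralEstimators
A = [3 4; 5 12]
rowwisenorm(A) # assumed: row-wise Euclidean norms 5 and 13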
NeuralEstimators.spatialgraph — Function
spatialgraph(S)
spatialgraph(S, Z)
spatialgraph(g::GNNGraph, Z)
Given spatial data Z measured at spatial locations S, constructs a GNNGraph ready for use in a graph neural network that employs SpatialGraphConv layers.
When $m$ independent replicates are collected over the same set of $n$ spatial locations,
\[\{\boldsymbol{s}_1, \dots, \boldsymbol{s}_n\} \subset \mathcal{D},\]
where $\mathcal{D} \subset \mathbb{R}^d$ denotes the spatial domain of interest, Z should be given as an $n \times m$ matrix and S should be given as an $n \times d$ matrix. Otherwise, when $m$ independent replicates are collected over differing sets of spatial locations,
\[\{\boldsymbol{s}_{i1}, \dots, \boldsymbol{s}_{in_i}\} \subset \mathcal{D}, \quad i = 1, \dots, m,\]
Z should be given as an $m$-vector of $n_i$-vectors, and S should be given as an $m$-vector of $n_i \times d$ matrices.
The spatial information between neighbours is stored as an edge feature, with the specific information controlled by the keyword arguments stationary and isotropic. Specifically, the edge feature between node $j$ and node $j'$ stores the spatial distance $\|\boldsymbol{s}_{j'} - \boldsymbol{s}_j\|$ (if isotropic), the spatial displacement $\boldsymbol{s}_{j'} - \boldsymbol{s}_j$ (if stationary), or the matrix of locations $(\boldsymbol{s}_{j'}, \boldsymbol{s}_j)$ (if !stationary).
Additional keyword arguments inherit from adjacencymatrix() to determine the neighbourhood of each node, with the default being a randomly selected set of k=30 neighbours within a disc of radius r=0.15 units.
Examples
using NeuralEstimators
# Number of replicates and spatial dimension
m = 5
d = 2
# Spatial locations fixed for all replicates
n = 100
S = rand(n, d)
Z = rand(n, m)
g = spatialgraph(S, Z)
# Spatial locations varying between replicates
n = rand(50:100, m)
S = rand.(n, d)
Z = rand.(n)
g = spatialgraph(S, Z)
NeuralEstimators.stackarrays — Function
stackarrays(v::Vector{<:AbstractArray}; merge::Bool = true)
Stack a vector of arrays v into a single higher-dimensional array.
If all arrays have the same size along the last dimension, stacks along a new final dimension. Then, if merge = true, merges the last two dimensions into one.
Alternatively, if sizes differ along the last dimension, concatenates along the last dimension.
Examples
using NeuralEstimators
# Vector containing arrays of the same size:
Z = [rand(2, 3, m) for m ∈ (1, 1)];
stackarrays(Z)
stackarrays(Z, merge = false)
# Vector containing arrays with differing final dimension size:
Z = [rand(2, 3, m) for m ∈ (1, 2)];
stackarrays(Z)
NeuralEstimators.vectotril — Function
vectotril(v; strict = false)
vectotriu(v; strict = false)
Converts a vector v of length $d(d+1)÷2$ (a triangular number) into a $d × d$ lower or upper triangular matrix.
If strict = true, the matrix will be strictly lower or upper triangular, that is, a $(d+1) × (d+1)$ triangular matrix with zero diagonal.
Note that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return is a dense matrix rather than a Triangular type (i.e., the zeros are materialised), since Triangular matrices are not always compatible with other GPU operations.
Examples
using NeuralEstimators
d = 4
n = d*(d+1)÷2
v = collect(range(1, n))
vectotril(v)
vectotriu(v)
vectotril(v; strict = true)
vectotriu(v; strict = true)