x.validation: Run a conStruct cross-validation analysis

Description

x.validation runs a conStruct cross-validation analysis that compares models with different numbers of layers, with and without a spatial component.

Usage

x.validation(
  train.prop = 0.9,
  n.reps,
  K,
  freqs = NULL,
  data.partitions = NULL,
  geoDist,
  coords,
  prefix,
  n.iter,
  make.figs = FALSE,
  save.files = FALSE,
  parallel = FALSE,
  n.nodes = NULL,
  ...
)
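
As a rough illustration of the signature above, a call might look like the sketch below. It assumes the conStruct package is installed and uses the example dataset conStruct.data that ships with it. The wrapper name run_xval_example, the prefix "example", and the values chosen for n.reps, K, and n.iter are illustrative assumptions, not recommendations.

```r
# Hypothetical wrapper (not part of conStruct) so that sourcing this file
# does not start a long-running MCMC analysis by accident.
run_xval_example <- function() {
  library(conStruct)

  # Load the example dataset bundled with the package into this function's
  # environment.
  data("conStruct.data", package = "conStruct", envir = environment())

  x.validation(
    train.prop = 0.9,                          # 90% training, 10% testing
    n.reps = 3,                                # illustrative: few replicates
    K = 1:3,                                   # numbers of layers to compare
    freqs = conStruct.data$allele.frequencies,
    data.partitions = NULL,                    # let the function partition
    geoDist = conStruct.data$geoDist,
    coords = conStruct.data$coords,
    prefix = "example",                        # illustrative output prefix
    n.iter = 1e3,
    make.figs = FALSE,
    save.files = FALSE,
    parallel = FALSE,
    n.nodes = NULL
  )
}
```

Calling run_xval_example() starts the analysis. Every replicate fits both models for each value of K, so even this small configuration can take a while.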

Value

This function returns (and also saves as a .Robj) a list containing the standardized results of the cross-validation analysis across replicates. For each replicate, the function returns a list with the following elements:

sp - the mean of the standardized log likelihoods of the "testing" data partition of that replicate for the spatial model for each value of K specified in K.
nsp - the mean of the standardized log likelihoods of the "testing" data partition of that replicate for the nonspatial model for each value of K specified in K.

In addition, this function saves two text files containing the standardized cross-validation results for the spatial and nonspatial models (prefix_sp_xval_results.txt and prefix_nsp_xval_results.txt, respectively). These values are written as matrices for user convenience; each column is a cross-validation replicate, and each row gives the result for a value of K.
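
The returned list can be collapsed into the same K-by-replicate layout described above. The sketch below uses mock numbers and hypothetical replicate names (rep_1, rep_2) in place of real output, and treats larger standardized log likelihoods as indicating better predictive fit.

```r
# Mock results mimicking the documented return structure: one list element
# per replicate, each holding `sp` and `nsp` vectors with one entry per K.
# The replicate names and all numbers are made up for illustration only.
K <- 1:3
xval <- list(
  rep_1 = list(sp = c(-10.2, -4.1, -3.9), nsp = c(-25.0, -12.3, -8.7)),
  rep_2 = list(sp = c(-9.8, -4.4, -4.0),  nsp = c(-24.1, -11.9, -9.0))
)

# Collapse to matrices: rows are values of K, columns are replicates.
sp_mat  <- sapply(xval, function(r) r$sp)
nsp_mat <- sapply(xval, function(r) r$nsp)
rownames(sp_mat) <- rownames(nsp_mat) <- paste0("K", K)

# Mean across replicates for each value of K and each model.
sp_mean  <- rowMeans(sp_mat)
nsp_mean <- rowMeans(nsp_mat)

# Value of K with the highest mean for the spatial model.
best_sp_K <- K[which.max(sp_mean)]
```

Comparing sp_mean and nsp_mean row by row shows, for each K, whether the spatial or the nonspatial model predicted the testing data better in this mock example.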

Arguments

train.prop: A numeric value between 0 and 1 that gives the proportion of the data to be used in the training partition of the analysis. Default is 0.9.
n.reps: An integer giving the number of cross-validation replicates to be run.
K: A numeric vector giving the numbers of layers to be tested in each cross-validation replicate. E.g., K=1:7.
freqs: A matrix of allele frequencies with one column per locus and one row per sample. Missing data should be indicated with NA.
data.partitions: A list with one element for each desired cross-validation replicate. This argument can be specified instead of the freqs argument if the user wants to provide their own data partitions for model training and testing. See the model comparison vignette for details on what this should look like.
geoDist: A matrix of pairwise geographic distances between samples. If NULL, the user can only run the nonspatial model.
coords: A matrix giving the longitude and latitude (or X and Y coordinates) of the samples.
prefix: A character vector giving the prefix to be attached to all output files.
n.iter: An integer giving the number of iterations for which each MCMC chain is run. Default is 1e3. If the number of iterations is greater than 500, the MCMC is thinned so that the number of retained iterations is 500 (before burn-in).
make.figs: A logical value indicating whether to automatically make figures during the course of the cross-validation analysis. Default is FALSE.
save.files: A logical value indicating whether to automatically save output and intermediate files once the analysis is complete. Default is FALSE.
parallel: A logical value indicating whether or not to run the different cross-validation replicates in parallel. Default is FALSE. For more details on how to set up runs in parallel, see the model comparison vignette.
n.nodes: Number of nodes to run parallel analyses on. Default is NULL. Ignored if parallel is FALSE. For more details on how to set up runs in parallel, see the model comparison vignette.
...: Further options to be passed to rstan::sampling (e.g., adapt_delta).
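
The thinning rule described for n.iter can be worked through as simple arithmetic. The helpers below are hypothetical and not part of conStruct. They assume the thinning interval is n.iter divided by 500, which is one way to satisfy the documented rule, and the example values are chosen to divide evenly.

```r
# Hypothetical helper: thinning interval implied by the documented rule
# (retain 500 draws whenever n.iter exceeds 500, otherwise keep every draw).
thin_interval <- function(n.iter, n.keep = 500) {
  if (n.iter > n.keep) n.iter / n.keep else 1
}

# Hypothetical helper: number of retained draws (before burn-in).
retained_draws <- function(n.iter, n.keep = 500) {
  n.iter / thin_interval(n.iter, n.keep)
}

thin_interval(1e3)    # 2: every second iteration is kept
retained_draws(5000)  # 500
retained_draws(300)   # 300: no thinning at or below 500 iterations
```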

Details

This function initiates a cross-validation analysis that uses Monte Carlo cross-validation to determine the statistical support for models with different numbers of layers or with and without a spatial component.
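
To make the idea of Monte Carlo cross-validation concrete, the sketch below draws repeated random training/testing splits in base R. It assumes the split is made over loci (columns of the allele frequency matrix). That choice, the mock data, and the element names training and testing are illustrative assumptions; this is not the package's internal code, nor the format expected by data.partitions.

```r
set.seed(1)

# Mock allele frequency matrix: 5 samples (rows) by 100 loci (columns).
freqs <- matrix(runif(5 * 100), nrow = 5, ncol = 100)

train.prop <- 0.9
n.reps <- 3

# Each replicate draws a fresh random subset of loci for training and
# holds out the remainder for testing.
partitions <- lapply(seq_len(n.reps), function(i) {
  n.loci <- ncol(freqs)
  train.idx <- sample(n.loci, size = round(train.prop * n.loci))
  list(
    training = freqs[, train.idx, drop = FALSE],
    testing  = freqs[, -train.idx, drop = FALSE]
  )
})

# Columns per partition in each replicate (90 training, 10 testing here).
sapply(partitions, function(p) c(ncol(p$training), ncol(p$testing)))
```

Because each replicate resamples independently, a given locus can land in the testing partition of several replicates or of none, which is what distinguishes this scheme from k-fold cross-validation.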