spartan (version 3.0.2)

generate_requested_emulations: Generate emulators for specified machine learning techniques with provided data

Description

This method generates an emulator model from a training set for each specified technique, and generates performance statistics from the test set. The currently implemented techniques are a neural network (using the neuralnet package), a random forest (from the randomForest package), a support vector machine (from package e1071), a Gaussian process model (from package mlegp), and a general linear model. Where a neural network is requested, the hyper-parameters are determined using k-fold cross validation over a set of specified network structures. Where a simulation has multiple outputs, an emulator model is created for each output response.

The method can save the generated emulator models to file, in Rda format, and plot the predicted responses against the observed responses of the training and test sets, giving the coefficient of determination (R-squared) and mean squared error values. If plots are requested (by setting a flag in emulation_algorithm_settings), they are stored as PDFs in the working directory. Saving the generated emulator is controlled in the same way, by the saveEmulation flag in emulation_algorithm_settings.

The method returns a list of emulators of a specified technique, one for each simulation output, and the performance statistics for each measure, including the time taken to generate these emulators. If the training data has been normalised, the minimum and maximum sampling values for each parameter are also returned, so that any predictions generated using this emulation can be rescaled correctly.

Note that you must specify whether the data provided in partitioned_data has been normalised: this affects the plots, as normalised data is rescaled back to its original scale before plotting. As with the rest of spartan, this method can create emulations for multiple timepoints.
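As an illustration of the two performance statistics named above, the following minimal base R sketch computes R-squared and mean squared error for a set of predictions. The function name emulator_metrics and the data are hypothetical, and R-squared is computed here as 1 - SS_res / SS_tot, which is an assumption; spartan's own calculation may differ.

```r
# Hypothetical helper, not part of spartan: compares observed simulation
# responses with the predictions made by an emulator
emulator_metrics <- function(observed, predicted) {
  # Mean squared error: average squared difference between the two vectors
  mse <- mean((observed - predicted)^2)
  # R-squared as 1 - residual sum of squares / total sum of squares
  ss_res <- sum((observed - predicted)^2)
  ss_tot <- sum((observed - mean(observed))^2)
  list(mse = mse, r_squared = 1 - ss_res / ss_tot)
}

# Made-up values for illustration only
observed  <- c(2.0, 4.1, 6.2, 7.9, 10.1)
predicted <- c(2.2, 3.9, 6.0, 8.1, 9.8)
metrics <- emulator_metrics(observed, predicted)
print(metrics)
```

An R-squared close to 1 together with a small mean squared error suggests the emulator reproduces the simulation responses well on that dataset.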

Usage

generate_requested_emulations(model_list, partitioned_data, parameters,
  measures, algorithm_settings = NULL, timepoint = NULL,
  normalised = FALSE, output_formats = c("pdf"))

Arguments

model_list

Vector of the types of emulation model to create. Accepted abbreviations are: SVM (Support Vector Machine), GP (Gaussian Process Model), NNET (Neural Network), RF (Random Forest), and GLM (General Linear Model).

partitioned_data

Object returned by the function partition_dataset, containing the training, testing, and validation data

parameters

Vector containing the names of the simulation parameters in the dataset on which the emulator is being trained

measures

Vector containing the simulation outputs that the emulators should be able to predict

algorithm_settings

Object returned by the function emulation_algorithm_settings, containing the settings of the machine learning algorithms to use when creating the emulators. If no settings need changing and no neural network is being generated, this argument can be omitted (it defaults to NULL) and generate_requested_emulations will create the object itself. If you change any settings or generate a neural network, you must create this object before calling generate_requested_emulations.

timepoint

If using multiple timepoints, the timepoint for which emulators are being created

normalised

Whether the data in partitioned_data has been normalised. Affects how training and test set predictions are plotted, as normalised data is rescaled back to its original scale

output_formats

File formats in which result graphs should be produced

Value

A list of emulation objects, together with the sampling information needed to rescale the data these emulations produce, if required
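To show why the sampling minimum and maximum values are needed for rescaling, here is a minimal base R sketch. It assumes min-max scaling to the range [0, 1], which is an assumption rather than something this page states about spartan. The function names and values are hypothetical.

```r
# Hypothetical helpers, not part of spartan: min-max scaling and its inverse
normalise_minmax <- function(x, min_val, max_val) {
  (x - min_val) / (max_val - min_val)
}

denormalise_minmax <- function(x_norm, min_val, max_val) {
  x_norm * (max_val - min_val) + min_val
}

# Made-up values sampled between a minimum of 0.25 and a maximum of 5
original <- c(0.25, 1.5, 5)
scaled   <- normalise_minmax(original, min_val = 0.25, max_val = 5)

# Without the stored minimum and maximum, the original scale cannot be recovered
restored <- denormalise_minmax(scaled, min_val = 0.25, max_val = 5)
print(scaled)
print(restored)
```

The same minimum and maximum used when normalising must be used when converting back, which is why they are returned alongside the emulators.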