Technique 4: eFAST - Perform Analysis of Results

Description

This technique analyses simulation results generated through parameter sampling using the eFAST approach (extended Fourier Amplitude Sensitivity Test, Saltelli et al, reference below). eFAST perturbs the values of all parameters at the same time, with the aim of partitioning the variance in simulation output between the input parameters. Values for each parameter are chosen using Fourier frequency curves through the parameter's potential range of values: a set number of values is taken from points along each curve. Though all parameters are perturbed simultaneously, the method focuses on one parameter of interest in turn, by giving it a very different sampling frequency from that assigned to the other parameters. Thus, for each parameter of interest in turn, a sampling frequency is assigned to every parameter and values are chosen at points along the curves, so a set of simulation parameters exists for each parameter of interest.

This makes the method computationally expensive, especially if a large number of samples is taken on the parameter search curve, or if there are a large number of parameters. In addition, to ensure adequate sampling, each curve is also resampled with a small adjustment to the frequency, creating more parameter sets on which the simulation should be run. This attempts to limit correlations and the effect of repeated parameter value sets being chosen. For example, for a system where 8 parameters are being analysed and 3 resample curves are used, 24 parameter value sets will be produced (one per curve/parameter pairing), each containing the parameter values chosen from the frequency curves. The number of samples taken from each curve should be no lower than 65 (see the Marino paper for an explanation of how to select sample size).
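As a rough illustration of the sampling step described above, the sketch below draws values for one parameter along a search curve using the transformation given by Saltelli et al, x(s) = 1/2 + (1/pi) * arcsin(sin(omega * s + phase)). This is not spartan's own sampling code: the helper name efast_curve_sample, the parameter ranges, the frequencies and the phase argument are all assumptions made for the example.

```r
# Illustrative sketch only: sample one parameter along an eFAST search curve.
# efast_curve_sample is a hypothetical helper, not a spartan function.
efast_curve_sample <- function(n_samples, omega, p_min, p_max, phase = 0) {
  # Equally spaced points s in (-pi, pi)
  k <- seq_len(n_samples)
  s <- pi * (2 * k - n_samples - 1) / n_samples
  # Search curve: oscillates within [0, 1] at frequency omega
  unit <- 0.5 + asin(sin(omega * s + phase)) / pi
  # Rescale to the parameter's range of values
  p_min + unit * (p_max - p_min)
}

# The parameter of interest gets a high frequency, the others a low one
poi_values   <- efast_curve_sample(65, omega = 8, p_min = 0, p_max = 100)
other_values <- efast_curve_sample(65, omega = 1, p_min = 0.1, p_max = 0.9)

# Number of parameter value sets for 8 parameters and 3 resample curves
n_sets <- 8 * 3
```

Each of the n_sets sets would then hold 65 rows of parameter values, one row per point taken from the curves.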

Once the sampling has been performed, simulation runs should be performed for each sample, and repeated for stochastic simulations (using a number of repeats determined through analysis of aleatory uncertainty, for example with Technique 1 of the spartan package). The eFAST algorithm then examines the simulation results for each parameter value set and, taking into account the sampling frequency used to produce those parameter values, partitions the variance in output between the input parameters. The spartan package includes methods both to create parameter value samples using Fourier frequency sampling and to analyse the simulation results. The methods described here do the latter.
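The idea of partitioning variance by sampling frequency can be sketched on a toy model. The code below estimates a first-order index as the share of output variance found at the parameter of interest's frequency and its first few harmonics. It is a simplified illustration under stated assumptions, not the algorithm as implemented in spartan: the helper first_order_si, the toy model y = x1 + 0.1 * x2, the frequencies (8 and 1) and the use of 4 harmonics are all choices made for the example.

```r
# Illustrative sketch only: first-order index Si from the Fourier spectrum
# of the output. first_order_si is a hypothetical helper, not spartan code.
first_order_si <- function(y, omega, n_harmonics = 4) {
  n <- length(y)
  k <- seq_len(n)
  s <- pi * (2 * k - n - 1) / n          # same grid as used for sampling
  y <- y - mean(y)                       # centre the output
  # Spectrum value (A_j^2 + B_j^2) at integer frequency j
  spectrum_at <- function(j) {
    a <- mean(y * cos(j * s))
    b <- mean(y * sin(j * s))
    a^2 + b^2
  }
  harmonics <- omega * seq_len(n_harmonics)
  v_i <- 2 * sum(vapply(harmonics, spectrum_at, numeric(1)))
  v_total <- mean(y^2)                   # total variance of the output
  v_i / v_total
}

# Toy model: output dominated by the parameter sampled at frequency 8
n  <- 65
s  <- pi * (2 * seq_len(n) - n - 1) / n
x1 <- 0.5 + asin(sin(8 * s)) / pi        # parameter of interest
x2 <- 0.5 + asin(sin(1 * s)) / pi        # complementary parameter
y  <- x1 + 0.1 * x2
si_x1 <- first_order_si(y, omega = 8)
```

Because x1 drives almost all of the output here, si_x1 should come out close to 1. Note that 65 samples is the smallest count that resolves 4 harmonics of frequency 8 (2 * 4 * 8 + 1).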

Note 1: From Spartan 2.0, you can specify your simulation data in two ways:

A - Set folder structure (as in previous versions of Spartan): This is shown in figure eFAST_Folder_Struc.png within the extdata folder of this package, and described in detail in the tutorial. Using this structure, the parameter FILEPATH should point to a directory that contains a number of folders, one for each resample curve employed. Inside each of these folders will be a folder for each parameter, in turn holding a folder for each parameter sample set generated for that parameter. These in turn hold folders for each run of a simulation under those conditions.

B - CSV file input: From Spartan 2.0, you can specify all your results in CSV files. For eFAST, this is more complex than for the other techniques: the user needs to provide one CSV file per curve/parameter pairing. Thus if you had 3 resample curves and 7 parameters, you would need to provide 21 files. Each of these files contains the parameters under which the simulation was run, and the simulation measures generated by those parameters. Where a simulation produces multiples of each measure (for example, one value per cell), the value stored should be the median of those responses. A file may contain multiple simulation responses per parameter set, where the simulation has been run a number of times.

Note 2: From Spartan 2.0, analysis at multiple timepoints is performed using the same method calls below. There are no additional method calls for timepoint analysis.
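To make the CSV layout in option B more concrete, the sketch below assembles rows for one curve/parameter file from made-up data. It is only an illustration of the described content (parameter values followed by median responses); the values, the two parameter columns shown and the use of 50 cells per run are assumptions, and the exact file names spartan expects are not shown here.

```r
# Illustrative sketch only: build rows for one curve/parameter CSV file.
# Column names follow the example on this page; the data are made up.
set.seed(1)

# Hypothetical per-cell output from one simulation run
one_run <- data.frame(Velocity = runif(50, 1, 5),
                      Displacement = runif(50, 10, 60))

# Collapse the multiple responses in a run (e.g. many cells) to their median
run_medians <- vapply(one_run, median, numeric(1))

# One row of the CSV: parameter values followed by median responses
params  <- c(BindProbability = 0.2, ChemoThreshold = 0.5)
csv_row <- as.data.frame(as.list(c(params, run_medians)))

# Repeated runs under the same parameters simply add further rows
csv_rows <- rbind(csv_row, csv_row)

# Number of files needed: one per curve/parameter pairing
n_files <- 3 * 7
```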

There are three methods to this process:

efast_generate_medians_for_all_parameter_subsets: Only to be applied in cases where simulation responses are supplied in the folder structure (as in all previous versions of Spartan), which is useful where the simulation is agent-based. Iterates through the folder structure analysing the result of each replicate run under the same parameter conditions, creating a CSV file for each curve/parameter pair. This file holds the parameters of each run and the median of each simulation response for that run. As stated earlier, more than one run result can exist in this file. Where a simulation is being analysed at multiple timepoints, this method iterates through the results at all timepoints, creating curve/parameter pair CSV files for all specified timepoints.

efast_get_overall_medians: Produces a summary of the results for a particular resampling curve. This shows, for each parameter of interest, the median of each simulation output measure for each of the parameter value sets generated. As an example, take resampling curve 1 and parameter 1. For this parameter of interest, a number of different parameter value sets were generated from the frequency curves (say 65), so there are 65 different sets of simulation results. The previous method produced a summary showing the median of each output measure for each run. This method now calculates the median of these medians, for each output measure, and stores these in the summary. Thus, for each parameter of interest, the medians of each of the 65 sets of results are stored. The next parameter is then examined, until all have been analysed. This produces a snapshot showing the median simulation output for all parameter value sets generated for the first resample curve. These are stored with the file name Curve[Number]_Results_Summary in the directory specified in FILEPATH. Again, this can be repeated for a number of timepoints if required.

efast_run_Analysis: Produces a file summarising the analysis, partitioning the variance between parameters and providing relevant statistics. These include, for each parameter of interest, the first-order sensitivity index (Si), total-order sensitivity index (STi), complementary parameters sensitivity index (SCi), and relevant p-values and error bar data, calculated using a two-sample t-test and standard error respectively. For a more detailed examination of this analysis, see the Marino paper or Saltelli book references, or the tutorial on the package website. An example of the output file generated can be seen in the data folder of this package (eFAST_Analysis.csv). For ease of representation, the method also produces a graph showing this data for each simulation output measure. Two examples can be seen in the extdata folder of this package (eFAST_Displacement.pdf and eFAST_Velocity.pdf). Again, these graphs and summaries can be produced for multiple timepoints.

There is an additional method, ploteFASTSiFromTimepointFiles, that plots the Si measure for each parameter at different timepoints.
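The "median of medians" step performed for each parameter of interest can be sketched with base R as follows. This is a minimal illustration on made-up data, not the code used by efast_get_overall_medians: the column name ParameterSet, the random values and the small number of runs per set are assumptions for the example.

```r
# Illustrative sketch only: median of run medians for each parameter set.
set.seed(42)
n_sets <- 65   # parameter value sets for one parameter of interest
n_runs <- 5    # replicate runs per set (kept small for the example)

# Hypothetical per-run medians, as produced by the first method
run_medians <- data.frame(
  ParameterSet = rep(seq_len(n_sets), each = n_runs),
  Velocity     = rnorm(n_sets * n_runs, mean = 4),
  Displacement = rnorm(n_sets * n_runs, mean = 30)
)

# Median of the medians for every output measure, per parameter set
overall <- aggregate(cbind(Velocity, Displacement) ~ ParameterSet,
                     data = run_medians, FUN = median)
```

The resulting data frame has one row per parameter value set, which corresponds to the block of the curve summary belonging to a single parameter of interest.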

Usage

efast_generate_medians_for_all_parameter_subsets(FILEPATH,
	NUMCURVES,PARAMETERS,NUMSAMPLES,NUMRUNSPERSAMPLE,
	MEASURES,RESULTFILENAME,ALTERNATIVEFILENAME,
	OUTPUTCOLSTART,OUTPUTCOLEND,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
efast_get_overall_medians(FILEPATH,NUMCURVES,PARAMETERS,
	NUMSAMPLES,MEASURES,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
efast_run_Analysis(FILEPATH,MEASURES,PARAMETERS,NUMCURVES,
	NUMSAMPLES,OUTPUTMEASURES_TO_TTEST,TTEST_CONF_INT,
	GRAPH_FLAG,EFASTRESULTFILENAME,
	TIMEPOINTS=NULL,TIMEPOINTSCALE=NULL)
	
ploteFASTSiFromTimepointFiles(FILEPATH,PARAMETERS,MEASURES,
	EFASTRESULTFILENAME,TIMEPOINTS,TIMEPOINTSCALE)

Arguments

FILEPATH

Directory where the simulation runs can be found, in folders or in CSV file format

NUMCURVES

The number of resample curves employed when the parameter space was sampled (see eFAST documentation). Using at least 3 is recommended

PARAMETERS

Array containing the names of the parameters for which parameter samples were generated

NUMSAMPLES

The number of parameter subsets that were generated in the eFAST design

NUMRUNSPERSAMPLE

The number of runs performed for each parameter subset. This figure can be generated through Aleatory Analysis

MEASURES

Array containing the names of the output measures which are used to analyse the simulation

OUTPUTCOLSTART

Column number in the simulation results file where output begins - saves (a) reading in unnecessary data, and (b) errors where the first column is a label, and therefore could contain duplicates. Only required if running the first method (to process results directly)

OUTPUTCOLEND

Column number in the simulation results file where the last output measure is. Only required if running the first method.

RESULTFILENAME

Name of the simulation results file (e.g. "trackedCells_Close.csv"). In the current version, XML and CSV files can be processed. Only required if running the first method (to process results directly). If performing this analysis over multiple timepoints, it is assumed that the timepoint follows the file name, e.g. trackedCells_Close_12.csv.

ALTERNATIVEFILENAME

In some cases, it may be relevant to read from a further results file if the initial file contains no results. This filename is set here. In the current version, XML and CSV files can be processed. Only required if running the first method (to process results directly)

OUTPUTMEASURES_TO_TTEST

Which of the measures should be tested to see if the result is statistically significant. To test all measures where there are 3, set this to 1:3

EFASTRESULTFILENAME

File name under which the full eFAST analysis should be stored. This will contain the partitioning of variance for each parameter. Example: eFAST_Analysis

TTEST_CONF_INT

The confidence level to use for the t-test, e.g. 0.95

GRAPH_FLAG

Whether graphs should be produced summarising the output - should be TRUE or FALSE

TIMEPOINTS

Implemented so this method can be used when analysing multiple simulation timepoints. If only analysing one timepoint, this should be set to NULL. If not, this should be an array of timepoints, e.g. c(12,36,48,60)

TIMEPOINTSCALE

Implemented so this method can be used when analysing multiple simulation timepoints. Sets the scale of the timepoints being analysed, e.g. "Hours"
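The timepoint naming convention described under RESULTFILENAME can be illustrated as below. This sketch only shows how names of that form relate to RESULTFILENAME and TIMEPOINTS; it is an assumption about the pattern based on the example given (trackedCells_Close_12.csv), not a view of how spartan builds the names internally.

```r
# Illustrative sketch only: derive per-timepoint result file names.
# This mirrors the naming convention described above, not spartan internals.
RESULTFILENAME <- "trackedCells_Close.csv"
TIMEPOINTS <- c(12, 36, 48, 60)

# Split the name into base and extension, then append each timepoint
base <- tools::file_path_sans_ext(RESULTFILENAME)
ext  <- tools::file_ext(RESULTFILENAME)
timepoint_files <- paste0(base, "_", TIMEPOINTS, ".", ext)
```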

References

For detailed information on how eFAST works, see either of the following: (a) Marino et al (2008): "A methodology for performing global uncertainty and sensitivity analysis in systems biology"; (b) Saltelli et al (2000): "Sensitivity Analysis". MATLAB code is also available via the associated site given in (a).

Examples


# NOT RUN {
# THE CODE IN THIS EXAMPLE IS THE SAME AS THAT USED IN THE TUTORIAL, AND
# THUS YOU NEED TO DOWNLOAD THE TUTORIAL DATA SET AND SET FILEPATH
# CORRECTLY TO RUN THIS

##---Firstly,declare the parameters required for the functions--
# Folder containing the simulation results, or CSV files
FILEPATH<-"/home/user/Downloads/eFAST/"
# Number of resample curves employed when the parameter space was
# sampled
NUMCURVES<-3
# Array of the parameters to be analysed
PARAMETERS <- c("BindProbability","ChemoThreshold",
"ChemoUpperLinearAdjust","ChemoLowerLinearAdjust",
"VCAMProbabilityThreshold","VCAMSlope","Dummy")
# The number of parameter value sets created for each curve in eFAST
# sampling
NUMSAMPLES<-65
# Number of runs performed for each parameter value set
NUMRUNSPERSAMPLE<-300
# The simulation output measures being examined
MEASURES<-c("Velocity","Displacement")
# The output file containing the simulation results from that 
# simulation run
RESULTFILENAME<-"trackedCells_Close.csv"
# Not used in this case, but this is useful in cases where two
# result files may exist (for example, if tracking cells close
# to an area and those further away, two output files could be used).
# Here, results in a second file are processed if the first is blank
# or does not exist.
ALTERNATIVEFILENAME<-NULL
# Used with CSV result file formats
# The column within the csv results file where the results start. 
# This is useful as it restricts what is read in to R, getting round 
# potential errors where the first column contains an agent label 
# (as R does not read in CSV files where the first column contains 
# duplicates)
OUTPUTCOLSTART<-10
# Used with CSV result file formats
# Last column of the output measure results
OUTPUTCOLEND<-11
# Name of the final result file for this analysis, showing the 
# partitioning of the variance between input parameters
EFASTRESULTFILENAME<-"eFAST_Analysis.csv"
# Which of the output measures to T-Test for significance (if not all)
OUTPUTMEASURES_TO_TTEST<-1:2
# T-Test confidence level
TTEST_CONF_INT<-0.95
# Boolean to note whether summary graphs should be produced
GRAPH_FLAG<-TRUE
# Timepoints being analysed. Must be NULL if no timepoints being 
# analysed, or else be an array of timepoints. Scale sets the 
# measure of these timepoints
#TIMEPOINTS<-NULL; TIMEPOINTSCALE<-NULL
# Example Timepoints:
TIMEPOINTS<-c(12,36,48,60); TIMEPOINTSCALE<-"Hours"

# }
# NOT RUN {
# DONTRUN IS SET SO THIS IS NOT EXECUTED WHEN PACKAGE IS COMPILED - BUT THIS
# HAS BEEN TESTED WITH THE TUTORIAL DATA

library(spartan)
# Import the graphing package
library(gplots)

##--- NOW RUN THE FOUR METHODS IN THIS ORDER ----
# FIRSTLY, WHERE MULTIPLE RUNS ARE PERFORMED,
# MEDIAN DISTRIBUTIONS NEED TO BE GAINED FOR EVERY RUN
efast_generate_medians_for_all_parameter_subsets(FILEPATH,
	NUMCURVES,PARAMETERS,NUMSAMPLES,NUMRUNSPERSAMPLE,MEASURES,
	RESULTFILENAME,ALTERNATIVEFILENAME,OUTPUTCOLSTART,
	OUTPUTCOLEND,TIMEPOINTS,TIMEPOINTSCALE)


# NOW NEED TO CREATE THE OUTPUT FILE THAT THE EFAST ANALYSIS SCRIPTS
# USE - A FILE SHOWING THE OVERALL MEDIAN RESULTS FOR EACH OF THE RUNS
# PERFORMED FOR EVERY PARAMETER OF INTEREST, FOR THAT CURVE.
# ONE FILE IS CREATED PER CURVE
efast_get_overall_medians(FILEPATH,NUMCURVES,PARAMETERS,NUMSAMPLES,
	MEASURES,TIMEPOINTS,TIMEPOINTSCALE)

# NOW THAT THE CURVE SUMMARY FILES (Curve[Number]_Results_Summary)
# HAVE BEEN GENERATED, FULL ANALYSIS CAN BEGIN
efast_run_Analysis(FILEPATH,MEASURES,PARAMETERS,NUMCURVES,
	NUMSAMPLES,OUTPUTMEASURES_TO_TTEST,TTEST_CONF_INT,
	GRAPH_FLAG,EFASTRESULTFILENAME,
	TIMEPOINTS,TIMEPOINTSCALE)

# IF ANALYSING A SIMULATION AT SET TIMEPOINTS, YOU CAN PLOT THE Si
# MEASURE OVER TIME
ploteFASTSiFromTimepointFiles(FILEPATH,PARAMETERS,MEASURES,
	EFASTRESULTFILENAME,TIMEPOINTS,TIMEPOINTSCALE)

# }
