Technique 3: Latin-Hypercube (LHC): Perform Analysis of Results

Description

Though Technique 2 elucidates the effects of perturbing a single parameter, it cannot reveal non-linear effects that occur when two or more parameters are adjusted simultaneously. We have therefore included this method, described by Read et al, Saltelli et al, and others (see References below). The technique above provides the means to sample the parameter space using a latin-hypercube approach, and this technique allows for the analysis of simulation results generated using those parameter sets.

Where a simulation is stochastic (such as an agent-based simulation), it should be run n times for each parameter set (n can be established using Technique 1 - Aleatory Analysis), after which the first method of this technique generates median values for each output measure over the n runs. A summary file is then created, containing every parameter value set created in sampling alongside these calculated median results. An example of such a file can be seen in the data directory of this package (LHCSummary.csv).

With this summary complete, each parameter being analysed is processed in turn, to determine whether there is any correlation between the value of that parameter and the simulation output, even though all other parameter values are being perturbed at the same time. Partial Rank Correlation Coefficients are generated for each output measure, for each parameter. These give a statistical indication of any correlations that are present. An example can again be seen in the data directory (corCoeffs.csv). To ease identification of such effects, graphs are produced for each parameter, showing the parameter value against the simulation result (output measure).
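The idea behind a Partial Rank Correlation Coefficient can be illustrated with a minimal base-R sketch. This is not the code spartan uses (the package relies on the pcor.R code cited in the References); the helper name prcc_sketch, the toy parameters paramA, paramB and paramC, and the toy response are all assumptions made for illustration. The sketch rank-transforms every column, removes the linear influence of the other parameters from both the parameter of interest and the output measure, and correlates the residuals. It assumes at least two parameters with syntactically valid column names.

```r
# Hypothetical helper, not part of spartan: PRCC of each parameter with one
# output measure, computed from a data frame of summary results.
prcc_sketch <- function(data, parameters, measure) {
  # Work on ranks rather than raw values
  ranked <- as.data.frame(lapply(data[, c(parameters, measure)], rank))
  sapply(parameters, function(p) {
    others <- setdiff(parameters, p)
    # Residuals after regressing on the ranks of all other parameters
    res_param <- resid(lm(reformulate(others, response = p), data = ranked))
    res_measure <- resid(lm(reformulate(others, response = measure),
                            data = ranked))
    cor(res_param, res_measure)
  })
}

# Toy data (made up): the response rises with paramA, falls with paramB,
# and ignores paramC
set.seed(1)
n <- 200
toy <- data.frame(paramA = runif(n), paramB = runif(n), paramC = runif(n))
toy$Velocity <- 3 * toy$paramA - 2 * toy$paramB^2 + rnorm(n, sd = 0.1)

prccs <- prcc_sketch(toy, c("paramA", "paramB", "paramC"), "Velocity")
print(round(prccs, 2))
```

A coefficient near +1 or -1 suggests a strong monotonic influence of that parameter once the others are accounted for, while a value near 0 suggests little influence.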

Note 1: From Spartan 2.0, you can specify your simulation data in two ways:

A - Set folder structure (as in previous versions of Spartan). This is shown in the figure LHC_Folder_Struc.png within the extdata folder of this package, and described in detail in the tutorial. Using this structure, the parameter FILEPATH should point to a directory that contains a number of folders, one for each of the parameter samples generated by the hypercube. Each of these in turn contains folders for all simulations run under those parameter conditions.

B - Single CSV file input. You can specify all your results in a single CSV file. An example of this file can be found in the extdata folder of the package, named LHC_AllResults.csv. Each row of this CSV file should consist of the parameters that were generated by the hypercube and the simulation responses these generate. There may be more than one row of responses per parameter set, if performing replicate runs.

Note 2: From Spartan 2.0, analysis at multiple timepoints is performed using the same method calls below. There are no additional method calls for timepoint analysis.
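As a rough illustration of the single CSV layout in option B, and of how replicate rows collapse to one median row per parameter set, the following base-R sketch may help. The data frame stands in for a results file already read into R; its values are invented, only two of the tutorial's parameters are shown, and the name summary_sketch is hypothetical. This is not how spartan implements the step, only the same idea expressed with aggregate.

```r
# Toy stand-in for a single results CSV (values are made up): parameter
# columns first, then simulation responses. Replicate runs share the same
# parameter values.
all_results <- data.frame(
  chemoThreshold = c(0.2, 0.2, 0.2, 0.7, 0.7, 0.7),
  vcamSlope      = c(1.0, 1.0, 1.0, 0.5, 0.5, 0.5),
  Velocity       = c(4.1, 3.9, 4.6, 2.2, 2.0, 2.7),
  Displacement   = c(30, 28, 35, 12, 15, 11)
)

# One row per parameter set, holding the median of each response
summary_sketch <- aggregate(
  cbind(Velocity, Displacement) ~ chemoThreshold + vcamSlope,
  data = all_results, FUN = median
)
print(summary_sketch)
```

The six replicate rows reduce to two rows, one for each distinct parameter set.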

This technique consists of five methods:

lhc_process_sample_run_subsets: Only to be applied in cases where simulation responses are supplied in the folder structure (as in all previous versions of Spartan), useful for cases where the simulation is agent-based. Takes each parameter value set generated by the hypercube in turn, and analyses the replicate simulation results for that set. Produces a CSV file containing the parameters of the run and the median of each simulation response for each run. In cases where, for example, 300 runs have been performed for a parameter set, this file will contain 300 rows for that set, each accompanied by the median of each simulation response for that run. This file will be named as specified by parameter LHC_ALL_SIM_RESULTS_FILE. This method can be performed for a number of simulation timepoints, producing a CSV file for each timepoint.

lhc_generateLHCSummary: Processes either the CSV file generated by the method above or one that has been supplied, going through each parameter set and generating a file that summarises simulation responses under that set. This CSV file, named as specified by parameter LHCSUMMARYFILENAME, will contain one row for each parameter set, accompanied by the median of all the responses contained in the LHC_ALL_SIM_RESULTS_FILE. An example of this file can be found in the extdata folder of this package, named LHC_Summary.csv. This method can also be performed for a number of simulation timepoints.

lhc_generatePRCoEffs: For each parameter, and each simulation output measure, calculates the Partial Rank Correlation Coefficient between the parameter value and the simulation results, giving a statistical measurement of any effect that is present. This is output to a CSV file, an example of which can be seen in the data folder of this package (EgSet_LHC_corCoeffs.csv). Can again be performed for a number of timepoints if required.

lhc_plotCoEfficients: Plots the Partial Rank Correlation Coefficients for either all measures or for one individual measure, for all simulation parameters.

lhc_graphMeasuresForParameterChange: Produces a graph for each parameter, and each output measure, showing the simulation output achieved when that parameter was assigned that value. Eases identification of any non-linear effects. Two examples can be seen in the extdata folder (LHC_maxVCAMeffectProbabilityCutoff_Velocity.pdf and LHC_chemoThreshold_Velocity.pdf). Can again be performed for a number of timepoints if required.

Since Spartan 2.3, there are additional methods that can process latin-hypercube results for a number of timepoints:

lhc_generateTimepointFiles: Used when the results for every timepoint are held in the same file. This method splits them into one file per timepoint to make the results compatible with spartan.

lhc_calculatePRCCForMultipleTimepoints: Calculates the PRCC for each parameter at each timepoint in the TIMEPOINTS vector. Unlike the other methods in Spartan, this stores the PRCC and the P-Value in two different files to make plotting easier.

plotPRCCSFromTimepointFiles: Plots graphs of Partial Rank Correlation Coefficients over time, to show how the impact of a parameter changes over time.

lhc_graphPRCCForMultipleTimepoints: Also plots graphs of PRCC over time, but for one parameter at a time, contrasting the PRCCs for that parameter with those of a dummy parameter.

lhc_countSignificantParametersOverTime: Counts the number of parameters across a number of timepoints where the p-value is significant (p<0.01).
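The counting step described for lhc_countSignificantParametersOverTime can be pictured with a short base-R sketch. The p-values below are invented and the matrix layout (parameters in rows, timepoints in columns) is an assumption made for illustration; spartan reads its own p-value files, whose format is not reproduced here.

```r
# Hypothetical p-values: rows are parameters, columns are timepoints
p_values <- matrix(
  c(0.0010, 0.200, 0.004,
    0.0300, 0.008, 0.500,
    0.0005, 0.002, 0.090),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("chemoThreshold", "vcamSlope", "thresholdBindProbability"),
    c("12", "36", "48")
  )
)

# Number of parameters with p < 0.01 at each timepoint
significant_counts <- colSums(p_values < 0.01)
print(significant_counts)
```

Plotting these counts against the timepoints gives a quick view of when the simulation is sensitive to the most parameters.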

Usage

lhc_process_sample_run_subsets(FILEPATH,
	SPARTAN_PARAMETER_FILE,PARAMETERS,NUMSAMPLES,
	NUMRUNSPERSAMPLE,MEASURES,RESULTFILENAME,
	ALTERNATIVEFILENAME,OUTPUTCOLSTART,OUTPUTCOLEND,
	LHC_ALL_SIM_RESULTS_FILE,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
lhc_generateLHCSummary(FILEPATH,PARAMETERS,MEASURES,
	LHC_ALL_SIM_RESULTS_FILE,LHCSUMMARYFILENAME,
	SPARTAN_PARAMETER_FILE,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
lhc_generatePRCoEffs(FILEPATH,PARAMETERS,MEASURES,
	LHCSUMMARYFILENAME,CORCOEFFSOUTPUTFILE,
	TIMEPOINTS=NULL,TIMEPOINTSCALE=NULL)
	
lhc_plotCoEfficients(FILEPATH, CORCOEFFSOUTPUTFILE, 
	MEASURES,PRINTOPT,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
lhc_graphMeasuresForParameterChange(FILEPATH,PARAMETERS,
	MEASURES,MEASURE_SCALE,CORCOEFFSOUTPUTFILE,
	LHCSUMMARYFILENAME,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
	
lhc_generateTimepointFiles(FILEPATH,SPARTAN_PARAMETER_FILE,
	RUN_SUMMARY_FILE_NAME,NUMSAMPLES,NUMRUNSPERSAMPLE,
	TIMEPOINTS)
	
lhc_calculatePRCCForMultipleTimepoints(FILEPATH, 
	CORCOEFFSOUTPUTFILE,TIMEPOINTS,MEASURES)
	
plotPRCCSFromTimepointFiles(FILEPATH,PARAMETERS,
	MEASURES,CORCOEFFSFILENAME,TIMEPOINTS,TIMEPOINTSCALE,
	DISPLAYPVALS=FALSE)
	
lhc_graphPRCCForMultipleTimepoints(FILEPATH,MEASURES,
	TIMEPOINTS)
	
lhc_countSignificantParametersOverTime(FILEPATH,
	MEASURES,TIMEPOINTS)

Arguments

FILEPATH

Directory where the simulation runs or single CSV file can be found

NUMSAMPLES

The number of parameter subsets that were generated in the LHC design. Only required if analysing results provided within Folder structure setup.

NUMRUNSPERSAMPLE

The number of runs performed for each parameter subset. This figure is generated through Aleatory Analysis. Only required if analysing results provided within Folder structure setup.

MEASURES

Array containing the names of the output measures which are used to analyse the simulation

RESULTFILENAME

Name of the simulation results file (e.g. "trackedCells_Close.csv"). In the current version, XML and CSV files can be processed. Only required if running the first method (to process results directly). If performing this analysis over multiple timepoints, it is assumed that the timepoint follows the file name, e.g. trackedCells_Close_12.csv.

ALTERNATIVEFILENAME

In some cases, it may be relevant to read from a further results file if the initial file contains no results. This filename is set here. In the current version, XML and CSV files can be processed. Only required if running the first method (to process results directly)

OUTPUTCOLSTART

Column number in the simulation results file where output begins - saves (a) reading in unnecessary data, and (b) errors where the first column is a label, and therefore could contain duplicates. Only required if running the first method (to process results directly)

OUTPUTCOLEND

Column number in the simulation results file where the last output measure is. Only required if running the first method.

SPARTAN_PARAMETER_FILE

Location of the file output by the latin-hypercube sampling method. Note if providing a single CSV file with parameter/response pairings, you do not need to provide this file, and can thus enter this parameter as NULL.

LHC_ALL_SIM_RESULTS_FILE

If lhc_process_sample_run_subsets is used (i.e. results processed by folder structure), this will contain the output of that method. If specifying responses using a single CSV file, this will contain the name of that file (which should be in the FILEPATH folder).

PARAMETERS

Array containing the names of the parameters for which parameter samples were generated

LHCSUMMARYFILENAME

Name of the LHC Summary file to be generated by lhc_generateLHCSummary. Contains each parameter set alongside the result gained when the simulation was run under those parameter values. Example - LHC_Summary

CORCOEFFSOUTPUTFILE

Name of the file to be generated by lhc_generatePRCoEffs. Contains the Partial Rank Correlation Coefficients for each parameter. Example - CorCoEffs

CORCOEFFSFILENAME

Name of the file generated by lhc_generatePRCoEffs. Contains the Partial Rank Correlation Coefficients for each parameter. Example - CorCoEffs

PRINTOPT

Used in plotting Partial Rank Correlation Coefficients, should be either "ALL" or "INDIVIDUAL"

MEASURE_SCALE

An array containing the unit of measurement for each of the output measures (e.g. microns, microns/min). Used to label graphs

RUN_SUMMARY_FILE_NAME

Used in processing timepoints. Should be the name of the file summarising the results at a particular timepoint

TIMEPOINTS

Implemented so this method can be used when analysing multiple simulation timepoints. If only analysing one timepoint, this should be set to NULL. If not, this should be an array of timepoints, e.g. c(12,36,48,60)

TIMEPOINTSCALE

Implemented so this method can be used when analysing multiple simulation timepoints. Sets the scale of the timepoints being analysed, e.g. "Hours"

DISPLAYPVALS

For the graph of P-Values over time, whether the p-values should be displayed on the graph as a table
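The RESULTFILENAME entry above notes that, for multiple timepoints, the timepoint is assumed to follow the file name (e.g. trackedCells_Close_12.csv). A minimal sketch of that naming convention, using only base R and the tools package shipped with R, is shown here. The variable names base_name, extension and timepoint_files are illustrative, and this is not necessarily how spartan builds the names internally.

```r
RESULTFILENAME <- "trackedCells_Close.csv"
TIMEPOINTS <- c(12, 36, 48, 60)

# Split off the extension, then append "_<timepoint>" to the base name
base_name <- tools::file_path_sans_ext(RESULTFILENAME)
extension <- tools::file_ext(RESULTFILENAME)
timepoint_files <- paste0(base_name, "_", TIMEPOINTS, ".", extension)
print(timepoint_files)
```

Checking that files with these names exist in each run folder before starting the analysis can help catch a mismatch between TIMEPOINTS and the simulation output.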

References

This technique is described by Read et al (2011) in their paper "Techniques for Grounding Agent-Based Simulations in the Real Domain: a case study in Experimental Autoimmune Encephalomyelitis", and also in the Saltelli et al book "Sensitivity Analysis". Code to calculate the Partial Rank Correlation Coefficient has been downloaded from http://www.yilab.gatech.edu/pcor.R

Examples

# NOT RUN {
# THE CODE IN THIS EXAMPLE IS THE SAME AS THAT USED IN THE TUTORIAL, AND
# THUS YOU NEED TO DOWNLOAD THE TUTORIAL DATA SET AND SET FILEPATH
# CORRECTLY TO RUN THIS

##--Firstly, declare the parameters required for the functions below--
# Folder containing the simulation results or single CSV result file
FILEPATH<-"/home/kieran/Downloads/LHC_Spartan2/"
# Array of the parameters to be analysed
PARAMETERS<-c("thresholdBindProbability","chemoThreshold",
"chemoUpperLinearAdjust","chemoLowerLinearAdjust",
"maxVCAMeffectProbabilityCutoff","vcamSlope")
# The simulation output measures being examined
MEASURES<-c("Velocity","Displacement")
# What each measure represents. Used in graphing results
MEASURE_SCALE<-c("microns/min","microns")
# The number of parameter value sets created in latin-hypercube
# sampling
NUMSAMPLES <- 500
# Number of runs performed for each parameter value set
NUMRUNSPERSAMPLE<-500
# The output file containing the simulation results from that 
# simulation run
RESULTFILENAME<-"trackedCells_Close.csv"
# Not used in this case, but this is useful in cases where 
# two result files may exist (for example if tracking cells close to 
# an area, and those further away two output files could be used). 
# Here, results in a second file are processed if the first is blank 
# or does not exist. Note: no file extension is given if this is used.
ALTERNATIVEFILENAME<-NULL
# Use this if simulation results are in CSV format.
# The column within the csv results file where the results start. 
# This is useful as it restricts what is read in to R, getting round 
# potential errors where the first column contains an agent label 
# (as R does not read in CSV files where the first column contains 
# duplicates)
OUTPUTCOLSTART<-10
# Use this if simulation results are in CSV format.
# Last column of the output measure results
OUTPUTCOLEND<-11
# For each parameter value set being analysed, a file is created 
# containing the median of each output measure, of each simulation run 
# for that value. This sets the name of this file. 
LHC_ALL_SIM_RESULTS_FILE<-"LHC_AllResults.csv"
# Location of a file containing the parameter value sets generated 
# by the hypercube sampling (i.e. the file generated in the previous 
# method of this tutorial.) However if providing a CSV file with all 
# results, you do not need to  provide this
LHC_PARAM_CSV_LOCATION<-"Tutorial_Parameters_for_Runs.csv"
# File name to give to the summary file that is produced showing the 
# parameter value sets alongside the median results for each simulation 
# output measure. 
LHCSUMMARYFILENAME<-"LHC_Summary.csv"
# File name to give to the file showing the Partial Rank Correlation 
# Coefficients for each parameter.
CORCOEFFSOUTPUTFILE<-"EgSet_corCoeffs.csv"
# Option to print the Partial Rank Correlation Coefficients
PRINTOPT<-"ALL"
# Timepoints being analysed. Must be NULL if no timepoints being analysed,
# or else be an array of timepoints. Scale sets the measure of these 
# timepoints
TIMEPOINTS<-NULL; TIMEPOINTSCALE<-NULL
# Example Timepoints:
#TIMEPOINTS<-c(12,36,48,60); TIMEPOINTSCALE<-"Hours"
# Whether to display the p-values on the graph of PRCCS over time
# This can look unsightly if you have more than 3 output measures
# You would then be better producing these in a table. The p-values are placed in a CSV file 
# produced by spartan
DISPLAYPVALS<-TRUE

# }
# NOT RUN {
# DONTRUN IS SET SO THIS IS NOT EXECUTED WHEN PACKAGE IS COMPILED - BUT THIS
# HAS BEEN TESTED WITH THE TUTORIAL DATA

##--- NOW RUN THE METHODS IN THIS ORDER ----
# A - FOR STOCHASTIC SIMS IN SET FOLDER STRUCTURE, GENERATE 
# THE MEDIANS FOR EACH SET OF PARAMETER VALUES
# GENERATED BY THE HYPERCUBE

lhc_process_sample_run_subsets(FILEPATH,
	LHC_PARAM_CSV_LOCATION,PARAMETERS,NUMSAMPLES,
	NUMRUNSPERSAMPLE,MEASURES,RESULTFILENAME,
	ALTERNATIVEFILENAME,OUTPUTCOLSTART,OUTPUTCOLEND,
	LHC_ALL_SIM_RESULTS_FILE,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)

# B - GENERATE THE SUMMARY FILE SHOWING THE PARAMETERS USED AND
# MEDIAN RESULTS FOR THE MEASURES OVER THE n RUNS
lhc_generateLHCSummary(FILEPATH,PARAMETERS,MEASURES,
	LHC_ALL_SIM_RESULTS_FILE,LHCSUMMARYFILENAME,
	SPARTAN_PARAMETER_FILE=NULL,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)

# C- CALCULATE THE PARTIAL RANK CORRELATION COEFFICIENTS
lhc_generatePRCoEffs(FILEPATH,PARAMETERS,MEASURES,
	LHCSUMMARYFILENAME,CORCOEFFSOUTPUTFILE,
	TIMEPOINTS=NULL,TIMEPOINTSCALE=NULL)
	
# D - GRAPH THE CORRELATION COEFFICIENTS
lhc_plotCoEfficients(FILEPATH, CORCOEFFSOUTPUTFILE, 
	MEASURES, PRINTOPT, TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)

# E - GRAPH THE RESULTS FOR EACH PARAMETER MEASURE PAIRING
lhc_graphMeasuresForParameterChange(FILEPATH,PARAMETERS,
	MEASURES,MEASURE_SCALE,CORCOEFFSOUTPUTFILE,
	LHCSUMMARYFILENAME,TIMEPOINTS=NULL,
	TIMEPOINTSCALE=NULL)
	
# F - IF USING RESULTS FILES FROM MULTIPLE TIMEPOINTS, YOU 
# CAN PLOT THE PRCC FOR EACH TIMEPOINT, FOR ALL MEASURES, 
# ON ONE PARAMETER GRAPH
plotPRCCSFromTimepointFiles(FILEPATH,PARAMETERS,MEASURES,
	CORCOEFFSOUTPUTFILE,TIMEPOINTS,TIMEPOINTSCALE,
	DISPLAYPVALS)
	
# G - Now to move the Median results for each parameter set 
# into files containing results of all parameter sets. 
# One file per timepoint. If we do this we make the output 
# compatible with methods in the spartan toolkit
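# (RUN_SUMMARY_FILE_NAME must first be set to the name of the file
# summarising the results at a particular timepoint)
lhc_generateTimepointFiles(FILEPATH,LHC_PARAM_CSV_LOCATION,
	RUN_SUMMARY_FILE_NAME,NUMSAMPLES,NUMRUNSPERSAMPLE,TIMEPOINTS)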
	
# H - Now summarise all the PRCCS for all timepoints & 
# parameters in one file - this will make graphing much easier
lhc_calculatePRCCForMultipleTimepoints(FILEPATH, CORCOEFFSOUTPUTFILE, 
	TIMEPOINTS,MEASURES)
	
# I - Now graph each PRCC over time in comparison with the 
# dummy parameter (a comparison that is new to spartan), 
# to judge the impact of that parameter. These graphs are 
# output as PDFs in the stated filepath
lhc_graphPRCCForMultipleTimepoints(FILEPATH,MEASURES,TIMEPOINTS)

# J - Count and graph the number of significant parameters 
# for each measure
lhc_countSignificantParametersOverTime(FILEPATH,MEASURES,TIMEPOINTS)

# }
