R2GUESS (version 2.0)

as.ESS.object: Compiles the main input and output files from a previous run of R2GUESS and creates an ESS object.

Description

The as.ESS.object function gathers the main information from a previous run of R2GUESS and compiles it into an ESS object for further analysis. Main parameters (e.g. nsweep, burn.in) are read from the 'feature' file automatically generated at the end of every R2GUESS run. Main outputs are also included in the object to enable post-processing and further analyses.

Usage

as.ESS.object(dataY, dataX, path.input, path.output,
  root.file.output, label.X = NULL, label.Y = NULL, path.par,
  path.init = NULL, file.par, file.init = NULL, file.log = NULL,
  MAP.file = NULL, command=TRUE)

Arguments

dataY

a character vector (such as 'dataY.txt') giving the location of the response matrix file, which is assumed to be in the path.input folder. In this file, observations are in rows and the (possibly multivariate) outcome(s) in columns. The first two rows (single integers) give the number of rows (n) and columns (q) of the matrix.

dataX

a character vector (such as 'dataX.txt') giving the location of the predictor matrix file, which is assumed to be in the path.input folder. In this file, observations are in rows and the predictors in columns. The first two rows (single integers) give the number of rows (n) and columns (p) of the matrix.

path.input

path to the directory containing the data (dataX and dataY). If dataX and/or dataY have been entered as data frame(s), the function will generate the corresponding text files required to run GUESS in the path.input folder.

path.output

path indicating the directory in which output files are saved.

root.file.output

name specifying the file stem of the different output files in the path.output directory.

label.X

a character vector specifying the names of the predictors. If not specified (=NULL), the variables are labelled by their position in the matrix. For SNP data, predictor names and information can be given in the MAP.file (field SNPName).

label.Y

a character vector specifying the names of the outcomes. If not specified (=NULL), the outcomes are labelled Y1,...,Yq, where q is the dimension of the response matrix, or take their names from the dataY argument (if it is given as a data frame).

path.par

path to the directory containing the parameter file (argument file.par)

path.init

path to the directory containing the init file (argument file.init) specifying which variables were included at the first iteration of the MCMC run. By default (file.init=NULL) no init file is required.

file.par

name of the parameter file containing all the user-specified parameters used to set up the run and the features of the moves. This file is located in path.par and contains fields that are extensively described in http://www.bgx.org.uk/software/GUESS_Doc_short.pdf.

file.init

name of the file specifying which variables were included at the first iteration of the MCMC run.

file.log

name of the log file. This file compiles, in real time, summary information describing the initial parameters, the computational time and the state of the run. It also contains information about the moves sampled at each sweep. By default (=NULL), the name is the argument root.file.output extended by '_log' and, for computational efficiency (especially when GPU is enabled), a minimal amount of information is returned.

MAP.file

either a one-element character vector or a data frame. If a character vector is used, it gives the location of the annotation file, which is assumed to be in the path.input folder. In this file each row describes one predictor, in the layout of a MAP.file. If a data frame is passed, it must be a p x 3 matrix.

command

Boolean specifying whether the automatically generated C++ command line is saved in the object or not.
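
The file layout described for dataY and dataX (n on the first line, the number of columns on the second, then the matrix) can be sketched in base R as follows. This is only an illustration: write_guess_matrix is a hypothetical helper, not an R2GUESS function, and the space separator is an assumption, since the documentation above does not state which delimiter GUESS expects.

```r
# Hypothetical helper (NOT part of R2GUESS): write a numeric matrix with the
# number of rows on line 1, the number of columns on line 2, then the data.
write_guess_matrix <- function(mat, path) {
  mat <- as.matrix(mat)
  con <- file(path, open = "wt")
  on.exit(close(con))
  writeLines(as.character(nrow(mat)), con)  # line 1: n
  writeLines(as.character(ncol(mat)), con)  # line 2: q (or p)
  # Assumption: values separated by a single space
  write.table(mat, con, sep = " ", row.names = FALSE, col.names = FALSE)
  invisible(path)
}

# Simulated response matrix, for illustration only
set.seed(1)
Y <- matrix(rnorm(12), nrow = 4, ncol = 3)

path.input <- tempdir()
dataY <- "dataY.txt"
write_guess_matrix(Y, file.path(path.input, dataY))

# Read the file back: two header integers, then the matrix itself
header <- as.integer(readLines(file.path(path.input, dataY), n = 2))
Y.back <- as.matrix(read.table(file.path(path.input, dataY), skip = 2))
```

Here header holds n and q as read from the first two lines, and Y.back holds the matrix that follows them.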

Value

An object of class ESS which compiles the following information:

dataY

a character vector defining the location of the response matrix, assuming that data are in the path.input folder.

dataX

a character vector defining the location of the predictor matrix, assuming that data are in the path.input folder.

path.input

path linking to the directory containing the data (dataX and dataY).

path.output

path indicating the directory in which output files were saved.

path.par

path indicating the directory in which to find the parameter file used for the run.

path.init

path indicating the location of the file describing the initial guess of the MCMC procedure. If no init file was specified, the field is set to NULL.

time

Boolean value indicating whether a file recording the time taken by each sweep has been created and saved in the path.output directory.

file.par

name of the parameter file containing all the user-specified parameters used to set up the run and the features of the moves.

file.init

name of the file specifying which variables were arbitrarily included at the first iteration of the MCMC run. If no init file was specified (=NULL), initial guesses were defined by a stepwise regression approach.

file.log

location of the log file.

root.file.output

file name specifying the file stem used to write the output files in the directory specified by path.output.

nsweep

integer specifying the number of sweeps of the MCMC run (including the burn-in).

top

the number of top models that are reported in the output.

BestModels

a list describing the best models visited, with respect to the fields listed in summary.ESS.

label.X

a character vector specifying the names of the predictors. If not specified (=NULL), the variables are labelled by their position in the matrix, from 1 to p.

label.Y

a character vector specifying the names of the outcomes. If not specified (=NULL), the outcomes are labelled Y1,...,Yq, where q is the dimension of the outcome matrix.

p

the number of predictors in the X matrix.

q

the number of outcomes in the response matrix.

n

the number of observations.

nb.chain

the number of chains in the evolutionary algorithm.

burn.in

integer specifying the number of sweeps which were discarded to account for burn-in.

conf

a character vector defining the location of the file compiling observed values for the confounders of interest.

cuda

a Boolean value indicating whether linear algebra operations have been re-routed towards the GPU.

Egam

a priori average model size.

Sgam

a priori standard deviation of the model size.

MAP.file

a character vector specifying the location of the predictor annotation file, assuming that data are in path.input.

command

a character vector describing the C++ command line used to generate the results, if saved.

seed

the random seed used to initialise the pseudo-random number generator.

Finish

a Boolean value indicating if the run terminated, or was interrupted before reaching the user-defined time limit.
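
As a rough sketch of how the fields listed above might be used, the snippet below derives the number of sweeps kept after burn-in (nsweep includes the burn-in). It assumes the fields of an ESS object can be read with `$`, as for a named list; to stay self-contained it uses a plain list named ess as a stand-in, and all of its values are made up for illustration.

```r
# Stand-in for an object returned by as.ESS.object(); field names follow the
# Value section above, the values are invented for illustration only.
ess <- list(
  nsweep   = 1000,   # total sweeps, burn-in included
  burn.in  = 100,    # sweeps discarded as burn-in
  nb.chain = 3,
  q        = 4,
  label.Y  = c("ADR", "Fat", "Heart", "Kidney")
)

# Sweeps retained for inference after discarding the burn-in
retained.sweeps <- ess$nsweep - ess$burn.in

# Sanity check: one label per outcome
labels.ok <- length(ess$label.Y) == ess$q

cat("Retained sweeps:", retained.sweeps, "\n")
```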

Examples

# NOT RUN {
# Names of the input data files shipped with the package
dataX <- "data-X-C-CODE.txt"
dataY <- "data-Y-ALL-C-CODE.txt"

# Input folder, and a temporary copy of the packaged output folder
path.input <- system.file("Input", package = "R2GUESS")
path.output <- tempdir()
file.copy(system.file("Output", package = "R2GUESS"), path.output, recursive = TRUE)
path.output <- file.path(path.output, "Output")

# Parameter file and stem of the output files from the previous run
path.par <- system.file("extdata", package = "R2GUESS")
file.par <- "Par_file_example_Hopx.xml"
root.file.output <- "Example-GUESS-Y-Hopx"
label.Y <- c("ADR", "Fat", "Heart", "Kidney")

# Load the predictor annotation into a separate environment
my.env <- new.env()
data(MAP.file, envir = my.env)
MAP.file <- my.env$MAP.file

# Build the ESS object from the previous run
modelY_Hopx <- as.ESS.object(dataY = dataY, dataX = dataX, path.input = path.input,
    path.output = path.output, root.file.output = root.file.output, label.X = NULL,
    label.Y = label.Y, path.par = path.par, file.par = file.par, MAP.file = MAP.file)

print(modelY_Hopx)
class(modelY_Hopx)
# }
