The as.ESS.object function compiles the main information relating to a previous run of R2GUESS into an ESS object to be further analyzed. Main parameters (e.g. nsweep, burn.in) are read from the 'feature' file automatically generated at the end of every R2GUESS run. Main outputs are also included in the object to enable post-processing and further analyses.
as.ESS.object(dataY, dataX, path.input, path.output,
root.file.output, label.X = NULL, label.Y = NULL, path.par,
path.init = NULL, file.par, file.init = NULL, file.log = NULL,
MAP.file = NULL, command=TRUE)
dataY
a character vector (such as 'dataY.txt') specifying, assuming that data are in the path.input folder, the location of the response matrix. In the corresponding file, observations are presented in rows and the (possibly multivariate) outcome(s) in columns. The first two rows (single integers) give the number of rows (n) and columns (q) in the matrix.
dataX
a character vector (such as 'dataX.txt') specifying, assuming that data are in the path.input folder, the location of the predictor matrix. In the corresponding file, observations are presented in rows and the predictors in columns. The first two rows (single integers) give the number of rows (n) and columns (p) in the matrix.
path.input
path linking to the directory containing the data (dataX and dataY). If dataX and/or dataY have been entered as data frame(s), the function will generate the corresponding text files required to run GUESS in the path.input folder.
path.output
path indicating the directory in which output files are saved.
root.file.output
name specifying the file stem of the different output files in the path.output directory.
label.X
a character vector specifying the names of the predictors. If not specified (=NULL), the variables are labelled by their position in the matrix. Predictor names and information can be given in the MAP.file in the case of SNP data (field SNPName).
label.Y
a character vector specifying the names of the outcomes. If not specified (=NULL), the outcomes are labelled Y1, ..., Yq, where q is the dimension of the response matrix, or are named after the argument dataY (if specified by a data frame).
path.par
path to the directory containing the parameter file (argument file.par).
path.init
path to the directory containing the init file (argument file.init) specifying which variables were included at the first iteration of the MCMC run. By default (file.init=NULL) no init file is required.
file.par
name of the parameter file containing all the user-specified parameters used to set up the run and the features of the moves. This file is located in path.par and contains fields that are extensively described in http://www.bgx.org.uk/software/GUESS_Doc_short.pdf.
file.init
name of the file specifying which variables were included at the first iteration of the MCMC run.
file.log
name of the log file. This file compiles, in real time, summary information describing the initial parameters, the computational time and the state of the run. It will also contain information about moves sampled at each sweep. By default (=NULL), the name is given by the argument root.file.output extended by '_log' and, for computational efficiency (especially when GPU is enabled), a minimal amount of information is returned.
MAP.file
either a one-element character vector or a data frame. If a character vector is used, it specifies, assuming that data are in the path.input folder, the location of the annotation file. In the corresponding file, each predictor is presented in a row and described as in a MAP.file. If a data frame is passed, it corresponds to a p x 3 matrix.
command
Boolean specifying whether the automatically generated C++ command line is saved in the object.
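The layout described for the dataY and dataX files (two single-integer header rows followed by the matrix) can be illustrated with a minimal base R sketch. The helper write_guess_matrix is hypothetical and not part of R2GUESS, and the use of a single space as column separator is an assumption, since the text above does not state which delimiter is expected.

```r
# Hypothetical helper (not an R2GUESS function): write a numeric matrix as
#   line 1: number of rows, line 2: number of columns, then one row per observation.
# Assumption: columns are separated by a single space.
write_guess_matrix <- function(x, file) {
  x <- as.matrix(x)
  # The two header rows, each a single integer
  writeLines(c(as.character(nrow(x)), as.character(ncol(x))), con = file)
  # Append the data without row or column names
  write.table(x, file = file, append = TRUE, quote = FALSE, sep = " ",
              row.names = FALSE, col.names = FALSE)
  invisible(file)
}

# Toy predictor matrix with n = 4 observations and p = 3 predictors
set.seed(1)
X <- matrix(rnorm(12), nrow = 4, ncol = 3)
tmp <- file.path(tempdir(), "dataX.txt")
write_guess_matrix(X, tmp)

# Read the two header rows back
header <- as.integer(readLines(tmp, n = 2))
```

When dataX or dataY is passed as a data frame, the description of path.input indicates that as.ESS.object generates these text files itself, so a helper of this kind would only be needed when preparing the files by hand.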
An object of class ESS which compiles the following information:
dataY
a character vector defining the location of the response matrix, assuming that data are in the path.input folder.
dataX
a character vector defining the location of the predictor matrix, assuming that data are in the path.input folder.
path.input
path linking to the directory containing the data (dataX and dataY).
path.output
path indicating the directory in which output files were saved.
path.par
path indicating the directory in which to find the parameter file used for the run.
path.init
path indicating the location of the file describing the initial guess of the MCMC procedure. If no init file was specified, the field is set to NULL.
time
Boolean value indicating whether a file recording the time each sweep took has been created and saved in the path.output directory.
file.par
name of the parameter file containing all the user-specified parameters used to set up the run and the features of the moves.
file.init
name of the file specifying which variables were arbitrarily included at the first iteration of the MCMC run. If no init file was specified (=NULL), initial guesses were defined by a stepwise regression approach.
file.log
location of the log file.
root.file.output
file name specifying the file stem used to write the output files in the directory specified by path.output.
nsweep
integer specifying the number of sweeps of the MCMC run (including the burn-in).
top
the number of top models that are reported in the output.
BestModels
a list describing the best models visited, with respect to the fields listed in summary.ESS.
label.X
a character vector specifying the names of the predictors. If not specified (=NULL), the variables are labelled by their position in the matrix, from 1 to p.
label.Y
a character vector specifying the names of the outcomes. If not specified (=NULL), the outcomes are labelled Y1, ..., Yq, where q is the dimension of the outcome matrix.
p
the number of predictors in the X matrix.
q
the number of outcomes in the response matrix.
n
the number of observations.
nb.chain
the number of chains in the evolutionary algorithm.
burn.in
integer specifying the number of sweeps which were discarded to account for burn-in.
conf
a character vector defining the location of the file compiling observed values for the confounders of interest.
cuda
a boolean value indicating if linear algebra operations have been re-routed towards the GPU.
Egam
a priori average model size.
Sgam
a priori standard deviation of the model size.
MAP.file
a character vector specifying the location of the predictor annotation file, assuming that data are in path.input.
command
a character vector describing the C++ command line used to generate the results, if saved.
seed
the random seed used to initialise the pseudo-random number generator.
Finish
a Boolean value indicating if the run terminated, or was interrupted before reaching the user-defined time limit.
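As a small illustration of two conventions stated in the list above (the default labels, and nsweep counting the burn-in sweeps), the following base R sketch reproduces them on made-up values. It does not call R2GUESS. The numbers are hypothetical, and storing the default predictor labels as character strings is an assumption.

```r
# Hypothetical dimensions and run lengths, chosen only for illustration
p <- 5L
q <- 4L
nsweep  <- 110000L  # total number of sweeps, burn-in included
burn.in <- 10000L   # sweeps discarded to account for burn-in

# Default labels as described above: positions 1..p for the predictors
# (character storage is an assumption), Y1, ..., Yq for the outcomes
label.X <- as.character(seq_len(p))
label.Y <- paste0("Y", seq_len(q))

# Because nsweep includes the burn-in, the sweeps kept for inference are
retained <- nsweep - burn.in
```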
# NOT RUN {
dataX <- "data-X-C-CODE.txt"
dataY <- "data-Y-ALL-C-CODE.txt"
path.input <- system.file("Input", package="R2GUESS")
path.output <- tempdir()
file.copy(system.file("Output", package="R2GUESS"), path.output, recursive = TRUE)
path.output <- file.path(path.output, "Output")
path.par <- system.file("extdata", package="R2GUESS")
file.par <- "Par_file_example_Hopx.xml"
root.file.output <- "Example-GUESS-Y-Hopx"
label.Y <- c("ADR","Fat","Heart","Kidney")
my.env <- new.env()
data(MAP.file,envir=my.env)
MAP.file <- my.env$MAP.file
modelY_Hopx <- as.ESS.object(dataY = dataY, dataX = dataX, path.input = path.input,
  path.output = path.output, root.file.output = root.file.output, label.X = NULL,
  label.Y = label.Y, path.par = path.par, file.par = file.par, MAP.file = MAP.file)
print(modelY_Hopx)
class(modelY_Hopx)
# }