ESS object
The R2GUESS function reads and compiles the data, input files and parameters required to run the GUESS source code. It automatically runs GUESS (with or without GPU capacity enabled) and saves the results and summary files as text files. For portability, R2GUESS generates an ESS object that compiles information about the inputs and parameters used to run GUESS, together with the outputs detailed in as.ESS.object.
R2GUESS(dataY, dataX, path.input, path.output, path.par,
path.init = NULL, file.par, file.init = NULL,
file.log = NULL, nsweep, burn.in, Egam,
Sgam, root.file.output, time = TRUE, top = 100,
history = TRUE, label.X = NULL, label.Y = NULL,
choice.Y = NULL, nb.chain, conf = NULL, cuda = TRUE,
MAP.file = NULL, time.limit = NULL, seed = NULL)
dataY: either a one-element character vector (such as 'dataY.txt') or a data frame. If dataY is entered as a character vector, it gives the location of the response matrix, assuming the data are in the path.input folder. In the corresponding file, observations are presented in rows and the (possibly multivariate) outcome(s) in columns. The first two rows (single integers) give the number of rows (n) and columns (q) in the matrix. If a data frame is passed, it corresponds to an n x q numerical matrix compiling the observed responses.
dataX: either a one-element character vector (such as 'dataX.txt') or a data frame. If dataX is entered as a character vector, it gives the location of the predictor matrix, assuming the data are in the path.input folder. In the corresponding file, observations are presented in rows and the predictors in columns. The first two rows (single integers) give the number of rows (n) and columns (p) in the matrix. If a data frame is passed, it corresponds to an n x p numerical matrix compiling the observed predictors.
path.input: path to the directory containing the data (dataX and dataY). If dataX and/or dataY have been entered as data frame(s), the function will generate the corresponding text files required to run GUESS in the path.input folder.
path.output: path to the directory in which output files will be saved.
path.par: path to the directory containing the parameter file needed to run GUESS.
path.init: path to the location of the file describing the initial guess of the MCMC procedure (i.e. the variables to include in the initial model).
file.par: name of the parameter file containing all user-specified parameters required to set up the run and the features of the moves. This file is located in path.par and contains fields that are extensively described in http://www.bgx.org.uk/software/GUESS_Doc_short.pdf. These parameters are not mandatory and, if not specified, they will be set to their default values, which are also given in the documentation. An example of this file is provided in the package.
file.init: name of the file specifying which variables to include at the first iteration of the MCMC run. The first row of the file is a single scalar giving the number of rows that follow (the number of variables to include). Subsequent rows indicate the positions of the covariates to include. This file is optional and, if not specified (default=NULL), the initial guesses of the MCMC algorithm will be derived from a step-wise regression approach.
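As a rough illustration of the layout described for file.init, the sketch below writes such a file with base R. The helper name write_init_file and the file name are hypothetical and not part of R2GUESS. Whether GUESS counts covariate positions from 0 or from 1 is not stated here, so the positions used are purely illustrative.

```r
# Hypothetical helper (not part of R2GUESS): write an initial-model file
# whose first row is the number of covariates and whose remaining rows
# give one covariate position each.
write_init_file <- function(positions, path) {
  positions <- as.integer(positions)
  stopifnot(length(positions) > 0, !anyNA(positions))
  writeLines(as.character(c(length(positions), positions)), con = path)
  invisible(path)
}

# Illustrative positions only; the indexing convention is an assumption.
init_path <- file.path(tempdir(), "init_example.txt")
write_init_file(c(3, 17, 42), init_path)
init_lines <- readLines(init_path)
```

The resulting file name could then be passed as file.init, with its directory given as path.init.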
file.log: name of the log file. This file compiles, in real time, summary information describing the initial parameters, the computational time and the state of the run. It will also contain information about the moves sampled at each sweep. By default (=NULL), the name is given by the argument root.file.output extended by '_log' and, for computational efficiency (especially when the GPU is enabled), a minimal amount of information is returned.
nsweep: integer specifying the number of sweeps for the MCMC run (including the burn-in).
burn.in: integer specifying the number of sweeps to be discarded to account for the burn-in.
Egam: numeric representing the 'a priori' average model size.
Sgam: numeric representing the 'a priori' standard deviation of the model size.
root.file.output: name specifying the file stem for writing the output files in the directory specified by path.output.
time: Boolean value. When time=TRUE (default value), a file recording the time each sweep took will be created and saved in the path.output directory.
top: number of top models to be reported in the output. The default value is 100.
history: Boolean value. When history=TRUE (default value), a number of additional output files that record the history of each move are provided. See section 5 of http://www.bgx.org.uk/software/GUESS_Doc_short.pdf for more details.
label.X: a character vector specifying the names of the predictors. If not specified (=NULL), variables are labelled by their position in the matrix. In the case of SNP data, predictor names and information are given in the MAP.file (field SNPName).
label.Y: a character vector specifying the names of the outcomes. If not specified (=NULL), the outcomes are labelled Y1, ..., Yq, where q is the number of columns in the outcome matrix, or are named after the argument dataY (if specified by a data frame).
choice.Y: a character vector or a numeric vector specifying which phenotypes in the response matrix dataY to analyse in a joint model. By default, all phenotypes in the response matrix will be considered.
nb.chain: an integer specifying the number of chains to consider in the evolutionary procedure.
conf: either a one-element character vector (such as 'conf.txt') or a data frame. If conf is entered as a character vector, it gives the location of the confounder matrix, assuming the data are in the path.input folder. In the corresponding file, observations are presented in rows and the values of the confounders in columns. The first two rows (single integers) give the number of rows (n) and columns (k) in the matrix. If a data frame is passed, it corresponds to an n x k numerical matrix compiling the observed confounders. If specified, the function will replace the response matrix with the residuals from the linear model regressing the outcomes on the confounders.
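The adjustment described for conf can be pictured with the base R sketch below. It is only an assumption about the kind of computation involved, not the code used inside R2GUESS: the helper adjust_for_confounders, the simulated data and the inclusion of an intercept in the model are all hypothetical choices made for illustration.

```r
# Hypothetical helper (not part of R2GUESS): regress each outcome on the
# confounders and keep the residuals as the adjusted response matrix.
adjust_for_confounders <- function(Y, conf) {
  Y <- as.matrix(Y)
  conf <- as.matrix(conf)
  stopifnot(nrow(Y) == nrow(conf))
  fit <- lm(Y ~ conf)               # fits every column of Y at once
  res <- as.matrix(residuals(fit))  # n x q matrix of residuals
  dimnames(res) <- dimnames(Y)
  res
}

# Simulated example: 50 observations, 2 confounders, 2 outcomes.
set.seed(1)
n <- 50
conf <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
Y <- cbind(Y1 = 2 * conf[, "age"] + rnorm(n),
           Y2 = -conf[, "sex"] + rnorm(n))
Y_adj <- adjust_for_confounders(Y, conf)
```

By construction, the columns of Y_adj are uncorrelated with the confounders, which is the property the adjusted responses are meant to have.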
cuda: a Boolean value. cuda=TRUE redirects linear algebra operations towards the GPU. On non-CULA compatible platforms, this option will be ignored.
MAP.file: either a one-element character vector or a data frame. If a character vector is used, it gives the location of the annotation file, assuming the data are in the path.input folder. In the corresponding file, predictors are presented in rows, and are described as a MAP.file. If a data frame is passed, it corresponds to a p x 3 matrix.
time.limit: a numerical value specifying the maximum computing time (in hours) for the run. If the run exceeds that value, the modelling options, parameter values, state of the pseudo-random number generator and state of each chain will be saved so that the run can be resumed from exactly the point at which it was interrupted (using the resume option). By default (=NULL), the run will go on until its completion.
seed: an integer specifying the random seed used to initialise the pseudo-random number generator. If not specified, the seed will be initialised using the CPU clock.
An object of class ESS containing the information listed in as.ESS.object. The object can subsequently be used to post-process the results using the provided R functions (such as summary.ESS, plotMPPI, plot.ESS).
For either of the dataX and dataY parameters, if a data frame is passed, a text file labelled data-*-C-CODE.txt will be created in the path.input directory. This file is formatted so that it can be read by the C++ code: individuals are presented in rows and observations in columns, with the first two rows indicating the number of rows and columns in the matrix. If conf is specified, additional files representing the adjusted responses will be created following the same file labelling system. The returned ESS object will include all result files produced by the code. The number and type of outputs produced depend on the running options chosen. A full description of the available output can be found in
http://www.bgx.org.uk/software/GUESS_Doc_short.pdf
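To make the text file layout described above more concrete, here is a minimal base R sketch that writes a numeric matrix with the two leading dimension rows and reads it back. The helpers write_guess_matrix and read_guess_matrix and the file name are hypothetical. They are not the routines R2GUESS uses internally, and details such as the field separator are assumptions.

```r
# Hypothetical helper (not part of R2GUESS): write a numeric matrix with
# the number of rows and columns on the first two lines.
write_guess_matrix <- function(x, path) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines(as.character(nrow(x)), con)  # first row: number of rows
  writeLines(as.character(ncol(x)), con)  # second row: number of columns
  write.table(x, con, row.names = FALSE, col.names = FALSE, quote = FALSE)
  invisible(path)
}

# Hypothetical helper: read such a file back into a matrix.
read_guess_matrix <- function(path) {
  dims <- scan(path, what = integer(), n = 2, quiet = TRUE)
  values <- scan(path, skip = 2, quiet = TRUE)
  stopifnot(length(values) == prod(dims))
  matrix(values, nrow = dims[1], ncol = dims[2], byrow = TRUE)
}

X <- matrix(c(1.5, 2, 3, 4, 5, 6.25), nrow = 3, ncol = 2)
tmp <- file.path(tempdir(), "example-matrix.txt")
write_guess_matrix(X, tmp)
X_back <- read_guess_matrix(tmp)
```

A file written this way could be supplied by name as dataX or dataY, provided it sits in the path.input folder.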
as.ESS.object, summary.ESS, plotMPPI, plot.ESS
# NOT RUN {
path.input <- system.file("Input", package="R2GUESS")
path.output <- tempdir()
path.par <- system.file("extdata", package="R2GUESS")
file.par.Hopx <- "Par_file_example_Hopx.xml"
# You can have a look at the parameter file located at
print(file.path(path.par, file.par.Hopx))
## To reach convergence you may need to increase nsweep to 110000 and burn.in to 10000
## Running time is approximately 5 minutes
root.file.output.Hopx <- "Example-GUESS-Y-Hopx"
label.Y <- c("ADR","Fat","Heart","Kidney")
data(data.Y.Hopx)
data(data.X)
data(MAP.file)
modelY_Hopx <- R2GUESS(dataY = data.Y.Hopx, dataX = data.X, choice.Y = 1:4,
  label.Y = label.Y, MAP.file = MAP.file, file.par = file.par.Hopx,
  file.init = NULL, file.log = NULL, root.file.output = root.file.output.Hopx,
  path.input = path.input, path.output = path.output, path.par = path.par,
  path.init = NULL, nsweep = 11000, burn.in = 1000, Egam = 5, Sgam = 5,
  top = 100, history = TRUE, time = TRUE, nb.chain = 3, conf = NULL,
  cuda = FALSE)
summary(modelY_Hopx,20) # 20 best models
print(modelY_Hopx)
# }