integrated.analysis(samples, input.regions = "all chrs", input.region.indep = NULL,
    zscores = FALSE, method = c("full", "smooth", "window", "overlap"),
    dep.end = 1e5, window = c(1e6, 1e6), smooth.lambda = 2, adjust = ~1,
    run.name = "analysis_results", ...)
samples: vector
with either the names of the columns in the dependent and independent data
that correspond to the samples, or a numerical vector containing the column numbers to include in
the analysis, e.g. 5:10 means columns 5 to 10. Make sure that both datasets have the same number
of samples with the same column names!
input.regions: vector
indicating the dependent regions to be analyzed. Can be defined in four ways:
1) predefined input region:
insert a predefined input region, choices are:
"all chrs", "all chrs auto", "all arms", "all arms auto".
In the predefined regions "all arms" and "all arms auto" the arms 13p, 14p, 15p, 21p and 22p
are left out, because in most studies there are no or few probes in these regions.
To include them, just make your own vector of arms.
2) whole chromosome(s):
insert a single chromosome or a list of chromosomes as a vector: c(1, 2, 3).
3) chromosome arms:
insert a single chromosome arm or a list of chromosome arms like c("1q", "2p", "2q").
4) subregions of a chromosome:
insert a chromosome number followed by the start and end position like "chr1:1-1000000".
These regions can also be combined, e.g. c("chr1:1-1000000", "2q", 3).
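As an illustration of the four forms listed above, the following base-R sketch classifies the entries of an input.regions vector and extracts the coordinates of a subregion. It is not the package's own parser: the helper names classify_region and parse_subregion are hypothetical, and accepting X and Y as chromosome names is an assumption.

```r
# Hypothetical helpers, not part of the package.
predefined <- c("all chrs", "all chrs auto", "all arms", "all arms auto")

classify_region <- function(x) {
  x <- as.character(x)
  if (x %in% predefined) {
    "predefined"
  } else if (grepl("^chr[0-9XY]+:[0-9]+-[0-9]+$", x)) {
    "subregion"
  } else if (grepl("^[0-9XY]+[pq]$", x)) {
    "arm"
  } else if (grepl("^[0-9XY]+$", x)) {
    "chromosome"
  } else {
    "unknown"
  }
}

# Split "chr<N>:<start>-<end>" into its three parts.
parse_subregion <- function(x) {
  m <- regmatches(x, regexec("^chr([0-9XY]+):([0-9]+)-([0-9]+)$", x))[[1]]
  list(chr = m[2], start = as.numeric(m[3]), end = as.numeric(m[4]))
}

# c() coerces the mixed vector to character, so 3 becomes "3".
regions <- c("chr1:1-1000000", "2q", 3)
kinds <- vapply(regions, classify_region, character(1), USE.NAMES = FALSE)
sub <- parse_subregion("chr1:1-1000000")
```

Because a combined vector is coerced to character, a sketch like this has to treat the chromosome number 3 and the string "3" alike.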
See details for more information.
zscores: logical
indicates whether the Z-scores are calculated (takes longer to run).
If zscores = FALSE, only P-values are calculated.
method: one of "full", "smooth", "window" or "overlap"; determines how integrated.analysis uses the data:
full: the whole dependent data region is taken.
window: takes the middle of the dependent probe and does the integration on the independent probes that lie within
the window around it; the window size is given by window.
overlap: does the integration on the independent probes that lie between the start and end of the dependent probe; the end is
given by dep.end.
smooth: smooths the dependent probes with the smoothing factor given by smooth.lambda, evaluates the smoothed value
at each independent probe and does the integration on those values. smooth.lambda is only needed when method = "smooth";
the default is smooth.lambda = 2.
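The difference between "window" and "overlap" can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data frames, their column names (start, end, pos) and the helpers in_window and in_overlap are hypothetical, and reading window as c(upstream, downstream) distances around the middle of the dependent probe is an assumption. A numeric dep.end is applied as end = start + dep.end.

```r
# Toy annotation; all names here are assumptions made for this sketch.
dep <- data.frame(id = c("d1", "d2"),
                  start = c(1000000, 5000000),
                  stringsAsFactors = FALSE)
indep <- data.frame(id = paste0("i", 1:5),
                    pos = c(200000, 1040000, 1900000, 4500000, 5050000),
                    stringsAsFactors = FALSE)

# Numeric dep.end: the end is derived from the start.
dep.end <- 1e5
dep$end <- dep$start + dep.end

# "window": independent probes within the window around the probe middle.
window <- c(1e6, 1e6)
in_window <- function(start, end, pos, window) {
  mid <- (start + end) / 2
  pos >= mid - window[1] & pos <= mid + window[2]
}

# "overlap": independent probes between start and end of the dependent probe.
in_overlap <- function(start, end, pos) {
  pos >= start & pos <= end
}

window.hits <- lapply(seq_len(nrow(dep)), function(i) {
  indep$id[in_window(dep$start[i], dep$end[i], indep$pos, window)]
})
overlap.hits <- lapply(seq_len(nrow(dep)), function(i) {
  indep$id[in_overlap(dep$start[i], dep$end[i], indep$pos)]
})
names(window.hits) <- names(overlap.hits) <- dep$id
```

With these toy positions the window selection picks up several independent probes per dependent probe, while the overlap selection keeps only those that fall inside the probe itself.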
dep.end: numeric or character
either the name of the end column in the dependent data or, when not available, a numeric value that
gives the offset of the end from the start. When a numeric value is given, the function computes
$start + dep.end = end$. Only needed when method = "window" or method = "overlap".
smooth.lambda: numeric
factor used for smoothing the dependent data. Only needed when method = "smooth". See quantsmooth for more information.
By default the segment = min(nrow(dep.data), 100).
adjust: formula
a formula like ~gender, where gender is a vector of the same size as samples. The regression model is corrected for the gender effect,
see gt.
run.name: character
name of the analysis. The results will be stored in a folder with this name in the current working directory
(use getwd() to print the current working directory). If missing, the default folder "analysis_results"
will be generated.
...: additional arguments, see ?gt.
The results are stored under run.name. E.g. the z-score matrices are saved in subfolder method.
Functions are available to visualize the results (e.g. sim.plot.pvals.on.region); some of them require zscores = TRUE.
This function splits the datasets into separate sets for each region (as specified by input.regions)
and runs the analysis for each region separately.
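The split-then-analyse pattern described here can be sketched in base R. The toy data frame and its column names are assumptions, and counting probes is only a placeholder for the real per-region analysis.

```r
# Toy dependent data; the column names are assumptions made for this sketch.
dep.data <- data.frame(ID = paste0("probe", 1:5),
                       CHROMOSOME = c(1, 1, 2, 2, 2),
                       stringsAsFactors = FALSE)

# One data set per region (here: per chromosome).
per.region <- split(dep.data, dep.data$CHROMOSOME)

# Placeholder for the per-region analysis: count the probes in each subset.
probes.per.region <- vapply(per.region, nrow, integer(1))
```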
When running the integrated analysis for a predefined input region, like "all arms"
and "all chrs", output can be obtained for all input regions, as well as
subsets of them. But note that the genomic unit must be the same: if integrated.analysis
was run using chromosomes as units, any of the functions and plots must also use chromosomes
as units, and not chromosome arms. Similarly, if integrated.analysis was run using
chromosome arms as units, these units must also be used to produce plots and outputs.
For example, if input.regions = "all arms" was used, P-value plots
(see sim.plot.pvals.on.region) can be produced with input.regions = "all arms",
but also for instance for "1p" or "20q". However, to produce a plot of a whole
chromosome, for example chromosome 1, integrated.analysis should be re-run with input.regions = 1.
The same goes for "all chrs": P-value plots etc. can be produced for chromosome 1, 2 and so on,
but to produce plots for an arm, integrated.analysis should be re-run for that region.
This also goes for subregions of the chromosome like "chr1:1-1000000".
By default gt uses a linear model; only when the dependent data is a logical matrix
containing TRUE and FALSE is a logistic model selected. All other models need model = "",
see gt for available models.
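To make the genomic-unit rule from the details concrete, here is a small base-R sketch that checks whether a region requested for plotting uses the same unit as the region the analysis was run with. The helpers unit_of and can_plot are hypothetical and not part of the package. The sketch compares units only and does not check whether the requested region was actually part of the run.

```r
# Hypothetical helpers, not part of the package.
unit_of <- function(region) {
  region <- as.character(region)
  if (region %in% c("all chrs", "all chrs auto") || grepl("^[0-9XY]+$", region)) {
    "chromosome"
  } else if (region %in% c("all arms", "all arms auto") ||
             grepl("^[0-9XY]+[pq]$", region)) {
    "arm"
  } else {
    "subregion"
  }
}

can_plot <- function(run.region, plot.region) {
  u <- unit_of(run.region)
  if (u == "subregion") {
    # A subregion run only supports that same subregion.
    return(identical(as.character(run.region), as.character(plot.region)))
  }
  u == unit_of(plot.region)
}

arms.ok   <- can_plot("all arms", "1p")   # arms run, arm plot
arms.chr  <- can_plot("all arms", 1)      # arms run, chromosome plot
chrs.ok   <- can_plot("all chrs", 1)      # chromosomes run, chromosome plot
chrs.arm  <- can_plot("all chrs", "20q")  # chromosomes run, arm plot
```

A FALSE result corresponds to the cases where the text says integrated.analysis should be re-run for the requested region.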
Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC (2004). A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 93-109.
library(SIM)
# first run example(assemble.data)
data(samples)
# perform integrated analysis without Z-scores using method = "full"
integrated.analysis(samples = samples,
                    input.regions = "8q",
                    zscores = FALSE,
                    method = "full",
                    run.name = "chr8q")