A list of the options used by sARTP and rARTP. Default values are set by the function options.default.
The format is a list with the following components.
out.dir
output directory for temporary and output files. The default is the working directory, getwd().
id.str
character string that is appended to temporary file names. The default is "PID".
seed
integer for random number generation. The default is 1.
Options for testing an association:
method
1 = AdaJoint, 2 = AdaJoint2, 3 = ARTP. The default is 3. It can also be 'AdaJoint', 'AdaJoint2', or 'ARTP'. The string is converted to upper case, so, for example, 'Adajoint' is also accepted. The ARTP method was proposed in Yu et al. (2009) Genet Epi, while the AdaJoint and AdaJoint2 methods were proposed in Zhang et al. (2014) EJHG. Note that AdaJoint2 could be more powerful if (1) two functional SNPs are negatively correlated and have effects in the same direction, or (2) two functional SNPs are positively correlated and have effects in opposite directions.
nperm
the number of permutations. The default is 1E5.
nthread
the number of threads for multi-threaded processors in Unix/Linux OS. The default is detectCores(), which uses all available processors.
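The method option accepts either a numeric code or a case-insensitive name. The sketch below illustrates that mapping under the 1/2/3 coding listed above; normalize_method is a hypothetical helper written for illustration, not a function exported by ARTP2.

```r
# Hypothetical helper (not an ARTP2 function): map the 'method' option to its
# numeric code, accepting 1/2/3 or a case-insensitive method name.
normalize_method <- function(method) {
  codes <- c(ADAJOINT = 1L, ADAJOINT2 = 2L, ARTP = 3L)
  if (is.numeric(method)) {
    if (!method %in% codes) stop("method must be 1, 2, or 3")
    return(as.integer(method))
  }
  key <- toupper(method)  # 'Adajoint' becomes 'ADAJOINT'
  if (!key %in% names(codes)) stop("unknown method: ", method)
  unname(codes[key])
}

normalize_method("Adajoint")  # 1
normalize_method(3)           # 3
```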
Options for controlling data cleaning:
snp.miss.rate
any SNP with a missing rate greater than snp.miss.rate will be removed from the analysis. The default is 0.05.
maf
any SNP with a minor allele frequency less than maf will be removed from the analysis. The default is 0.05.
HWE.p
any SNP with an HWE exact test p-value less than HWE.p will be removed from the analysis. The test is applied to the genotype data or the reference data. The test is skipped if the imputed genotypes are not encoded as 0/1/2. The default is 1E-5.
gene.R2
a number between 0 and 1 to filter out SNPs that are highly correlated within each gene. The cor function is called to compute the R^2 value between each pair of SNPs; in each pair with R^2 greater than gene.R2, the SNP with the lower MAF is removed. The default is 0.95.
chr.R2
a number between 0 and 1 to filter out SNPs that are highly correlated within each chromosome. The cor function is called to compute the R^2 value between each pair of SNPs; in each pair with R^2 greater than chr.R2, the SNP with the lower MAF is removed. The default is 0.95.
gene.miss.rate
threshold to remove genes based on their missing rate. Genes with a missing rate greater than gene.miss.rate will be removed from the analysis. The missing rate is calculated as the number of subjects with at least one missing genotype among all SNPs in the gene divided by the total number of subjects. The default is 1.0.
rm.gene.subset
TRUE to remove genes that are subsets of other genes. The default is TRUE.
turn.off.filters
a shortcut to turn off all SNP filters. If TRUE, it is equivalent to setting snp.miss.rate = 1, maf = 0, trim.huge.chr = FALSE, gene.R2 = 1, chr.R2 = 1, huge.gene.R2 = 1, huge.chr.R2 = 1, and HWE.p = 0. The default is FALSE.
impute
TRUE to impute missing genotypes with the mean of the SNP. FALSE to handle missing data without imputation when constructing the score statistics, which is considered more powerful but also more time-consuming. The default is FALSE. If the pathway is large and the missing rates are expected to be low, consider setting it to TRUE manually to reduce the computational burden. Keeping impute as FALSE can improve power when the missing rate is high, e.g., when the data are combined from multiple studies and a SNP has missing genotypes because it was not measured or not successfully imputed in some of the participating studies.
group.gap
an integer used to regroup SNPs in a chromosome into independent groups. The unit is base pairs (bp). The position information is taken from the fourth column of the bim files. The default is NULL, i.e., no regrouping is performed.
delete
TRUE to delete temporary files containing the test statistics for each gene. The default is TRUE.
print
TRUE to print information to the console. The default is TRUE.
tidy
a logical value controlling the data frame deleted.snps in the object returned by sARTP, which lists the SNPs excluded from the analysis together with the reasons for their exclusion. Possible reason codes include RM_BY_SNP_NAMES, RM_BY_REGIONS, NO_SUM_STAT, NO_RAW_GENO, NO_REF, SNP_MISS_RATE, SNP_LOW_MAF, SNP_CONST, SNP_HWE, GENE_R2, HUGE_GENE_R2, CHR_R2, HUGE_CHR, HUGE_CHR2, HUGE_CHR3, GENE_MISS_RATE, GENE_SUBSET, CONF_ALLELE_INFO, and LACK_OF_ACCU_BETA. Set tidy to TRUE to hide the SNPs with codes NO_SUM_STAT and NO_REF. The default is TRUE.
save.setup
TRUE to save the data needed to repeat the analysis more quickly, e.g., working options, observed scores, and the covariance matrix, to a local file, so that loading and filtering the data can be skipped. It is set to TRUE if only.setup is TRUE. The default is FALSE.
path.setup
character string giving the file name used to save the setup for warm.start if save.setup is TRUE. The default is NULL, in which case it is set to paste(out.dir, "/setup.", id.str, ".rda", sep = "").
only.setup
TRUE if only the setup is needed and the testing procedure is not. The R code that creates the setup uses a single thread, but the testing procedure can be multi-threaded. The best practice for using ARTP2 on a multi-threaded cluster is to first create the setup in single-thread mode and then call warm.start to compute the p-values in multi-thread mode, using the setup saved at path.setup as input. save.setup is set to TRUE if only.setup is TRUE. The default is FALSE.
keep.geno
TRUE to return the reference genotypes of the SNPs in the pathway. The default is FALSE.
excluded.snps
character vector of SNPs to be excluded from the analysis. NULL if no SNP is excluded. The default is NULL.
selected.snps
character vector of SNPs to be selected for the analysis. NULL if all SNPs are selected, although other filters may still be applied. The default is NULL.
excluded.regions
data frame with either three columns Chr, Start, End, or three columns Chr, Pos, Radius. The unit is base pairs (bp). SNPs within [Start, End] or [Pos - Radius, Pos + Radius] will be excluded. See Examples in sARTP. This option is only available for sARTP. The default is NULL.
excluded.subs
character vector of subject IDs to be excluded from the analysis. These IDs must match those in the second column (Individual ID) of the fam files in reference. The default is NULL.
selected.subs
character vector of subject IDs to be selected for the analysis. These IDs must match those in the second column (Individual ID) of the fam files in reference. The default is NULL.
excluded.genes
character vector of genes to be excluded from the analysis. NULL if no gene is excluded. The default is NULL.
meta
TRUE to return meta-analysis summary data from sARTP. The default is FALSE.
ambig.by.AF
TRUE to align SNPs with ambiguous alleles by allele frequency instead of by matching alleles; FALSE to match all SNPs by alleles (see Details). The default is FALSE.
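To make the interplay of snp.miss.rate, maf, and gene.R2 concrete, here is a base-R sketch for a single gene. It assumes genotypes coded 0/1/2 with NA for missing values and walks SNP pairs in column order, removing the lower-MAF SNP of each pair whose R^2 exceeds the threshold. filter_snps is a hypothetical helper, not ARTP2 code, and the package's own pair order and tie handling may differ; the same pruning idea would apply per chromosome with chr.R2.

```r
# Hypothetical sketch of the snp.miss.rate, maf and gene.R2 filters for one gene.
# Not ARTP2 code: the package's pair order and tie handling may differ.
# geno: numeric matrix, subjects in rows, named SNPs in columns,
#       coded 0/1/2 with NA for missing genotypes.
filter_snps <- function(geno, snp.miss.rate = 0.05, maf = 0.05, gene.R2 = 0.95) {
  miss <- colMeans(is.na(geno))              # per-SNP missing rate
  af <- colMeans(geno, na.rm = TRUE) / 2     # frequency of the counted allele
  snp.maf <- pmin(af, 1 - af)                # minor allele frequency
  # Keep SNPs passing the missing-rate and MAF filters; drop constant SNPs
  keep <- !is.na(snp.maf) & miss <= snp.miss.rate & snp.maf >= maf & snp.maf > 0
  geno <- geno[, keep, drop = FALSE]
  snp.maf <- snp.maf[keep]
  if (ncol(geno) < 2) return(colnames(geno))

  # Pairwise R^2; in each pair above gene.R2, remove the SNP with lower MAF
  r2 <- cor(geno, use = "pairwise.complete.obs")^2
  removed <- rep(FALSE, ncol(geno))
  for (i in seq_len(ncol(geno) - 1)) {
    for (j in (i + 1):ncol(geno)) {
      if (removed[i] || removed[j]) next
      if (!is.na(r2[i, j]) && r2[i, j] > gene.R2) {
        if (snp.maf[i] < snp.maf[j]) removed[i] <- TRUE else removed[j] <- TRUE
      }
    }
  }
  colnames(geno)[!removed]
}

G <- cbind(
  s1 = c(0, 0, 1, 1, 2, 2, 0, 1, 2, 1),
  s2 = c(0, 0, 1, 1, 2, 2, 0, 1, 2, 1),    # duplicate of s1 (R^2 = 1)
  s3 = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1),    # MAF 0.1
  s4 = c(NA, NA, 0, 1, 2, 0, 1, 2, 0, 1),  # missing rate 0.2
  s5 = rep(0, 10)                          # constant
)
filter_snps(G)  # "s1" "s3"
```

Because s1 and s2 have equal MAF, this sketch keeps the first SNP of the pair; that tie rule is an assumption of the example.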
Options for handling huge pathways:
trim.huge.chr
oversized chromosomes can be further trimmed to accelerate the testing procedure. If TRUE, the additional options below take effect. The default is TRUE.
huge.gene.size
a gene with more than huge.gene.size SNPs will be further trimmed using huge.gene.R2 if trim.huge.chr is TRUE. The default is 1000.
huge.chr.size
a chromosome with more than huge.chr.size SNPs will be further trimmed using huge.chr.R2 if trim.huge.chr is TRUE. The default is 2000.
huge.gene.R2
a more stringent R^2 threshold to filter out SNPs in a gene, similar to gene.R2. The default is gene.R2 - 0.05.
huge.chr.R2
a more stringent R^2 threshold to filter out SNPs in a chromosome, similar to chr.R2. The default is chr.R2 - 0.05.
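The huge-gene rules above amount to choosing between two R^2 thresholds. The sketch below shows that choice for a gene; gene_r2_threshold is a hypothetical helper that only takes the fields it reads from a plain list, not the full object returned by options.default.

```r
# Hypothetical helper (not an ARTP2 function): pick the R^2 threshold used to
# prune a gene, following the huge-gene rules described above.
# opts: a list with gene.R2, trim.huge.chr, huge.gene.size and, optionally, huge.gene.R2.
gene_r2_threshold <- function(n.snps, opts) {
  # huge.gene.R2 defaults to gene.R2 - 0.05 when it is not supplied
  huge.gene.R2 <- if (is.null(opts$huge.gene.R2)) opts$gene.R2 - 0.05 else opts$huge.gene.R2
  if (isTRUE(opts$trim.huge.chr) && n.snps > opts$huge.gene.size) {
    huge.gene.R2
  } else {
    opts$gene.R2
  }
}

opts <- list(gene.R2 = 0.95, trim.huge.chr = TRUE, huge.gene.size = 1000)
gene_r2_threshold(500, opts)    # 0.95
gene_r2_threshold(1500, opts)   # 0.9 (= 0.95 - 0.05)
opts$trim.huge.chr <- FALSE
gene_r2_threshold(1500, opts)   # 0.95
```

The chromosome-level choice between chr.R2 and huge.chr.R2, driven by huge.chr.size, would follow the same pattern.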
Options for gene-based test:
inspect.snp.n
the number of candidate truncation points to inspect the top SNPs in a gene. The default is 5. (See Details.)
inspect.snp.percent
a value x between 0 and 1 such that a truncation point is defined at every 100x percent of the top SNPs (e.g., x = 0.05 defines one every 5%), with at most inspect.snp.n truncation points. The default is 0, in which case the truncation points are 1:inspect.snp.n. (See Details.)
Options for pathway-based test:
inspect.gene.n
the number of candidate truncation points to inspect the top genes in the pathway. The default is 10.
inspect.gene.percent
a value x between 0 and 1 such that a truncation point is defined at every 100x percent of the top genes, with at most inspect.gene.n truncation points. If 0, the truncation points are 1:inspect.gene.n. The default is 0.05.
Order of removing SNPs, genes and subjects:
1. Apply the options excluded.snps and selected.snps if non-NULL. Code: RM_BY_SNP_NAMES.
2. Apply the option excluded.regions if non-NULL and if sARTP is used. Code: RM_BY_REGIONS.
3. Remove SNPs without summary statistics in summary.files (Code: NO_SUM_STAT), or remove SNPs without raw genotype data in data or geno.files (Code: NO_RAW_GENO).
4. Remove SNPs not in the bim files in reference if sARTP is used. Code: NO_REF.
5. Remove SNPs with conflicting allele information in the summary and reference data if sARTP is used. Code: CONF_ALLELE_INFO.
6. Remove SNPs with missing RAF or EAF if sARTP is used and options$ambig.by.AF is TRUE. Code: NO_VALID_EAF_RAF.
7. Remove SNPs with a high missing rate. Code: SNP_MISS_RATE.
8. Remove SNPs with low MAF. Code: SNP_LOW_MAF.
9. Remove constant SNPs. Code: SNP_CONST.
10. Remove SNPs that fail the HWE test. Code: SNP_HWE.
11. Remove highly correlated SNPs within each gene. Code: GENE_R2 or HUGE_GENE_R2.
12. Remove highly correlated SNPs within each chromosome. Code: CHR_R2, HUGE_CHR, HUGE_CHR2, or HUGE_CHR3.
13. Remove genes with a high missing rate. Code: GENE_MISS_RATE.
14. Remove genes that are subsets of other genes. Code: GENE_SUBSET.
Example truncation points defined by inspect.snp.n and inspect.snp.percent:
Assume that a gene contains 100 SNPs. Below are examples of the truncation points for different values of inspect.snp.n and inspect.snp.percent. The same rules apply to inspect.gene.n and inspect.gene.percent.
inspect.snp.n | inspect.snp.percent | truncation points |
1 | 0 | 1 |
1 | 0.05 | 5 |
1 | 0.25 | 25 |
1 | 1 | 100 |
2 | 0 | 1, 2 |
2 | 0.05 | 5, 10 |
2 | 0.25 | 25, 50 |
2 | 1 | 100 |
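The table can be reproduced by a short function. The sketch below is inferred from the table only: truncation_points is a hypothetical helper, and rounding to the nearest SNP when the product is not an integer is an assumption, not documented ARTP2 behavior.

```r
# Hypothetical helper reproducing the table above (not ARTP2's internal code).
# Rounding non-integer products to the nearest SNP is an assumption.
truncation_points <- function(n.snps, inspect.n, inspect.percent) {
  if (inspect.percent == 0) {
    # Percent of 0: inspect the top 1, 2, ..., inspect.n SNPs
    return(seq_len(min(inspect.n, n.snps)))
  }
  # One point every 100 * inspect.percent percent, at most inspect.n points,
  # capped at the number of SNPs and de-duplicated
  pts <- round(seq_len(inspect.n) * inspect.percent * n.snps)
  unique(pmin(pmax(pts, 1), n.snps))
}

truncation_points(100, 2, 0.05)  # 5 10
truncation_points(100, 2, 1)     # 100
```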
SNPs with ambiguous alleles: a SNP with alleles A and T (or C and G) is ambiguous because its strand cannot be determined from the alleles alone. Without strand information, it is sometimes better to match SNPs with ambiguous alleles by allele frequency rather than by their alleles. By default, this package matches all SNPs by alleles. To match the SNPs with ambiguous alleles by allele frequency instead, set ambig.by.AF to TRUE; the summary files must then contain a variable called "RAF" (reference allele frequency) or a variable called "EAF" (effect allele frequency).
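The idea behind frequency-based matching can be sketched as follows: flag A/T and C/G SNPs, then compare the frequency reported in the summary data with the frequency of the same-labelled allele in the reference data. If both lie on the same side of 0.5 the alleles are taken as consistent, otherwise as flipped. This is a generic heuristic, not necessarily the rule ARTP2 applies; is_ambiguous, align_by_af, and the margin parameter (which leaves frequencies too close to 0.5 undecided) are hypothetical.

```r
# Hypothetical sketch (not ARTP2 code) of matching ambiguous SNPs by allele frequency.
is_ambiguous <- function(a1, a2) {
  # A/T and C/G pairs look identical on both strands
  paste0(toupper(a1), toupper(a2)) %in% c("AT", "TA", "CG", "GC")
}

# summary.af: allele frequency from the summary file
# ref.af:     frequency of the same-labelled allele in the reference data
# margin:     hypothetical cutoff; frequencies this close to 0.5 are left undecided (NA)
align_by_af <- function(summary.af, ref.af, margin = 0.05) {
  out <- ifelse((summary.af - 0.5) * (ref.af - 0.5) > 0, "keep", "flip")
  out[abs(summary.af - 0.5) < margin | abs(ref.af - 0.5) < margin] <- NA
  out
}

is_ambiguous(c("A", "C", "A"), c("T", "G", "G"))    # TRUE TRUE FALSE
align_by_af(c(0.2, 0.8, 0.48), c(0.25, 0.3, 0.6))  # "keep" "flip" NA
```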
library(ARTP2)
# Retrieve the default options and inspect their structure.
# Named 'opts' to avoid masking base::options().
opts <- options.default()
str(opts)
names(opts)