CGEN (version 3.8.0)

pheno.list: List to describe the covariate and outcome data

Description

The list to describe the covariate and outcome data for GxE.scan.

Format

The format is: List of 14
file
Covariate data file. This file must have variable names, two of which must be an id variable and a response variable (see id.var and response.var). No default.
id.var
Name of the id variable(s). No default.
response.var
Name of the binary response variable. This variable must be coded as 0 and 1. No default.
strata.var
Stratification variable name or a formula for variables in file. See the individual model documentation for the allowable stratifications. The default is NULL, so that all observations belong to the same stratum.
main.vars
Character vector of variable names or a formula for variables in file that will be included in the model as main effects. The default is NULL.
int.vars
Character vector of variable names or a formula for variables in file that will be included in the model as interactions with each SNP in the genotype data. The default is NULL.
file.type
One of 1, 3, or 4. 1 is for an R object file created with the save() function. 3 is for a table that will be read in with read.table(). 4 is for a SAS data set. The default is 3.
delimiter
The delimiter in file. The default is "".
factor.vars
Vector of variable names to convert into factors. The default is NULL.
in.miss
Vector of character strings to define the missing values. This option corresponds to the option na.strings in read.table(). The default is "NA".
subsetData
List of sublists to subset the phenotype data for analyses. Each sublist should contain the names "var", "operator" and "value" corresponding to a variable name, operator and values of the variable. Multiple sublists are logically connected by the AND operator. For example, subsetData=list(list(var="GENDER", operator="==", value="MALE")) will only include subjects with the string "MALE" for the GENDER variable. subsetData=list(list(var="AGE", operator=">", value=50), list(var="STUDY", operator="%in%", value=c("A", "B", "C"))) will include subjects with AGE > 50 AND in STUDY A, B or C. The default is NULL.
cc.var
Name of the cc.var variable used in snp.matched. The default is NULL.
nn.var
Name of the nn.var variable used in snp.matched. The default is NULL.
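
As an illustration of the fields above, the following is a minimal sketch of how a pheno.list might be put together for a tab-delimited covariate file. The data, the column names (ID, CASE, AGE, GENDER, STUDY) and the temporary file are hypothetical and exist only for this example. The final loop is a base R approximation of the AND logic described for subsetData; it is not taken from the CGEN source.

```r
# Hypothetical covariate data written to a temporary tab-delimited file.
pheno_file <- tempfile(fileext = ".txt")
example_data <- data.frame(
  ID     = paste0("S", 1:6),
  CASE   = c(0, 1, 0, 1, 1, 0),            # binary response coded 0/1
  AGE    = c(45, 52, 61, 38, 70, 55),
  GENDER = c("MALE", "FEMALE", "MALE", "MALE", "FEMALE", "MALE"),
  STUDY  = c("A", "B", "C", "D", "A", "B"),
  stringsAsFactors = FALSE
)
write.table(example_data, pheno_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The list itself: file, id.var and response.var are the required fields.
pheno.list <- list(
  file         = pheno_file,
  id.var       = "ID",
  response.var = "CASE",
  main.vars    = c("AGE", "GENDER"),
  int.vars     = "AGE",
  file.type    = 3,                        # table read with read.table()
  delimiter    = "\t",
  factor.vars  = "GENDER",
  in.miss      = "NA",
  subsetData   = list(
    list(var = "AGE",   operator = ">",    value = 50),
    list(var = "STUDY", operator = "%in%", value = c("A", "B", "C"))
  )
)

# Base R approximation of how the sublists combine with AND
# (an illustration only, not CGEN's internal implementation).
dat  <- read.table(pheno_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
keep <- rep(TRUE, nrow(dat))
for (s in pheno.list$subsetData) {
  keep <- keep & do.call(s$operator, list(dat[[s$var]], s$value))
}
subset_ids <- dat$ID[keep]
```

In this sketch subset_ids holds the subjects that satisfy both conditions. The list would then be passed as the pheno.list argument of GxE.scan together with a description of the genotype data.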

Details

In this list, file, id.var, and response.var must be specified. The variable id.var is the link between the covariate data and the genotype data. For a subject to be included in the analysis, that subject's id must also appear in the genotype data. If the genotype data is in a PLINK format, then id.var must be of length 2, corresponding to the family id and the subject id.
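
The linking rule can be pictured with a small base R sketch. All ids and the column names FID and IID are hypothetical, and the pasted key is only one way to emulate a two-part id; it does not claim to reproduce how CGEN performs the match internally.

```r
# Hypothetical subject ids in the covariate file and in the genotype data.
covariate_ids <- c("S1", "S2", "S3", "S4")
genotype_ids  <- c("S2", "S3", "S4", "S5")

# Only subjects present in both sources can enter the analysis.
analysis_ids <- intersect(covariate_ids, genotype_ids)

# PLINK-style case: id.var names two columns, family id then subject id.
id.var <- c("FID", "IID")
covariates <- data.frame(
  FID  = c("F1", "F1", "F2"),
  IID  = c("1", "2", "1"),
  CASE = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical family:subject keys found in the genotype data.
genotype_key <- c("F1:1", "F2:1", "F3:1")

# Build the same kind of key from the two id columns and keep the matches.
covariate_key <- paste(covariates[[id.var[1]]], covariates[[id.var[2]]], sep = ":")
matched <- covariates[covariate_key %in% genotype_key, ]
```

Here subject S1 and the subject with family F1 and id 2 are dropped, because neither has a counterpart in the genotype data.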

Missing data: If any of the variables defined in main.vars, int.vars, strata.var, or response.var contain missing values, then those subjects will be removed from the covariate and outcome data. After the subjects with missing values are removed, the subject ids are matched with the genotype data.
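
The order of operations described above (drop incomplete subjects first, then match ids) can be sketched in base R as follows. The data frame, the variable names and the genotype ids are hypothetical, and the sketch assumes that only the variables named in the model fields are checked for missing values.

```r
# Hypothetical covariate data with some missing values.
dat <- data.frame(
  ID   = c("S1", "S2", "S3", "S4", "S5"),
  CASE = c(0, 1, NA, 1, 0),
  AGE  = c(45, NA, 61, 38, 70),
  BMI  = c(NA, 22, 27, 30, 25),   # not used in the model in this sketch
  stringsAsFactors = FALSE
)

# Variables playing the roles of response.var and main.vars here.
model_vars <- c("CASE", "AGE")

# Step 1: remove subjects with a missing value in any model variable.
complete <- dat[complete.cases(dat[, model_vars]), ]

# Step 2: match the remaining subject ids with the genotype data.
genotype_ids <- c("S1", "S2", "S3", "S4")
analysis <- complete[complete$ID %in% genotype_ids, ]
```

In this sketch S2 and S3 are removed in the first step because of missing AGE and CASE values, and S5 is removed in the second step because it has no genotype record.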