This function performs a Multifactor Dimensionality Reduction (MDR).
mdr(X, status, fold=2, t=NULL, top=3, NAvalues=NA, cvc=0,
fix=NULL, verbose=FALSE)
Matrix with genotype information, see details.
Vector with group information of individuals in X
.
Maximum dimension of used contingency tables, see details.
Threshold for high/low risk.
Length of each top list.
Label of missing data.
Number of cross-validation splits.
Column number of the SNP to be fixed.
Logical, if detailed status messages shall be given.
An object of class mdr
.
The matrix X
contains the genotype information, each column corresponds to a SNP, each row to an individuum. SNPs need to be coded as 0,1,2.
In case the matrix X
is not given in 0,1,2 format the function recodeData
recodes the data into the required 0,1,2 format.
The status
vector is as long as X
has individuals/rows and specifies the group labels for each individual. Healthy individual need to be
encoded as 0 and cases as 1. If the labeling is different the smaller values are used as controls and the larger one as cases.
The fold
option specifies up to which dimension the contingency tables should be used. The current maximum is four.
The t
option gives the threshold for the classification between high and low risk classes. The default
is the ratio of the groups sizes.
With the top
option the amount of results are set.
Setting the fix
option to a column number, forces the mdr function to include that particular SNP into the result.
Missing data is labeled in different ways. The definition of missing data is given to the NAvalues
option. By default missing data is encoded as NA, another
possible option is e.g. 3. The downstream analysis ignors then missing data. Missing data is interally coded as 9999, so do not use this value to encode other genotypes.
The number of cross-validation splits can be set with the cvc
option. If cvc
is larger than 0, then the origianl data is split into cvc
-many equally large
subsets and the mdr function is called for each of them. Then, for each results from the top results of the full data is checked, in how many splits they also appear in the
top result list.
Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC. (2006): A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol.2006 Jul 21;241(2):252-61.
# NOT RUN {
data(mdrExample)
mdrSNP <- mdrExample[,1:20]
fit.mdr <- mdr(mdrSNP, mdrExample$Class, fold=4, top=5)
fit.mdr
fit.mdr <- mdr(mdrSNP, mdrExample$Class)
fit.mdr
# }
Run the code above in your browser using DataLab