HapEstXXR (version 0.1-8)

msr: Multi-locus stepwise regression

Description

Stepwise regression for SNP selection and haplotype testing

Usage

msr(snps, trait, famid, patid, fid, mid, adj.var = NA, lim = 0.05, maxSNP = 3, nt = 10, sort.by = "AICc", selection = 0, p.threshold = NA, pair.begin = FALSE, pattern.begin.mat = NA, type = "gaussian", baseline.hap = "max", min.count = 10, sort = FALSE)

Arguments

snps
(n, m)-Matrix; n=No. of individuals; m=no. of SNPs; Rohde-Code
trait
numeric; Outcome, phenotype
famid
vector; identifier for every family; only needed when type = "families"
patid
vector; identifier for every individual; only needed when type = "families"
fid
vector; identifier for the father (0 = unknown); only needed when type = "families"
mid
vector; identifier for the mother (0 = unknown); only needed when type = "families"
adj.var
(n, m)-Matrix; n = No. of individuals; m = no. of covariates; variables for adjustment
lim
numeric; threshold for skipping haplotypes from analysis
maxSNP
integer; maximum number of SNPs grouped into a multi-locus genotype
nt
integer; number of best hits retained at every step
sort.by
criterion by which the results of each step are sorted: AIC ("AIC"), corrected AIC ("AICc"), or p value ("p.value"). Default = "AICc".
selection
0 = none, 1 = improvement of the lowest corrected AIC (AICc) of the previous step, 2 = improvement of the lowest AIC of the previous step, 3 = improvement of the p value, 4 = improvement of the best ten log10(p values), 5 = improvement of the single AICc by adding one SNP to the retained pattern
p.threshold
numeric vector; if the global p value is lower than p.threshold[i], the pattern is stored for further processing. The index i indicates the number of SNPs in the pattern. If the calculation starts with all pairwise SNPs, p.threshold[1] is not used but must still be included.
pair.begin
logical; if TRUE, the procedure begins with two-SNP genotypes (all SNP pairs). Note: k SNPs lead to choose(k, 2) = k * (k - 1) / 2 possible pairs.
pattern.begin.mat
(n, m)-matrix; n = no. of SNP patterns; m = no. of SNPs. If pattern.begin.mat is not NA, it is used as the starting point of msr.
type
type of the dependent variable
baseline.hap
baseline haplotype for the statistical test, chosen to avoid singularity: "max" for the most frequent haplotype, "min" for the least frequent haplotype
min.count
minimum count of rare haplotypes. If the count of estimated haplotypes is < min.count, the combined rare haplotypes are excluded from the analysis of that specific pattern.
sort
A logical value (TRUE or FALSE). If TRUE, family data will be sorted.
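
The sort.by and selection arguments rank candidate patterns by AIC or corrected AIC. The following is a minimal base R sketch of that kind of ranking on simulated data. It does not call msr; the helper aicc(), the variables x1, x2, and y, and the use of the common small-sample correction AIC + 2k(k + 1)/(n - k - 1) are assumptions made for illustration, and msr may compute its criterion differently.

```r
# Sketch only: rank three candidate linear models by AICc.
# All data are simulated; nothing here is taken from msr's internals.
set.seed(1)
n  <- 60
x1 <- rbinom(n, 2, 0.3)          # hypothetical SNP 1, coded as allele count
x2 <- rbinom(n, 2, 0.4)          # hypothetical SNP 2, coded as allele count
y  <- 0.5 * x1 + rnorm(n)        # simulated quantitative trait

# Hypothetical helper: AIC plus the usual small-sample correction term.
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")   # number of estimated parameters
  n <- nobs(fit)                 # number of observations used in the fit
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

fits <- list(snp1 = lm(y ~ x1), snp2 = lm(y ~ x2), both = lm(y ~ x1 + x2))
res  <- data.frame(
  model = names(fits),
  AIC   = sapply(fits, AIC),
  AICc  = sapply(fits, aicc)
)

# Smallest AICc first, analogous in spirit to sort.by = "AICc".
res <- res[order(res$AICc), ]
print(res)
```

The correction term is always positive, so AICc penalizes additional parameters more strongly than AIC, and the difference shrinks as the sample size grows.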

Value

msr provides a list with maxSNP components.
list
for every step one component: SNP numbers and test details.

Details

Haplotypes are inferred by the EM algorithm (Excoffier and Slatkin 1995). Family haplotypes are inferred by the modified EM algorithm proposed by Rohde and Fuerst (2001, 2003).

For normally distributed phenotypes from independent individuals we prefer an F test, and for case-control data we prefer the likelihood ratio test (logistic regression). Both compare a full model with genetic and non-genetic factors to a reduced model that includes only the non-genetic variables. If no non-genetic variable is specified, the reduced model contains only the intercept. If the test is significant, we assume a genetic effect. For family data the weighted TDT statistic is used.
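
The comparison of a full and a reduced model described above can be sketched in base R as follows. This is an illustration on simulated data, not the code msr runs: the genetic factor is a single hypothetical allele count (snp) rather than estimated haplotypes, and age is a made-up non-genetic covariate.

```r
# Sketch only: full vs. reduced model comparison on simulated data.
set.seed(42)
n   <- 200
snp <- rbinom(n, 2, 0.3)         # hypothetical genetic factor (allele count)
age <- rnorm(n, 50, 10)          # hypothetical non-genetic covariate

# Quantitative trait: F test of the full against the reduced linear model.
trait      <- 0.4 * snp + 0.02 * age + rnorm(n)
reduced_lm <- lm(trait ~ age)
full_lm    <- lm(trait ~ age + snp)
f_tab      <- anova(reduced_lm, full_lm)
p_f        <- f_tab[2, "Pr(>F)"]

# Case-control status: likelihood ratio test between logistic regressions.
status      <- rbinom(n, 1, plogis(-1 + 0.6 * snp))
reduced_glm <- glm(status ~ age, family = binomial)
full_glm    <- glm(status ~ age + snp, family = binomial)
lr_tab      <- anova(reduced_glm, full_glm, test = "Chisq")
p_lr        <- lr_tab[2, "Pr(>Chi)"]

cat("F test p value:", format.pval(p_f), "\n")
cat("LRT p value:   ", format.pval(p_lr), "\n")
```

With no non-genetic covariate, the reduced models would be lm(trait ~ 1) and glm(status ~ 1, family = binomial), that is, intercept only.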

Multi-locus stepwise regression can be time consuming.
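
To get a feeling for the run time, it can help to count SNP combinations before starting. The sketch below uses only base R and the choose(k, 2) formula given for pair.begin. The SNP counts in m and the comparison with an exhaustive search over all patterns of up to three SNPs are illustrative assumptions; the stepwise procedure itself is meant to evaluate far fewer patterns than an exhaustive search.

```r
# Sketch only: how the number of SNP combinations grows with the number of SNPs.
m <- c(10, 50, 100, 500)                  # hypothetical numbers of SNPs

# Number of SNP pairs evaluated when starting from all pairs.
pairs <- choose(m, 2)
stopifnot(all(pairs == m * (m - 1) / 2))  # same as k * (k - 1) / 2

# For comparison: all patterns of 1, 2, or 3 SNPs (exhaustive search).
exhaustive <- sapply(m, function(k) sum(choose(k, 1:3)))

counts <- data.frame(snps = m, pairs = pairs, patterns_up_to_3 = exhaustive)
print(counts)
```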

References

Excoffier L, Slatkin M. Mol Biol Evol. 1995 Sep;12(5):921-7.

Rohde K, Fuerst R. Hum Hered. 2003;56(1-3):41-7.

Rohde K, Fuerst R. Hum Mutat. 2001 Apr;17(4):289-95.

Knueppel S, Esparza-Gordillo J, Marenholz I, Holzhuetter HG, Bauerfeind A, Ruether A, Weidinger S, Lee Y-A, Rohde K. Multi-locus stepwise regression: a haplotype-based algorithm for finding genetic associations applied to atopic dermatitis. BMC Med Genet. 2012;13(1):8.