scrime (version 1.3.5)

snp2bin: Transformation of SNPs to Binary Variables

Description

Transforms SNPs to binary variables.

Usage

snp2bin(mat, domrec = TRUE, refAA = FALSE, snp.in.col = TRUE, 
   monomorph = 0)

Arguments

mat

a matrix or data frame in which the genotypes of all SNPs are coded either by 0, 1 and 2, or by 1, 2 and 3, or by "AA", "AB" and "BB". Missing values are allowed. In the third coding, missing values can be specified by "NN" as well as by NA. For the first two codings, it is assumed that the smallest value codes the homozygous reference genotype, the middle value the heterozygous genotype, and the largest value the homozygous variant genotype. For the third coding, see refAA.

domrec

should each SNP be coded by two dummy variables, one coding for a dominant and the other for a recessive effect? If TRUE, the first binary variable is set to 1 if the SNP is of the heterozygous or the homozygous variant genotype (dominant effect), and the second dummy variable is set to 1 if the SNP is of the homozygous variant genotype (recessive effect). If FALSE, three dummy variables are used, and each of the three genotypes of a SNP is coded by one of these binary variables.
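
The two codings controlled by domrec can be illustrated with a small base R sketch. This is not the scrime implementation: the vector geno and the column names are hypothetical, the genotypes are assumed to be coded by 0, 1 and 2, and missing values are simply propagated, which may differ from how snp2bin treats them.

```r
# Illustrative sketch only; snp2bin() itself may differ in details.
# Genotypes: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant.
geno <- c(0, 1, 2, 1, 0, NA)

# Analogue of domrec = TRUE: two dummy variables per SNP.
dom <- as.integer(geno >= 1)  # 1 if heterozygous or homozygous variant
rec <- as.integer(geno == 2)  # 1 if homozygous variant
two_dummies <- cbind(dom, rec)

# Analogue of domrec = FALSE: one indicator variable per genotype.
three_dummies <- sapply(0:2, function(g) as.integer(geno == g))
colnames(three_dummies) <- c("ref", "het", "var")  # hypothetical names

two_dummies
three_dummies
```

In the three-variable coding, every non-missing observation has exactly one indicator set to 1.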

refAA

does "AA" always code for the homozygous reference genotype? Only considered if the SNPs are coded by "AA", "AB" and "BB". If FALSE, it is evaluated SNP-wise whether "AA" or "BB" occurs more often, and the more frequently occurring value is assumed to be the homozygous reference genotype.
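
The SNP-wise choice of the reference genotype can be sketched as follows. The helper recode_letters is hypothetical and not part of scrime; in particular, resolving ties in favour of "AA" is an assumption of this sketch, since the behaviour in that case is not stated here.

```r
# Hypothetical helper: recode one SNP from "AA"/"AB"/"BB" to 0/1/2.
recode_letters <- function(x, refAA = FALSE) {
  x[x %in% "NN"] <- NA                    # treat "NN" as missing
  n_AA <- sum(x == "AA", na.rm = TRUE)
  n_BB <- sum(x == "BB", na.rm = TRUE)
  # Assumption: ties are resolved in favour of "AA".
  ref_is_AA <- refAA || n_AA >= n_BB
  ord <- if (ref_is_AA) c("AA", "AB", "BB") else c("BB", "AB", "AA")
  match(x, ord) - 1L                      # 0 = reference, 1 = het, 2 = variant
}

snp <- c("AA", "AB", "BB", "BB", "NN")
recode_letters(snp)                # "BB" is more frequent, so it becomes 0
recode_letters(snp, refAA = TRUE)  # "AA" is forced to be the reference
```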

snp.in.col

does each column of mat correspond to a SNP (and each row to an observation)? If FALSE, it is assumed that each row represents a SNP, and each column an observation.

monomorph

a non-negative number. If a dummy variable contains monomorph or fewer values that differ from the more frequent value of this variable, the variable is removed from the data set.
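
The filtering rule described for monomorph can be sketched in base R. The helper drop_near_constant is hypothetical and only mirrors the rule as stated above; ignoring missing values when counting is an assumption of this sketch.

```r
# Hypothetical helper: drop 0/1 columns with too few minority values.
drop_near_constant <- function(bin, monomorph = 0) {
  # Per column, count the entries that differ from the more frequent value.
  n_minor <- apply(bin, 2, function(v) {
    v <- v[!is.na(v)]                 # assumption: missing values are ignored
    min(sum(v == 0), sum(v == 1))
  })
  bin[, n_minor > monomorph, drop = FALSE]
}

bin <- cbind(a = c(0, 0, 0, 0), b = c(0, 1, 0, 0), c = c(0, 1, 1, 0))
colnames(drop_near_constant(bin))                 # constant column "a" is dropped
colnames(drop_near_constant(bin, monomorph = 1))  # "b" is dropped as well
```

With the default monomorph = 0, only constant variables are removed; larger values also remove variables that are nearly constant.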

Value

A matrix containing the binary dummy variables.

See Also

recodeSNPs, recodeAffySNP

Examples

# NOT RUN {
# Generate an example data set consisting of 10 rows (observations)
# and 5 columns (SNPs), with genotypes coded by 1, 2 and 3.

library(scrime)
mat <- matrix(sample(3, 50, TRUE), 10)
colnames(mat) <- paste("SNP", 1:5, sep = "")

# Transform each SNP into two dummy variables, one that codes for
# a recessive effect and one that codes for a dominant effect.

snp2bin(mat)

# Transform each SNP into three dummy variables, where each of
# these variables codes for one of the three genotypes.

snp2bin(mat, domrec = FALSE)
# }
