Learn R Programming

optiSel (version 2.0.7)

segIBDandN: Calculates Probabilities that Alleles belong to a Shared Native Segment

Description

Calculates the segment based probability of alleles to be IBD (identical by descent) and Native: For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment and are native.

Usage

segIBDandN(files, Native, map, minSNP=20, unitP="Mb", minL=1.0, 
   unitL="Mb", a=0.0, keep=NULL, skip=NA, cskip=NA, cores=1)

Value

NxN matrix with N being the number of individuals from this breed included in all files (and in parameter keep).

Arguments

files

Vector with names of the phased marker files, one file for each chromosome. The required format is described under Details. File names must contain the chromosome name as specified in the map in the form "ChrNAME.", e.g. "Breed2.Chr1.phased".

Native

This parameter is either

(1) Mx(2N) indicator matrix, with 1, if the segment containing the SNP is considered native, and 0 otherwise. The row names are the marker names, and the non-unique column names are the IDs of the individuals. The matrix is typically computed from the output of function haplofreq.

or

(2) Vector with file names. The files contain for every SNP and for each haplotype from this breed 1 if the segment containing the SNP is considered native. These files are typically created by function haplofreq. There is one file per chromosome and file names must contain the chromosome name as specified in the map in the form "ChrNAME.", e.g. "Breed2.Chr1.nat".

map

Data frame providing the marker map with columns including marker name 'Name', chromosome number 'Chr', and possibly the position on the chromosome in mega base pairs 'Mb', and the position in centimorgan 'cM'. The markers must be in the same order as in files and in Native.

minSNP

Minimum number of marker SNPs included in a segment.

unitP

The unit for measuring the proportion of the genome included in shared segments. Possible units are the number of marker SNPs included in shared segments ('SNP'), the number of Mega base pairs ('Mb'), and the total length of the shared segments in centiMorgan ('cM'). In the last two cases the map must include columns with the respective names.

minL

Minimum length of a segment in unitL (e.g. in cM or Mb).

unitL

The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment ('SNP'), the number of Mega base pairs ('Mb'), and the genetic distances between the first and the last marker in centiMorgan ('cM'). In the last two cases the map must include columns with the respective names.

a

The Function providing the weighting factor for each segment is w(x)=x*x/(a+x*x). The parameter of the function is the length of the segment in unitL. The default value a=0.0 implies no weighting, whereas a>0.0 implies that old inbreeding has less influence on the result than new inbreeding.

keep

Vector with IDs of individuals (from this breed) for which the probabilities are to be computed. By default, they will be computed for all individuals included in Native.

skip

Take line skip+1 of the genotype files as the line with column names. By default, the number is determined automatically.

cskip

Take column cskip+1 of the genotype files as the first column with genotypes. By default, the number is determined automatically.

cores

Number of cores to be used for parallel processing of chromosomes. By default one core is used. For cores=NA the number of cores will be chosen automatically. Using more than one core increases execution time if the function is already fast.

Author

Robin Wellmann

Details

For each pair of individuals the probability is computed that two SNPs taken at random position from randomly chosen haplotypes belong to a shared segment and are native. That is, they are not introgressed from other breeds.

Genotype file format: Each file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows is equal to the number of markers from the respective chromosome and the markers are in the same order as in the map. The first cskip columns are ignored. The remaining columns contain genotypes of individuals written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator then the ID of each individual should repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must have no white spaces. The name of each file must contain the chromosome name as specified in the map in the form "ChrNAME.", e.g. "Breed2.Chr1.phased".

Examples

Run this code
data(map)
data(Cattle)
dir    <- system.file("extdata", package = "optiSel")
GTfile <- file.path(dir, paste("Chr", unique(map$Chr), ".phased", sep=""))
Freq   <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", refBreeds="others", minSNP=20)$freq

fIBDN  <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20)
mean(fIBDN)
#[1] 0.01032261

if (FALSE) {
fIBDN  <- segIBDandN(GTfile, Freq<0.01, map=map, minSNP=20, cores=NA)
mean(fIBDN)
#[1] 0.01032261
}

## using files:
if (FALSE) {
wdir   <- file.path(tempdir(),"HaplotypeEval")
chr    <- unique(map$Chr)
GTfile <- file.path( dir, paste("Chr", chr, ".phased",     sep=""))
file   <- haplofreq(GTfile, Cattle, map, thisBreed="Angler", minSNP=20, ubFreq=0.01, w.dir=wdir)

fIBDN  <- segIBDandN(GTfile, file$match, map=map, minSNP=20)
mean(fIBDN)
#[1] 0.01032261

fIBDN  <- segIBDandN(GTfile, file$match, map=map, minSNP=20, cores=NA)
mean(fIBDN)
#[1] 0.01032261


#unlink(wdir, recursive = TRUE)
}

Run the code above in your browser using DataLab