GWASTools (version 1.18.0)

setMissingGenotypes: Write a new netCDF or GDS file, setting certain SNPs to missing

Description

setMissingGenotypes copies an existing GDS or netCDF genotype file to a new one, setting SNPs in specified regions to missing.

Usage

setMissingGenotypes(parent.file, new.file, regions, file.type=c("gds", "ncdf"), sample.include=NULL, compress="ZIP_RA", copy.attributes=TRUE, verbose=TRUE)

Arguments

parent.file
Name of the parent file
new.file
Name of the new file
regions
Data.frame of chromosome regions with columns "scanID", "chromosome", "left.base", "right.base", "whole.chrom".
file.type
The type of parent.file and new.file ("gds" or "ncdf")
sample.include
Vector of scanIDs to include in new.file
compress
The compression method for variables in a GDS file (see add.gdsn for options).
copy.attributes
Logical value specifying whether to copy chromosome attributes to the new file.
verbose
Logical value specifying whether to show progress information.

Details

setMissingGenotypes removes chromosome regions by setting SNPs that fall within the regions listed in the regions argument to NA (i.e., the missing value in the netCDF/GDS file). Entire samples may optionally be excluded from the new file as well: if the sample.include argument is given, only the scanIDs in this vector are written to the new file, so the sample dimension has length length(sample.include).

For regions with whole.chrom=TRUE, the entire chromosome will be set to NA for that sample. For other regions, only the region between left.base and right.base will be set to NA.
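
The masking rule described above can be sketched in base R on a toy genotype matrix. This is only an illustration of the logic, not the GWASTools implementation: the helper mask_regions, the toy SNP positions, and the scanIDs are all hypothetical, and treating both left.base and right.base as inside the region is an assumption.

```r
# Toy data (all values hypothetical): 5 SNPs x 3 scans, every genotype set to 1.
snp <- data.frame(chromosome = c(21, 21, 21, 22, 22),
                  position   = c(10e6, 20e6, 30e6, 35e6, 40e6))
scanID <- c(101, 102, 103)
geno <- matrix(1L, nrow = nrow(snp), ncol = length(scanID),
               dimnames = list(NULL, as.character(scanID)))

# Same columns as the regions argument of setMissingGenotypes.
regions <- data.frame(scanID      = c(101, 102),
                      chromosome  = c(21, 22),
                      left.base   = c(15e6, NA),
                      right.base  = c(25e6, NA),
                      whole.chrom = c(FALSE, TRUE))

# Hypothetical helper, not part of GWASTools: set the SNPs covered by each
# region to NA in the column of the scan that the region belongs to.
mask_regions <- function(geno, snp, scanID, regions) {
  for (i in seq_len(nrow(regions))) {
    r <- regions[i, ]
    col <- which(scanID == r$scanID)
    if (length(col) == 0) next  # region refers to a scan we do not have
    hit <- snp$chromosome == r$chromosome
    if (!r$whole.chrom) {
      # Assumption: both endpoints count as inside the region.
      hit <- hit & snp$position >= r$left.base & snp$position <= r$right.base
    }
    geno[hit, col] <- NA
  }
  geno
}

masked <- mask_regions(geno, snp, scanID, regions)

# Mimic sample.include: keep only the listed scans, so the sample
# dimension becomes length(sample.include).
sample.include <- c(101, 102)
masked <- masked[, scanID %in% sample.include, drop = FALSE]
```

Here scan 101 loses only the chromosome 21 SNP between the two base positions, scan 102 loses every chromosome 22 SNP, and scan 103 is dropped because it is not in sample.include.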

See Also

gdsSubset, anomSegStats for chromosome anomaly regions

Examples

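library(GWASTools)

# Example GDS file shipped with the GWASdata package
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(gdsfile)
sample.sel <- getScanID(gds, index=1:10)  # scanIDs of the first 10 samples
close(gds)

# One region per scan: two bounded regions and one whole chromosome
regions <- data.frame("scanID"=sample.sel[1:3], "chromosome"=c(21,22,23),
  "left.base"=c(14000000, 30000000, NA), "right.base"=c(28000000, 450000000, NA),
  whole.chrom=c(FALSE, FALSE, TRUE))

# Write the new file with only the selected samples, then clean up
newgds <- tempfile()
setMissingGenotypes(gdsfile, newgds, regions, file.type="gds", sample.include=sample.sel)
file.remove(newgds)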