gls.batch: Generalized least-squares batch analysis.

Description

Fits a generalized least-squares regression model to test association between a quantitative phenotype and all SNPs in a genotype file, one at a time, via Rapid Feasible Generalized Least Squares. For each SNP, genotype is treated as a fixed effect, and the residual variance-covariance matrix is also estimated. In each trait-SNP association test, the fgls() function is used for parameter estimation.

Usage

gls.batch(phenfile,genfile,pedifile,outfile,covmtxfile.in=NULL,
  covmtxfile.out=paste(phen,"_cov_matrix.txt",sep=""),phen,covars=NULL,
  med="rfgls",  sizeLab="OOPP",Mz=TRUE,Bo=TRUE,Ad=TRUE,Mix=TRUE,
  indobs=TRUE,col.names=TRUE,pediheader=FALSE,
  pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ")

Arguments

phenfile

This can be either (1) a character string specifying a phenotype file on disk which includes the phenotypes and other covariates, or (2) a data frame object containing the same data. In either case, the data must be appropriately structured. See below

genfile

This can be either (1) a character string specifying a genotype file of genotype scores (such as 0,1,2, for the additive genetic model) to be read from disk, or (2) a data frame object containing them. In such a file, each row must represent a SNP, each

pedifile

This can be either (1) a character string specifying the pedigree file to be read from disk, or (2) a data frame object containing this pedigree information. At minimum, pedifile must have a col

outfile

A character string specifying the path and filename for the output file to be written. If a file with the same path and filename already exists, gls.batch() appends the output to that file, rather than overwriting it. Users are war

covmtxfile.in

Optional; can be either (1) a character string specifying a file on disk from which the residual variance-covariance matrix is to be read, or (2) the matrix itself. If NULL, then gls.batch() will estimate this matrix. The

covmtxfile.out

An optional character string specifying the filename and path to which the residual variance-covariance matrix, if it is to be calculated (i.e., covmtxfile.in=NULL), will be written. The default is a generic filename that refers to the pheno

phen

A character string specifying the phenotype (column name) in the phenotype file to be analyzed.

covars

A character string or character vector that holds the (column) names of the covariates, in the phenotype file, to be used in the regression model.

med

"Method." Presently, only "rfgls", the default, is implemented.

sizeLab

A character string indicating the maximum size of the families in the data. Must be one of the following strings:

"OOPP", if the largest family has two offspring and both parents;
"OPP", if the largest family ha

Mz

A logical indicator (TRUE or FALSE) of whether MZ-twin families are in the data; must be set to FALSE if sizeLab="PP". Defaults to TRUE.

Bo

A logical indicator of whether bio-offspring (including DZ-twin) families are in the data; must be set to FALSE if sizeLab="PP". Defaults to TRUE.

Ad

A logical indicator of whether adopted-offspring families are in the data; must be set to FALSE if sizeLab="PP". Defaults to TRUE.

Mix

A logical indicator of whether "mixed" families, with 1 biological and 1 adopted offspring, are in the data; must be set to FALSE if sizeLab="PP". Defaults to TRUE.

indobs

A logical indicator of whether there are "independent observations" who do not fit into a four-person nuclear family present in the data. If TRUE, a separate residual variance parameter will be estimated for those individuals.

col.names

A logical indicator specifying whether to write column names in the output file. Defaults to TRUE.

pediheader

A logical indicator specifying whether the pedigree file to be read from disk has a header row, to ensure it is read in correctly. Even if TRUE, gls.batch() assigns the values in pedicolname to the columns

pedicolname

A vector of character strings giving the column names that gls.batch() will assign to the columns of the pedigree file. The default, c("FAMID","ID","PID","MID","SEX"), is the familiar "pedigree table" format. The two crit

sep.phe

Separator character of the phenotype file to be read from disk. Defaults to a single space.

sep.gen

Separator character of the genotype file to be read from disk. Defaults to a single space.

sep.ped

Separator character of the pedigree file. Defaults to a single space.

Value

gls.batch() writes an output file with the following columns: "phen","snp","beta","se","t-stat","df","model","pval","method". However, the actual value returned by the function is simply NULL.

Details

Throughout this documentation, reference is made to the "phenotype file," the "genotype file," and so forth, because gls.batch() was designed for potentially large data files read from disk. This is why many of the function's argument names contain the word "file," and why each of those arguments may be a character string giving a filename and path. However, the function also accepts data already loaded into R's workspace as a data frame object, in which case "the [whatever] file" should be taken to refer to that data frame. For details specific to each argument, see above.

The function gls.batch() first reads in the files and merges them into a data frame with columns of pedigree information, phenotypes, covariates, and genotypes. Then, it creates two vectors holding the family labels and family sizes in the data. Finally, it carries out single-SNP association analyses for all the SNPs in the genotype file.

The phenotype file must conform to the following guidelines:

It must have the following four named columns: "FAMID" (family ID), "ID" (unique individual ID), "FTYPE" (family type), and "INDIV" (individual code). The values of "FTYPE" and "FAMID" will be the same for all members of a given family. There are six recognized family types: FTYPE=1 for MZ-twin, FTYPE=2 for DZ-twin, FTYPE=3 for adoptive-offspring, FTYPE=4 for non-twin bio-offspring, FTYPE=5 for "mixed" families with one bio and one adopted offspring, and FTYPE=6 for "independent observations" who do not fit into a four-person nuclear family. The individual code "INDIV" represents how the subject fits into his/her family: INDIV=1 is for "Offspring #1," INDIV=2 is for "Offspring #2," INDIV=3 is for the mother, and INDIV=4 is for the father. Note that subjects with FTYPE=6 MUST have INDIV=1. The distinction between "Offspring #1" and "#2" is mostly arbitrary, except that in "mixed" families, the biological offspring MUST have INDIV=1, and the adopted offspring, INDIV=2.
Within each family, members must be ordered by INDIV, as: offspring, mother, father. For the mixed family type, members must be ordered as: bio-offspring, adopted-offspring, mother, father. For purposes of ordering the phenotype file, subjects with the same family ID but different values for FTYPE are treated as being in different family units.
The phenotype file has rows as subjects and columns as variables, whereas the genotype file must have rows as SNPs and columns as subjects.
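The guidelines above can be checked before a long run. The following is a minimal sketch in base R, not part of the package: the helper name check_phenfile() and the toy data frame are hypothetical, and the checks cover only the rules stated above (required columns, unique IDs, valid codes, FTYPE=6 implying INDIV=1, and ordering by INDIV within each FAMID/FTYPE unit).

```r
# Hypothetical helper (not part of the package): returns a character vector
# describing any violations of the phenotype-file guidelines listed above.
check_phenfile <- function(phen_df) {
  required <- c("FAMID", "ID", "FTYPE", "INDIV")
  missing_cols <- setdiff(required, names(phen_df))
  if (length(missing_cols) > 0) {
    return(paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  }
  problems <- character(0)
  if (anyDuplicated(phen_df$ID) > 0) {
    problems <- c(problems, "ID values are not unique")
  }
  if (!all(phen_df$FTYPE %in% 1:6)) {
    problems <- c(problems, "FTYPE must be one of 1 to 6")
  }
  if (!all(phen_df$INDIV %in% 1:4)) {
    problems <- c(problems, "INDIV must be one of 1 to 4")
  }
  if (any(phen_df$FTYPE == 6 & phen_df$INDIV != 1, na.rm = TRUE)) {
    problems <- c(problems, "subjects with FTYPE=6 must have INDIV=1")
  }
  # A family unit is a FAMID/FTYPE combination; INDIV must not decrease within it.
  unit <- paste(phen_df$FAMID, phen_df$FTYPE, sep = ":")
  ordered_ok <- tapply(phen_df$INDIV, unit, function(x) !is.unsorted(x))
  if (!isTRUE(all(ordered_ok))) {
    problems <- c(problems, "members are not ordered by INDIV within family")
  }
  problems
}

# Toy data (made up): one four-person bio-offspring family and one
# independent observation.
toy <- data.frame(
  FAMID = c(1, 1, 1, 1, 2),
  ID    = c(101, 102, 103, 104, 201),
  FTYPE = c(4, 4, 4, 4, 6),
  INDIV = c(1, 2, 3, 4, 1)
)
ok_result  <- check_phenfile(toy)
bad        <- toy[c(2, 1, 3, 4, 5), ]  # swap the two offspring of family 1
bad_result <- check_phenfile(bad)
```

Here ok_result is empty, while bad_result reports the ordering problem. The sketch does not check that rows of a family unit are contiguous, nor the bio/adopted ordering within mixed families, since neither can be inferred from these four columns alone.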

This function handles the following family structures (see argument sizeLab): "OOPP", 2 offspring and 2 parents; "OO", 2 offspring; "PP", 2 parents; "OP", 1 offspring and 1 parent; and "OPP", 1 offspring and 2 parents. For each family structure, it handles any combination of the following family types: MZ-twin ("Mz"), non-MZ-twin bio-offspring ("Bo"), adopted-offspring ("Ad"), and mixed bio/adopted-offspring ("Mix").

When conducting parallel analyses on a computing array, judicious use of arguments covmtxfile.in and covmtxfile.out can save time. For example, suppose one is analyzing different SNP sets in parallel but using a common phenotype file for all. In this case, one should calculate the residual variance-covariance matrix ahead of time and write it to a file. Then, use the same filename and path for argument covmtxfile.in for all jobs running in parallel. The matrix can be calculated by using gls.batch.get() and then fgls().
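One way to organize such a parallel run is sketched below in base R. It only builds the per-job settings and does not call gls.batch(); the SNP names, chunk size, and all filenames are hypothetical placeholders, and the shared matrix file is assumed to have been written beforehand as described above.

```r
# Stand-in for the SNP identifiers (rows) of a genotype file; names are made up.
snp_ids    <- paste0("snp", 1:10)
chunk_size <- 4  # hypothetical number of SNPs per parallel job

# Assign each SNP to a chunk, then split the identifiers by chunk.
chunk_id <- ceiling(seq_along(snp_ids) / chunk_size)
chunks   <- split(snp_ids, chunk_id)

# One settings list per job: its own SNP subset and output file,
# but the same precomputed residual variance-covariance matrix file.
jobs <- lapply(seq_along(chunks), function(i) {
  list(
    snps          = chunks[[i]],
    outfile       = sprintf("assoc_chunk%02d.txt", i),
    covmtxfile.in = "shared_cov_matrix.txt"
  )
})
```

Each job would then pass its own genotype subset and outfile to gls.batch(), together with the shared covmtxfile.in. Giving every job a distinct outfile is a deliberate choice here, because gls.batch() appends to an existing output file rather than overwriting it.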

References

Li X, Basu S, Miller MB, Iacono WG, McGue M: A Rapid Generalized Least Squares Model for a Genome-Wide Quantitative Trait Association Analysis in Families. Hum Hered 2011;71:67-82 (DOI: 10.1159/000324839)

Examples


library(RFGLS) #<--Package providing gls.batch() and the example datasets.
setwd(tempdir()); getwd() #<--Temp directory to write to.
data(pheno)
data(geno)
data(pedigree)
data(resVCmtx)
gls.batch(
  phenfile=pheno,
  genfile=data.frame(t(geno)), #<--Transposed so that rows are SNPs.
  pedifile=pedigree,
  outfile="example_output.txt",
  covmtxfile.in=resVCmtx, #<--Precomputed, to save time.
  covmtxfile.out=NULL,
  phen="Zscore",covars="IsFemale",med="rfgls",sizeLab="OOPP",
  Mz=TRUE,Bo=TRUE,Ad=TRUE,Mix=TRUE,indobs=TRUE,
  col.names=TRUE,pediheader=FALSE,pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ")
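The call above passes data.frame(t(geno)) as genfile, which suggests the example genotype data are stored with subjects as rows and are transposed to meet the rows-as-SNPs requirement. The toy sketch below, with made-up subject IDs, SNP names, and scores, shows what that transpose does to the layout.

```r
# Toy genotype scores: 3 subjects (rows) by 2 SNPs (columns); values are made up.
geno_by_subject <- matrix(
  c(0, 1, 2,   # scores for snpA
    1, 1, 0),  # scores for snpB
  nrow = 3,
  dimnames = list(c("id1", "id2", "id3"), c("snpA", "snpB"))
)

# Transpose so that each row is a SNP and each column is a subject.
geno_by_snp <- data.frame(t(geno_by_subject))
dim(geno_by_snp)       # 2 rows (SNPs), 3 columns (subjects)
rownames(geno_by_snp)  # "snpA" "snpB"
```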
