gls.batch.get: Data restructuring for `fgls()`.

Description

Carries out the data restructuring performed by gls.batch(), before it estimates the residual covariance matrix. Useful if calling fgls() directly.

Usage

gls.batch.get(phenfile,genfile,pedifile,outfile,covmtxfile.in=NULL,
  covmtxfile.out=paste(phen,"_cov_matrix.txt",sep=""),phen,covars=NULL,
  med="rfgls",sizeLab="OOPP",Mz=TRUE,Bo=TRUE,Ad=TRUE,Mix=TRUE,
  indobs=TRUE,col.names=TRUE,pediheader=FALSE,
  pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe=" ",sep.gen=" ",sep.ped=" ")

Arguments

phenfile

This can be either (1) a character string specifying a phenotype file on disk which includes the phenotypes and other covariates, or (2) a data frame object containing the same data. In either case, the data must be appropriately structured. See below

genfile

This can be either (1) a character string specifying a genotype file of genotype scores (such as 0,1,2, for the additive genetic model) to be read from disk, or (2) a data frame object containing them. In such a file, each row must represent a SNP, each

pedifile

This can be either (1) a character string specifying the pedigree file corresponding to , to be read from disk, or (2) a data frame object containing this pedigree information. At minimum, must have a col

phen

A character string specifying the phenotype (column name) in the phenotype file to be analyzed.

covars

A character string or character vector that holds the (column) names of the covariates, in the phenotype file, to be used in the regression model.

pediheader

A logical indicator specifying whether the pedigree file to be read from disk has a header row, to ensure it is read in correctly. Even if TRUE, gls.batch() assigns the values in to the columns

pedicolname

A vector of character strings giving the column names that gls.batch() will assign to the columns of the pedigree file. The default, c("FAMID","ID","PID","MID","SEX"), is the familiar "pedigree table" format. The two crite

sep.phe

Separator character of the phenotype file to be read from disk. Defaults to a single space.

sep.gen

Separator character of the genotype file to be read from disk. Defaults to a single space.

sep.ped

Separator character of the pedigree file to be read from disk. Defaults to a single space.
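
The effect of a separator argument can be illustrated with base R alone. The sketch below assumes the files on disk are parsed by a read.table()-style reader, which this page does not state; the temporary file and its contents are hypothetical.

```r
# Hypothetical comma-separated phenotype file, written to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("FAMID,ID,FTYPE,INDIV,Zscore",
             "1,101,4,1,0.25",
             "1,102,4,2,-1.10"), tmp)

# With a single-space separator, each line is read as one undivided field.
wrong <- read.table(tmp, header = TRUE, sep = " ")

# With the matching separator, the five named columns are recovered.
right <- read.table(tmp, header = TRUE, sep = ",")

ncol(wrong)
ncol(right)

unlink(tmp)  # remove the temporary file
```

If a file on disk is not space-delimited, the corresponding sep.* argument would need to be set to match it.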

covmtxfile.in, covmtxfile.out, med, outfile

These arguments are accepted but not used, in order for gls.batch.get() to parallel gls.batch() as closely as possible.

sizeLab, Mz, Bo, Ad, Mix, indobs, col.names

These arguments are likewise accepted but not used.

Value

A list with these three components:
test.dat

The merged data frame of pedigree information, phenotypes, covariates, and genotypes.

tlist

A vector of family labels, with length equal to the number of families in the data (each "independent observation" is treated as a separate family). The names of its components are the family IDs.

sizelist

A vector of family sizes, with length equal to the number of families in the data (each "independent observation" is treated as a separate family). The names of its components are the family IDs.
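
To make the shape of a family-size vector concrete, here is a minimal sketch on toy data. It is not the package's code: the data frame toy and the rule used (counting rows per FAMID) are assumptions for illustration, and the sketch does not reproduce the special treatment of "independent observations" or the label values stored in tlist.

```r
# Toy data: family 1 has four members, family 2 has three.
toy <- data.frame(
  FAMID = c(1, 1, 1, 1, 2, 2, 2),
  ID    = 101:107
)

# Assumption: a family's size is the number of rows sharing its FAMID.
counts <- table(toy$FAMID)

# A plain vector with one entry per family, named by family ID.
sizes_sketch <- setNames(as.vector(counts), names(counts))

length(sizes_sketch)  # number of families
names(sizes_sketch)   # family IDs, stored as character strings
```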

Details

Though originally used for debugging purposes, gls.batch.get() was included because it facilitates directly invoking fgls() when the need arises. This function first reads in the files and merges them into a data frame with columns of pedigree information, phenotypes, covariates, and genotypes. It then creates two vectors, tlist and sizelist, which hold the family labels and family sizes in the data. It returns a list containing the merged data frame and these two vectors.

The phenotype file must conform to the following guidelines:

It must have the following four named columns: 'FAMID' (family ID), 'ID' (unique individual ID), 'FTYPE' (family type), and 'INDIV' (individual code). The value of FTYPE and FAMID will be the same for all members of a given family. There are six recognized family types: FTYPE=1 for MZ-twin, FTYPE=2 for DZ-twin, FTYPE=3 for adoptive-offspring, FTYPE=4 for non-twin bio-offspring, FTYPE=5 for "mixed" families with one bio and one adopted offspring, and FTYPE=6 for "independent observations" who do not fit into a four-person nuclear family. The individual code INDIV represents how the subject fits into his/her family: INDIV=1 is for "Offspring #1," INDIV=2 is for "Offspring #2," INDIV=3 is for the mother, and INDIV=4 is for the father. The distinction between "Offspring #1" and "#2" is mostly arbitrary, except that in "mixed" families, the biological offspring MUST have INDIV=1, and the adopted offspring, INDIV=2.
Within each family, members must be ordered by INDIV, as: offspring, mother, father. For mixed family type, members must be ordered as: bio-offspring, adopted-offspring, mother, father.
The phenotype file has rows as subjects and columns as variables, whereas the genotype file must have rows as SNPs and columns as subjects.
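
The guidelines above can be sketched as a small check in base R. Everything named here (pheno_toy, geno_by_subject, check_pheno_layout()) is hypothetical and not part of the package; the check covers only the column, coding, and ordering rules for nuclear families and makes no attempt to model how FTYPE=6 observations are handled.

```r
# Toy phenotype data: one bio-offspring family (FTYPE=4) and one mixed family (FTYPE=5).
pheno_toy <- data.frame(
  FAMID  = c(1, 1, 1, 1, 2, 2, 2, 2),
  ID     = 101:108,
  FTYPE  = c(4, 4, 4, 4, 5, 5, 5, 5),
  INDIV  = c(1, 2, 3, 4, 1, 2, 3, 4),
  Zscore = c(0.3, -0.5, 1.2, 0.1, -0.8, 0.4, 0.0, 0.9)
)

# Hypothetical helper: TRUE when the layout rules listed above hold.
check_pheno_layout <- function(d) {
  required <- c("FAMID", "ID", "FTYPE", "INDIV")
  if (!all(required %in% names(d))) return(FALSE)
  unique_ids  <- anyDuplicated(d$ID) == 0
  valid_codes <- all(d$FTYPE %in% 1:6) && all(d$INDIV %in% 1:4)
  # FTYPE must be constant within each family.
  one_ftype <- all(tapply(d$FTYPE, d$FAMID, function(x) length(unique(x)) == 1))
  # Within each family, rows must appear in increasing INDIV order.
  in_order <- all(tapply(d$INDIV, d$FAMID, function(x) !is.unsorted(x)))
  unique_ids && valid_codes && one_ftype && in_order
}

check_pheno_layout(pheno_toy)

# Genotype scores stored subjects-by-SNPs are transposed to SNPs-by-subjects.
geno_by_subject <- matrix(
  c(0, 1, 2, 1, 0, 0, 1, 2,
    2, 1, 0, 0, 1, 1, 2, 0),
  nrow = 8, ncol = 2,
  dimnames = list(as.character(101:108), c("snp.1", "snp.2"))
)
geno_by_snp <- data.frame(t(geno_by_subject))
dim(geno_by_snp)  # rows are SNPs, columns are subjects
```

The transposition mirrors the data.frame(t(geno)) call used in the Examples section.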

Examples

data(pheno)
data(geno)
data(pedigree)
foo <- gls.batch.get(
  phenfile=pheno,
  genfile=data.frame(t(geno)),
  pedifile=pedigree, 
  outfile="example_output.txt", 
  covmtxfile.in=NULL,covmtxfile.out="Zscore_cov_matrix.txt",  # literal name: 'phen' is not defined outside the call
  phen="Zscore", covars = "IsFemale",
  med = "rfgls", sizeLab = "OOPP", Mz = TRUE, Bo = TRUE, Ad = TRUE, Mix = TRUE,
  indobs = TRUE, col.names = TRUE, pediheader = FALSE,
  pedicolname=c("FAMID","ID","PID","MID","SEX"),
  sep.phe = "", sep.gen = "", sep.ped = "")
olsmod <- lm(   ##<--OLS regression could be applied to the merged dataset...
    Zscore ~ snp.1 + IsFemale, data=foo$test.dat)
summary(olsmod)  #<--...but the standard errors and t-statistics will not be valid.