optim()
in the R base distribution. The residual variance-covariance matrix is block-diagonal sparse, constructed with bdsmatrix()
from the bdsmatrix
package.
fgls(fixed, data=parent.frame(), tlist=tlist, sizelist=sizelist, med=c("UN","VC"), vmat=NULL, start=NULL, theta=NULL, drop=NULL, get.hessian=FALSE, optim.method="BFGS", control=list(), weights=NULL, sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
as.data.frame()
to a data frame) containing the variables in the model, as specified in argument fixed. If not found in data the variables are taken from environment(formula)
, typically the environment from which fgls()
is called. If it contains a column named "ID"
, then that column's values will be the row and column names of output element sigma
(see below, under "Value").
gls.batch()
and gls.batch.get()
functions.
gls.batch()
and gls.batch.get()
functions.
"UN"
or "VC"
, which are the two RFGLS methods described by Li et al. (2011). If "UN"
(default), which stands for "unstructured," the residual covariance matrix will be constructed from, at most, 12 parameters (8 correlations and 4 variances). If "VC"
, which stands for "variance components," the residual covariance matrix will be constructed from, at most, 3 variance components (additive-genetic, shared-environmental, and unshared-environmental).
NULL
(default), the matrix will either be jointly estimated with the fixed-effects coefficients (FGLS), or will be constructed values supplied to theta. If not NULL
, must be either (1) an object of class 'bdsmatrix' (from the bdsmatrix
package), or (2) a character string specifying the filename and path for a single-column text file, with header, containing the "blocks" of a bdsmatrix
. Note that at least one of vmat and theta must be NULL
.
NULL
(default), generic start values are used. Otherwise, it must be a numerical vector of either length 12 if med="UN"
, or of length 3 if med="VC"
. Each vector element provides the initial value for the parameter corresponding to its index (serial position). Values of NA
are accepted, and will be replaced with the generic start value for that parameter. See below under "Details" for which parameters correspond to which indices. Users should bear in mind that especially poor start values can cause optimization to fail. Ignored if no residual-covariance parameters are estimated.
NULL
, in which case it is ignored. Otherwise, it must be a numerical vector of of either length 12 if med="UN"
, or of length 3 if med="VC"
. Each vector element provides the value for the parameter corresponding to its index (serial position). Values of NA
are accepted for extraneous parameters. See below under "Details" for which parameters correspond to which indices. The residual covariance matrix is constructed from the elements of theta, exactly as-is. Note that at least one of vmat and theta must be NULL
.
fgls()
automatically identifies which parameters are completely unidentified from the data (i.e., zero observations in the data are informative about them), and drops them as well. If drop is NULL
(default), no user-specified parameters are dropped. Otherwise, it must be a vector of integers, either between 1 and 12 (inclusive) if med="UN"
, or between 1 and 2 (inclusive) if med="VC"
. Note that if a user-specified-dropped parameter ends up being needed to construct the residual covariance matrix, its value is taken to be that of its OLS equivalent: zero for correlations (med="UN"
) and for the familial variance components (med="VC"
), and the OLS residual variance for variances (med="UN"
). It may be prudent to drop parameters when very few observations in the data are informative about them, which can at least save computation time. See below under "Details" for which parameters correspond to which indices. Ignored if no residual-covariance parameters are estimated.
FALSE
. If TRUE
, fgls()
will include the Hessian matrix from optim()
in its output list. Otherwise, the entry 'hessian' in the list will be NULL
. Ignored if no residual-covariance parameters are estimated.
optim()
. Ignored if no residual-covariance parameters are estimated. The default, "BFGS"
, is usually fast and is recommended for general use. If method "L-BFGS-B"
is used, fgls()
will supply optim()
with reasonable box constraints on the parameters, intended for use with optim()
's default control parameters (see argument control below).Method "BFGS"
(the default) may fail when any of the residual-covariance parameters are poorly identified from the data. In these cases, it may be wise simply to drop the offending parameters. Other optimization methods, including "L-BFGS-B"
, can succeed where "BFGS"
fails. Method "SANN"
should not generally be relied upon to find the global optimum, but it can sometimes produce reasonable, approximate solutions in instances where no other method works. As a last-resort diagnostic, one can combine optim.method="SANN"
with hessian=TRUE
, since the resulting Hessian matrix may provide clues as to which parameters are causing problems.
NULL
.
NULL
; otherwise, must be a character string. If the number of characters in the string is not equal to the size of the largest family in the data, fgls()
will produce a warning.
summary.lm()
. Each fixed-effect term (including the intercept) has one row of the table,
which are ordered as the terms appear in argument fixed. Each row contains a point estimate, an estimated standard error, a t-statistic, and a two-tailed p-value.
NA
. If no residual-covariance parameters are estimated, will instead be a single NA
.
NULL
if no parameters were dropped or estimated.
NULL
if no residual-covariance parameters were estimated. Otherwise, a single-row data frame, containing miscellaneous output pertaining to the optimization, specifically, the following named columns:
iterations
(integer): the number of function iterations, as returned from optim()
.
convergence
(integer): convergence code, as returned from optim()
; value 0 means that convergence was successful.
message
(character): additional information from the optimizer; a single whitespace means that optim()
returned a message of NULL
.
first_try
(logical): Did fgls()
's first attempt at optimization succeed? If FALSE
, then during the second attempt, fgls()
had to force the block matrices of the residual covariance matrix to be positive-definite, as described above, under "Details."
data
, otherwise its row and column names are sequential numbers. One of its slots, sigma@blocks
, can be written to a single-column text file and subsequently read in by gls.batch()
. Due to its potential size, it is not advised to return sigma
to R's standard output or print it to the console.
get.hessian=TRUE
and residual-covariance parameters were estimated, the Hessian matrix from optim()
for those parameters; NULL
otherwise.
NA
's).
lm()
. Note that it only reflects the number of regression coefficients, and not the number of residual-covariance parameters that were estimated.
n
, i.e. it is not padded with NA
's for participants with missing data.
n
, i.e. it is not padded with NA
's for participants with missing data.
fgls()
function call.
fgls()
also prints to console the estimates of non-dropped residual-covariance parameters (if any).
fgls()
was originally intended to be called automatically, from within gls.batch()
. However, calling it directly is likely to be useful to advanced users. The difficulty when directly invoking fgls()
is supplying the function with arguments tlist and sizelist. But, these can be obtained easily via gls.batch.get()
.When residual-covariance parameters are to be estimated, fgls()
will attempt optimization, at most, two times. If the initial attempt fails, fgls()
prints a message saying so to the console, and tries a second time. On the second attempt, before each evaluation of the objective function, the blocks composing the block-diagonal residual covariance matrix are forced to be positive definite. This uses nearPD()
from the Matrix package, which turns each block matrix into its nearest positive-definite approximation (where "nearest" is meant in a least-squares sense). Forcing positive-definiteness in this way is only used for the second attempt, and not for the initial attempt (which has its own way of ensuring a positive-definite solution), since it slows down optimization and is unnecessary when the parameters are well-identified. Furthermore, it can have consequences the user might not expect. For instance, in fgls()
's output (see below, under "Value"), the elements of the residual covariance matrix sigma
might not correspond to the parameter estimates in estimates
, or covariances that are supposed to be the same across families might not be so in the actual matrix sigma
. Nevertheless, the second attempt may succeed when the initial attempt fails.
When med="UN"
, the residual covariance matrix is constructed from, at most, 12 parameters--8 correlations and 4 variances. Below is an enumerated list of those 12 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the element name of each estimated parameter as it appears in fgls()
output:
When med="VC"
, the residual covariance matrix is constructed from, at most, 3 variance components. Below is an enumerated list of those 3 parameters, in which the number of each list entry is the index (serial position) of that parameter, and the quoted text is the label of each estimated parameter as it appears in fgls()
output:
Additive-genetic variance contributes to covariance between family members commensurately to the expected proportion of segregating alleles they share: 1.0 for MZ twins, 0.5 for first-degree relatives, 0 for spouses and adoptive relatives. Shared-environmental variance, as defined here, represents covariance between biologically unrelated family members (including spouses).
In package version 1.0, arguments subset and na.action were accepted, and passed to lm()
. Neither are accepted any longer. Subsetting should be done before directly calling fgls()
; the function handles NA
's in the data by what is (in effect) na.action=na.omit
.
Buse, A: Goodness of Fit in Generalized Least Squares Estimation The American Statistician 1973;27:106-108
gls.batch
data(pheno)
data(geno)
data(map)
data(pedigree)
data(rescovmtx)
foo <- gls.batch.get(
phenfile=pheno,genfile=data.frame(t(geno)),pedifile=pedigree,
covmtxfile.in=NULL,theta=NULL,snp.names=map[,2],input.mode=c(1,2,3),
pediheader=FALSE,pedicolname=c("FAMID","ID","PID","MID","SEX"),
sep.phe=" ",sep.gen=" ",sep.ped=" ",
phen="Zscore",covars="IsFemale",med=c("UN","VC"),
outfile,col.names=TRUE,return.value=FALSE,
covmtxfile.out=NULL,
covmtxparams.out=NULL,
sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
bar <- fgls(
Zscore ~ rs3934834 + IsFemale, data=foo$test.dat, tlist=foo$tlist,
sizelist=foo$sizelist,med=c("UN","VC"),
vmat=rescovmtx, #<--Resid. cov. matrix from fgls onto IsFemale only.
start=NULL, theta=NULL, drop=NULL, get.hessian=FALSE,
optim.method="BFGS", control=list(), weights=NULL,
sizeLab=NULL,Mz=NULL,Bo=NULL,Ad=NULL,Mix=NULL,indobs=NULL)
bar$ctable
## To simultaneously estimate residual covariance matrix
## and regression coefficients for rs3934834 & IsFemale,
## use the same syntax, except with vmat = NULL .
Run the code above in your browser using DataLab