umxACE: Build and run a 2-group Cholesky ACE twin model (univariate or multivariate)

Description

Implementing a core task in twin modeling, umxACE models the genetic and environmental structure of one or more phenotypes (measured variables) using the Cholesky ACE model (Neale and Cardon, 1996).

Classical twin modeling uses the genetic and environmental differences among pairs of mono-zygotic (MZ) and di-zygotic (DZ) twins reared together.

umxACE implements a 2-group model to capture these data and represent the phenotypic variance as a sum of Additive genetic, unique environmental (E) and, optionally, either common or shared-environment (C) or non-additive genetic effects (D).

The following figure shows the ACE model for one variable "x" as a path diagram:

Figure: ACE univariate.png

umxACE allows multivariate analyses, and this brings us to the Cholesky part of the model.

The Cholesky decomposition creates as many latent A (and C and E) latent variables as there are phenotypes, and, moving from left to right, decomposes the variance in each phenotype into successively restricted factors. The following figure shows the multivariate ACE model for three variables:

Figure: ACE matrix.png

In this ACE model of three phenotypes, the expected variance-covariance matrix of the original data is the product of each lower Cholesky and its transform (i.e., A = a %*% t(a) summed for A+C+E.

This lower-triangle decomposition feature of the Cholesky yields a model which is certain to both account for all the variance (with some restrictions) in the data and be solvable.

This figure also contains the key to understanding how to modify models that umxACE produces using umxModify() to drop paths by label like "a_r1c1". nb: Read the "Matrices and Labels in ACE model" section in details below...

NOTE: Scroll down to details for how to use the function, a figure and multiple examples.

Usage

umxACE(
  name = "ACE",
  selDVs,
  selCovs = NULL,
  dzData = NULL,
  mzData = NULL,
  sep = NULL,
  data = NULL,
  zyg = "zygosity",
  type = c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS"),
  numObsDZ = NULL,
  numObsMZ = NULL,
  boundDiag = 0,
  allContinuousMethod = c("cumulants", "marginals"),
  autoRun = getOption("umx_auto_run"),
  intervals = FALSE,
  tryHard = c("no", "yes", "ordinal", "search"),
  optimizer = NULL,
  residualizeContinuousVars = FALSE,
  nSib = 2,
  dzAr = 0.5,
  dzCr = 1,
  weightVar = NULL,
  equateMeans = TRUE,
  addStd = TRUE,
  addCI = TRUE
)

Value

mxModel() of subclass mxModel.ACE

Arguments

name: The name of the model (defaults to"ACE").
selDVs: The variables to include from the data: preferably, just "dep" not c("dep_T1", "dep_T2").
selCovs: (optional) covariates to include from the data (do not include sep in names)
dzData: The DZ dataframe.
mzData: The MZ dataframe.
sep: The separator in twin variable names, often "_T", e.g. "dep_T1". Simplifies selDVs.
data: If provided, dzData and mzData are treated as levels of zyg to select() MZ and DZ data sets (default = NULL)
zyg: If data provided, this column is used to select rows by zygosity (Default = "zygosity")
type: Analysis method one of c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS")
numObsDZ: Number of DZ twins: Set this if you input covariance data.
numObsMZ: Number of MZ twins: Set this if you input covariance data.
boundDiag: Numeric lbound for diagonal of the a, c, and e matrices. Defaults to 0 since umx version 1.8
allContinuousMethod: "cumulants" or "marginals". Used in all-continuous WLS data to determine if a means model needed.
autoRun: Whether to run the model (default), or just to create it and return without running.
intervals: Whether to run mxCI confidence intervals (default = FALSE)
tryHard: Default ('no') uses normal mxRun. "yes" uses mxTryHard. Other options: "ordinal", "search"
optimizer: Optionally set the optimizer (default NULL does nothing).
residualizeContinuousVars: Not yet implemented.
nSib: Number of siblings in a family (default - 2). "3" = extra sib.
dzAr: The DZ genetic correlation (defaults to .5, vary to examine assortative mating).
dzCr: The DZ "C" correlation (defaults to 1: set to .25 to make an ADE model).
weightVar: If provided, a vector objective will be used to weight the data. (default = NULL).
equateMeans: Whether to equate the means across twins (defaults to TRUE).
addStd: Whether to add the algebras to compute a std model (defaults to TRUE).
addCI: Whether to add intervals to compute CIs (defaults to TRUE).

Details

Covariates umxACE handles covariates by modelling them in the means. On the plus side, there is no distributional assumption for this method. A downside of this approach is that all covariates must be non-NA, thus dropping any rows where one or more covariates are missing. This can waste data. See also umx_residualize()).

Data Input The function flexibly accepts raw data, and also summary covariance data (in which case the user must also supple numbers of observations for the two input data sets).

The type parameter can select how you want the model data treated. "FIML" is the normal treatment. "cov" and "cor" will turn raw data into cor data for analysis, or check that you've provided cor data as input.

Types "WLS", "DWLS", and "ULS" will process raw data into WLS data of these types.

The default, "Auto" will treat data as the type they are provided as.

Ordinal Data In an important capability, the model transparently handles ordinal (binary or multi-level ordered factor data) inputs, and can handle mixtures of continuous, binary, and ordinal data in any combination. An experimental feature is under development to allow Tobit modeling.

The function also supports weighting of individual data rows. In this case, the model is estimated for each row individually, then each row likelihood is multiplied by its weight, and these weighted likelihoods summed to form the model-likelihood, which is to be minimized. This feature is used in the non-linear GxE model functions.

Additional features The umxACE function supports varying the DZ genetic association (defaulting to .5) to allow exploring assortative mating effects, as well as varying the DZ “C” factor from 1 (the default for modeling family-level effects shared 100% by twins in a pair), to .25 to model dominance effects.

Matrices and Labels in ACE model

Matrices 'a', 'c', and 'e' contain the path loadings of the Cholesky ACE factor model.

So, labels relevant to modifying the model are of the form "a_r1c1", "c_r1c1" etc.

Variables are in rows, and factors are in columns. So to drop the influence of factor 2 on variable 3, you would say:

m2 = umxModify(m1, update = "c_r3c2")

Less commonly-modified matrices are the mean matrix expMean. This has 1 row, and the columns are laid out for each variable for twin 1, followed by each variable for twin 2.

So, in a model where the means for twin 1 and twin 2 had been equated (set = to T1), you could make them independent again with this script:

m1$top$expMean$labels[1, 4:6] = c("expMean_r1c4", "expMean_r1c5", "expMean_r1c6")

note: Only one of C or D may be estimated simultaneously. This restriction reflects the lack of degrees of freedom to simultaneously model C and D with only MZ and DZ twin pairs (Eaves et al. 1978, p267).

References

Eaves, L. J., Last, K. A., Young, P. A., & Martin, N. G. (1978). Model-fitting approaches to the analysis of human behaviour. Heredity, 41, 249-320. https://www.nature.com/articles/hdy1978101.pdf

Examples

Run this code

# \donttest{
require(umx)
# ============================
# = How heritable is height? =
# ============================

# 1. Height in meters has a tiny variance, and this makes optimising hard.
#    We'll scale it by 10x to make the Optimizer's task easier.
data(twinData) # ?twinData from Australian twins.
twinData[, c("ht1", "ht2")] = twinData[, c("ht1", "ht2")] * 10

# 2. Make mz & dz data.frames (no need to select variables: umx will do this)
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# 3. Built & run the model, controlling for age in the means model
m1 = umxACE(selDVs = "ht", selCovs = "age", sep = "", dzData = dzData, mzData = mzData)

# sidebar: umxACE figures out variable names using sep: 
#    e.g. selVars = "wt" + sep= "_T" -> "wt_T1" "wt_T2"

umxSummary(m1, std = FALSE) # un-standardized

# tip 1: set report = "html" and umxSummary prints the table to your browser!
# tip 2: plot works for umx: Get a figure of the model and parameters
# plot(m1) # Also, look at the options for ?plot.MxModel.

# ===========================================
# = Test ADE, AE, CE, E, and generate table =
# ===========================================

umxReduce(m1, report="html", silent= TRUE)

# ============================
# = Model, with 2 covariates =
# ============================

# Create another covariate: cohort
twinData$cohort1 = twinData$cohort2 =twinData$part
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# 1. def var approach
m2 = umxACE(selDVs = "ht", selCovs = c("age", "cohort"), sep = "", dzData = dzData, mzData = mzData)

# 2. Residualized approach: remove height variance accounted-for by age.
FFdata = twinData[twinData$zygosity %in% c("MZFF", "DZFF"), ]
FFdata = umx_residualize("ht", "age", suffixes = 1:2, data = FFdata)
mzData = FFdata[FFdata$zygosity %in% "MZFF", ]
dzData = FFdata[FFdata$zygosity %in% "DZFF", ]
m3 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData)

# =============================================================
# = ADE: Evidence for dominance ? (DZ correlation set to .25) =
# =============================================================
m2 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData, dzCr = .25)
umxCompare(m2, m1) # ADE is better
umxSummary(m2, comparison = m1) 
# nb: Although summary is smart enough to print d, the underlying 
#     matrices are still called a, c & e.

# tip: try umxReduce(m1) to automatically build and compare ACE, ADE, AE, CE
# including conditional probabilities!

# ===================================================
# = WLS example using diagonal weight least squares =
# ===================================================

m3 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData, 
	type = "DWLS", allContinuousMethod='marginals'
)


# ==============================
# = Univariate model of weight =
# ==============================

# Things to note:

# 1. Weight has a large variance, and this makes solution finding very hard.
# Here, we residualize the data for age, which also scales weight and height.

data(twinData)
tmp = umx_residualize(c("wt", "ht"), cov = "age", suffixes= c(1, 2), data = twinData)
mzData = tmp[tmp$zygosity %in% "MZFF", ]
dzData = tmp[tmp$zygosity %in% "DZFF", ]

# tip: You might also want transform variables
# tmp = twinData$wt1[!is.na(twinData$wt1)]
# car::powerTransform(tmp, family="bcPower"); hist(tmp^-0.6848438)
# twinData$wt1 = twinData$wt1^-0.6848438
# twinData$wt2 = twinData$wt2^-0.6848438

# 4. note: the default boundDiag = 0 lower-bounds a, c, and e at 0.
#    Prevents mirror-solutions. If not desired: set boundDiag = NULL.

m2 = umxACE(selDVs = "wt", dzData = dzData, mzData = mzData, sep = "", boundDiag = NULL)

# A short cut (which is even shorter for "_T" twin data with "MZ"/"DZ" data in zygosity column is:
m1 = umxACE(selDVs = "wt", sep = "", data = twinData,
	dzData = c("DZMM", "DZFF", "DZOS"), mzData = c("MZMM", "MZFF"))
# |   |   a1|c1 |   e1|
# |:--|----:|:--|----:|
# |wt | 0.93|.  | 0.38|

# tip: umx_make_twin_data_nice() will make data into this nice format for you!

# ======================
# = MODEL MODIFICATION =
# ======================
# We can modify this model, e.g. test shared environment. 
# Set comparison to modify, and show effect in one step.

m2 = umxModify(m1, update = "c_r1c1", name = "no_C", comparison = TRUE)
#*tip* call umxModify(m1) with no parameters, and it will print the labels available to fix!
# nb: You can see parameters of any model with parameters(m1)

# =========================================================
# = Well done! Now you can make modify twin models in umx =
# =========================================================

# =====================================
# = Bivariate height and weight model =
# =====================================
data(twinData)
# We'll scale height (ht1 and ht2) and weight
twinData = umx_scale_wide_twin_data(data = twinData, varsToScale = c("ht", "wt"), sep = "")
mzData = twinData[twinData$zygosity %in% c("MZFF", "MZMM"),]
dzData = twinData[twinData$zygosity %in% c("DZFF", "DZMM", "DZOS"), ]
m1 = umxACE(selDVs = c("ht", "wt"), sep = '', dzData = dzData, mzData = mzData)
umxSummary(m1)

# ===================
# = Ordinal example =
# ===================

# Prep data
require(umx)
data(twinData)
# Cut BMI column to form ordinal obesity variables
obLevels = c('normal', 'overweight', 'obese')
cuts = quantile(twinData[, "bmi1"], probs = c(.5, .2), na.rm = TRUE)
twinData$obese1=cut(twinData$bmi1, breaks=c(-Inf,cuts,Inf), labels=obLevels)
twinData$obese2=cut(twinData$bmi2, breaks=c(-Inf,cuts,Inf), labels=obLevels)

# Make the ordinal variables into umxFactors
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# Model and summary!
m1 = umxACE(selDVs = "obese", dzData = dzData, mzData = mzData, sep = '')

# And controlling age (otherwise manifests appearance as latent C)
m1 = umxACE(selDVs = "obese", selCov= "age", dzData = dzData, mzData = mzData, sep = '')
# umxSummary(m1)

# ============================================
# = Bivariate continuous and ordinal example =
# ============================================
data(twinData)
twinData= umx_scale_wide_twin_data(data=twinData,varsToScale="wt",sep= "")
# Cut BMI column to form ordinal obesity variables
obLevels   = c('normal', 'overweight', 'obese')
cuts       = quantile(twinData[, "bmi1"], probs = c(.5, .2), na.rm = TRUE)
twinData$obese1=cut(twinData$bmi1,breaks=c(-Inf,cuts,Inf),labels=obLevels)
twinData$obese2=cut(twinData$bmi2,breaks=c(-Inf,cuts,Inf),labels=obLevels)
# Make the ordinal variables into mxFactors
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])
mzData = twinData[twinData$zygosity %in% "MZFF",] 
dzData = twinData[twinData$zygosity %in% "DZFF",]
mzData = mzData[1:80,] # just top 80 so example runs in a couple of secs
dzData = dzData[1:80,]
m1 = umxACE(selDVs= c("wt","obese"), dzData= dzData, mzData= mzData, sep='')

# And controlling age
m1 = umxACE(selDVs = c("wt","obese"), selCov= "age", dzData = dzData, mzData = mzData, sep = '')

# =======================================
# = Mixed continuous and binary example =
# =======================================
require(umx)
data(twinData)
twinData= umx_scale_wide_twin_data(data= twinData,varsToScale= "wt", sep="")
# Cut to form category of 20% obese subjects
# and make into mxFactors (ensure ordered is TRUE, and require levels)
obLevels   = c('normal', 'obese')
cuts       = quantile(twinData[, "bmi1"], probs = .2, na.rm = TRUE)
twinData$obese1= cut(twinData$bmi1, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
twinData$obese2= cut(twinData$bmi2, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

selDVs = c("wt", "obese")
mzData = twinData[twinData$zygosity %in% "MZFF",]
dzData = twinData[twinData$zygosity %in% "DZFF",]
m1 = umxACE(selDVs = selDVs, dzData = dzData, mzData = mzData, sep = '')
umxSummary(m1)

# ==============
# = Two binary =
# ==============
require(umx)
data(twinData)
htLevels   = c('short', 'tall')
obLevels   = c('normal', 'obese')
cuts       = quantile(twinData[, "bmi1"], probs = .2, na.rm = TRUE)
twinData$obese1= cut(twinData$bmi1, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
twinData$obese2= cut(twinData$bmi2, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

twinData$short1 = cut(twinData$ht1, breaks=c(-Inf,1.6,Inf), labels=htLevels) 
twinData$short2 = cut(twinData$ht2, breaks=c(-Inf,1.6,Inf), labels=htLevels) 
ordDVs = c("short1", "short2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

mzData = twinData[twinData$zygosity %in% "MZFF",]
dzData = twinData[twinData$zygosity %in% "DZFF",]
m1 = umxACE(selDVs = c("short", "obese"), dzData = dzData, mzData = mzData, sep = '')

# ===================================
# Example with covariance data only =
# ===================================

require(umx)
data(twinData)
twinData= umx_scale_wide_twin_data(data=twinData, varsToScale= "wt", sep="")
selDVs = c("wt1", "wt2")
mz = cov(twinData[twinData$zygosity %in%  "MZFF", selDVs], use = "complete")
dz = cov(twinData[twinData$zygosity %in%  "DZFF", selDVs], use = "complete")
m1 = umxACE(selDVs=selDVs, dzData=dz, mzData=mz, numObsDZ=569, numObsMZ=351)
umxSummary(m1)
plot(m1)
# }

Run the code above in your browser using DataLab