umxACEv: Build and run 2-group uni- or multi-variate ACE models based on VARIANCE (not paths).

Description

A common task in twin modeling involves using the genetic and environmental differences between large numbers of pairs of monozygotic (MZ) and dizygotic (DZ) twins reared together to model the genetic and environmental structure of one or, typically, several phenotypes. umxACEv estimates variance components directly, rather than estimating paths that are then squared to produce variances and therefore cannot be negative. As a saturated model, it offers better power, correct Type I error rates, and unbiased estimates, with no lower bound of zero on the variances (Verhulst et al., 2019).

The ACE variance-based model decomposes phenotypic variance into additive genetic (A), unique environmental (E) and, optionally, either common environment (shared-environment, C) or non-additive genetic effects (D). Scroll down to details for how to use the function, a figure and multiple examples.
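As a rough illustration of this decomposition, the sketch below builds the expected MZ and DZ covariance matrices from A, C, and E variance components in base R. The helper name `expected_twin_cov` is hypothetical (it is not part of umx), and the sketch assumes the standard twin-model expectations: the cross-twin covariance is A + C for MZ pairs and dzAr × A + dzCr × C for DZ pairs.

```r
# Hypothetical helper (not part of umx): expected twin-pair covariance
# matrices implied by variance components A, C, and E.
# A, C, E may be scalars (univariate) or k x k symmetric matrices.
expected_twin_cov <- function(A, C, E, dzAr = 0.5, dzCr = 1) {
  A <- as.matrix(A); C <- as.matrix(C); E <- as.matrix(E)
  V       <- A + C + E             # within-person (phenotypic) covariance
  crossMZ <- A + C                 # cross-twin covariance, MZ pairs
  crossDZ <- dzAr * A + dzCr * C   # cross-twin covariance, DZ pairs
  list(
    MZ = rbind(cbind(V, crossMZ), cbind(crossMZ, V)),
    DZ = rbind(cbind(V, crossDZ), cbind(crossDZ, V))
  )
}

# Illustrative (made-up) component values
expCov <- expected_twin_cov(A = 0.5, C = 0.2, E = 0.3)
expCov$MZ  # off-diagonal is A + C = 0.7
expCov$DZ  # off-diagonal is 0.5 * A + C = 0.45
```

Nothing in this sketch forces A, C, or E to be positive, which is the property that distinguishes the variance-based parameterization from squared paths.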

The following figure shows the A components of a trivariate ACEv model:

Figure: ACEv.png

NOTE: This function does not use the Cholesky decomposition. Instead, it models variance directly, which ensures unbiased Type I error rates. As a consequence, estimates of variance may occasionally be negative. Treat a negative estimate as an occasion to inspect your model choices and data. umxACEv can be used as a base model to validate the ACE Cholesky model, a core model in behavior genetics (Neale and Cardon, 1992).
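To see how a negative variance estimate can arise, here is a minimal method-of-moments sketch in base R. It is only an illustration: umxACEv fits the model by maximum likelihood or WLS, and the twin correlations used here are made up. It assumes a standardized trait with dzAr = .5 and dzCr = 1.

```r
# Method-of-moments illustration (not how umxACEv estimates the model).
# For a standardized trait with dzAr = .5 and dzCr = 1:
#   rMZ = A + C,   rDZ = .5 * A + C,   1 = A + C + E
rMZ <- 0.8
rDZ <- 0.3   # made-up value, less than half of rMZ

A <- 2 * (rMZ - rDZ)  # 1.0
C <- 2 * rDZ - rMZ    # -0.2: negative, which a squared path could never show
E <- 1 - rMZ          # 0.2
round(c(A = A, C = C, E = E), 3)
```

A DZ correlation below half the MZ correlation pushes C below zero, which is a hint that an ADE model may describe the data better.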

Usage

umxACEv(
  name = "ACEv",
  selDVs,
  selCovs = NULL,
  sep = NULL,
  dzData,
  mzData,
  dzAr = 0.5,
  dzCr = 1,
  type = c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS"),
  allContinuousMethod = c("cumulants", "marginals"),
  data = NULL,
  zyg = "zygosity",
  weightVar = NULL,
  numObsDZ = NULL,
  numObsMZ = NULL,
  addStd = TRUE,
  addCI = TRUE,
  boundDiag = NULL,
  equateMeans = TRUE,
  bVector = FALSE,
  autoRun = getOption("umx_auto_run"),
  tryHard = c("no", "yes", "ordinal", "search"),
  optimizer = NULL,
  nSib = 2
)

Value

mxModel() subclass mxModelACEv

Arguments

name: The name of the model (defaults to "ACEv").
selDVs: The variables to include from the data: preferably, just "dep" not c("dep_T1", "dep_T2").
selCovs: (optional) covariates to include from the data (do not include sep in names)
sep: The separator in twin var names, often "_T" in vars like "dep_T1". Simplifies selDVs.
dzData: The DZ dataframe.
mzData: The MZ dataframe.
dzAr: The DZ genetic correlation (defaults to .5, vary to examine assortative mating).
dzCr: The DZ "C" correlation (defaults to 1: set to .25 to make an ADE model).
type: Analysis method: one of c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS").
allContinuousMethod: "cumulants" or "marginals". Used with all-continuous WLS data to determine whether a means model is needed.
data: If provided, dzData and mzData are treated as valid levels of zyg to select() data sets (default = NULL)
zyg: If data provided, this column is used to select rows by zygosity (Default = "zygosity")
weightVar: If provided, a vector objective will be used to weight the data (default = NULL).
numObsDZ: Number of DZ twins: Set this if you input covariance data.
numObsMZ: Number of MZ twins: Set this if you input covariance data.
addStd: Whether to add the algebras to compute a std model (defaults to TRUE).
addCI: Whether to add intervals to compute CIs (defaults to TRUE).
boundDiag: Numeric lbound for the diagonal of the a, c, and e matrices. Default = NULL (no bound).
equateMeans: Whether to equate the means across twins (defaults to TRUE).
bVector: Whether to compute row-wise likelihoods (defaults to FALSE).
autoRun: Whether to run the model (default), or just to create it and return without running.
tryHard: Default ('no') uses normal mxRun. "yes" uses mxTryHard. Other options: "ordinal", "search"
optimizer: Optionally set the optimizer (default NULL does nothing).
nSib: Number of sibs, default is 2. Working on 3 :-)
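
The way selDVs and sep combine into per-twin column names can be sketched in base R as follows. The helper `expand_twin_names` is hypothetical and is not the umx implementation (the examples further down use umx's own tvars() for this); the ordering of the output, grouped by twin, is an assumption made for illustration.

```r
# Hypothetical helper (not the umx implementation): expand base names
# into per-twin column names by appending sep and the twin number.
expand_twin_names <- function(selDVs, sep, nSib = 2) {
  # loop over twins so that names come out grouped by twin
  unlist(lapply(seq_len(nSib), function(i) paste0(selDVs, sep, i)))
}

expand_twin_names("wt", sep = "")           # "wt1" "wt2"
expand_twin_names(c("ht", "wt"), sep = "")  # "ht1" "wt1" "ht2" "wt2"
expand_twin_names("dep", sep = "_T")        # "dep_T1" "dep_T2"
```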

Details

Data Input The function flexibly accepts raw data, and also summary covariance data (in which case the user must also supply the numbers of observations for the two input data sets).
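
For summary input, the covariance matrix and the matching number of observations can be derived from raw data along these lines. This is a base R sketch on simulated data (all values are made up), and it assumes that the number of observations should count the complete rows that contributed to the covariance matrix.

```r
# Simulated stand-in for one zygosity group (values are made up).
set.seed(1)
n  <- 200
t1 <- rnorm(n)
t2 <- 0.6 * t1 + rnorm(n, sd = 0.8)
mzRaw <- data.frame(wt1 = t1, wt2 = t2)
mzRaw$wt2[1:10] <- NA   # some pairs are incomplete

# Covariance matrix from complete pairs, plus the matching row count
mzCov <- cov(mzRaw[, c("wt1", "wt2")], use = "complete.obs")
nMZ   <- sum(complete.cases(mzRaw[, c("wt1", "wt2")]))
mzCov
nMZ   # 190
```

In a call to umxACEv, objects like these would be passed as mzData and numObsMZ, with the DZ equivalents as dzData and numObsDZ.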

Ordinal Data In an important capability, the model transparently handles ordinal (binary or multi-level ordered factor data) inputs, and can handle mixtures of continuous, binary, and ordinal data in any combination.

The function also supports weighting of individual data rows. In this case, the model is estimated for each row individually, then each row likelihood is multiplied by its weight, and these weighted likelihoods summed to form the model-likelihood, which is to be minimized. This feature is used in the non-linear GxE model functions.
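
The idea of row weighting can be sketched with a toy univariate normal model in base R. This is not the umx or OpenMx implementation: the data, weights, and function name are made up, and the sketch assumes that weights are applied on the -2 log-likelihood scale.

```r
# Toy univariate normal model; data and weights are made up.
x <- c(4.1, 5.3, 6.0, 4.8, 5.5)
w <- c(1.0, 0.5, 2.0, 1.0, 0.0)  # one weight per row; 0 drops a row

weighted_m2ll <- function(par, x, w) {
  mu <- par[1]
  sd <- exp(par[2])  # log scale keeps sd positive
  rowM2LL <- -2 * dnorm(x, mean = mu, sd = sd, log = TRUE)
  sum(w * rowM2LL)   # weight each row's fit, then sum over rows
}

# Minimize the weighted fit function
fit <- optim(c(mean(x), log(sd(x))), weighted_m2ll, x = x, w = w)
fit$par[1]            # estimated mean
weighted.mean(x, w)   # closed-form weighted mean, for comparison
```

With all weights equal to 1, the function reduces to the ordinary -2 log-likelihood, so weighting changes only how much each row contributes to the fit.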

Additional features The umxACEv function supports varying the DZ genetic association (defaulting to .5) to allow exploring assortative mating effects, as well as varying the DZ “C” factor from 1 (the default for modeling family-level effects shared 100% by twins in a pair), to .25 to model dominance effects.
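
The effect of changing dzAr and dzCr can be explored with a small linear solve in base R. The helper `solve_components` is hypothetical, works on standardized twin correlations rather than fitting a model, and uses made-up correlations; X stands for C when dzCr = 1 and for D when dzCr = .25.

```r
# Hypothetical helper: solve the standardized twin expectations
#   rMZ = A + X,   rDZ = dzAr * A + dzCr * X,   E = 1 - rMZ
# where X is C (dzCr = 1) or D (dzCr = .25).
solve_components <- function(rMZ, rDZ, dzAr = 0.5, dzCr = 1) {
  design <- rbind(c(1, 1), c(dzAr, dzCr))
  AX <- solve(design, c(rMZ, rDZ))
  c(A = AX[1], X = AX[2], E = 1 - rMZ)
}

# Dominance: with dzCr = .25 the second component is read as D
solve_components(rMZ = 0.8, rDZ = 0.3, dzCr = 0.25)  # A = 0.4, X (D) = 0.4

# Assortative mating: raising dzAr shifts variance from C to A
solve_components(rMZ = 0.7, rDZ = 0.5)               # A = 0.4, X (C) = 0.3
solve_components(rMZ = 0.7, rDZ = 0.5, dzAr = 0.6)   # A = 0.5, X (C) = 0.2
```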

Note: Only one of C or D may be estimated in a given model. This restriction reflects the lack of degrees of freedom to model C and D simultaneously with only MZ and DZ twin pairs (Eaves et al., 1978, p. 267).
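
The identification problem can be checked numerically. The sketch below, in base R, writes the three observed statistics for a single trait (variance, MZ covariance, DZ covariance) as linear functions of A, C, D, and E, assuming dzAr = .5 and a DZ dominance correlation of .25, and compares the rank of that system with the number of unknowns.

```r
# Rows: the three observed statistics for one trait.
# Columns: contribution of each variance component.
design <- rbind(
  variance = c(A = 1,   C = 1, D = 1,    E = 1),
  covMZ    = c(A = 1,   C = 1, D = 1,    E = 0),
  covDZ    = c(A = 0.5, C = 1, D = 0.25, E = 0)
)
qr(design)$rank   # 3 independent equations
ncol(design)      # 4 unknowns: A, C, D, and E are not jointly identified

# Dropping either C or D leaves a solvable 3 x 3 system
qr(design[, c("A", "C", "E")])$rank   # 3
qr(design[, c("A", "D", "E")])$rank   # 3
```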

References

Verhulst, B., Prom-Wormley, E., Keller, M., Medland, S., & Neale, M. C. (2019). Type I Error Rates and Parameter Bias in Multivariate Behavioral Genetic Models. Behav Genet, 49, 99-111. doi:10.1007/s10519-018-9942-y

Eaves, L. J., Last, K. A., Young, P. A., & Martin, N. G. (1978). Model-fitting approaches to the analysis of human behaviour. Heredity, 41, 249-320. https://www.nature.com/articles/hdy1978101.pdf

Examples


if (FALSE) {

# ==============================
# = Univariate model of weight =
# ==============================
require(umx)
data(twinData) # ?twinData from Australian twins.

# Things to note: ACE model of weight will return a NEGATIVE variance in C.
#  This is exactly why we have ACEv! It suggests we need a different model
#  In this case: ADE.
# Other things to note:
# 1. umxACEv can figure out variable names: provide "sep" and selDVs.
#    Function generates: "wt" -> "wt1" "wt2"
# 2. umxACEv picks the variables it needs from the data.

mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]
m1 = umxACEv(selDVs = "wt", sep = "", dzData = dzData, mzData = mzData)

# A shortcut (even shorter for "_T" twin data with "MZ"/"DZ" in the zygosity column):
m1 = umxACEv(selDVs = "wt", sep = "", dzData = "DZFF", mzData = "MZFF", data = twinData)
# ========================================================
# = Evidence for dominance ? (DZ correlation set to .25) =
# ========================================================
m2 = umxACEv("ADE", selDVs = "wt", sep = "", dzData = dzData, mzData = mzData, dzCr = .25)
# note: the underlying matrices are still called A, C, and E.
# I catch this in the summary table, so columns are labeled A, D, and E.
# However, currently, the plot will say A, C, E.

# We can modify this model, dropping dominance component (still called C), 
# and see a comparison:
m3 = umxModify(m2, update = "C_r1c1", comparison = TRUE, name="AE")
# =========================================================
# = Well done! Now you can modify twin models in umx =
# ====================================================

# ============================
# = How heritable is height? =
# ============================
# 
# Note: Height has a small variance. umx typically picks good starts,
#    but scaling is advisable.
# 
require(umx)
# Load data and rescale height to cm (var in m too small)
data(twinData) # ?twinData from Australian twins.
twinData[,c("ht1", "ht2")]= twinData[,c("ht1", "ht2")]*100

mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]
m1 = umxACEv(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData)

umxSummary(m1, std = FALSE) # unstandardized
plot(m1)

# tip: with report = "html", umxSummary can print the table to your browser!
# tip: You can turn off auto-plot with umx_set_auto_plot(FALSE)

# ========================================================
# = Evidence for dominance ? (DZ correlation set to .25) =
# ========================================================
m2 = umxACEv("ADE", selDVs = "ht", dzCr = .25, sep="", dzData = dzData, mzData = mzData)
umxCompare(m2, m1) # Is ADE better?
umxSummary(m2, comparison = m1) # nb: though this is ADE, matrices are still called A,C,E

# We can modify this model, dropping the dominance component (still called C), and see a comparison:
m3 = umxModify(m2, update = "C_r1c1", comparison = TRUE, name = "AE")

# =====================================
# = Bivariate height and weight model =
# =====================================

data(twinData)
twinData[,c("ht1", "ht2")]= twinData[,c("ht1", "ht2")]*100
mzData = twinData[twinData$zygosity %in% c("MZFF", "MZMM"), ]
dzData = twinData[twinData$zygosity %in% c("DZFF", "DZMM", "DZOS"), ]
m1 = umxACEv(selDVs = c("ht", "wt"), sep = '', dzData = dzData, mzData = mzData)

# ===================
# = Ordinal example =
# ===================
require(umx)
data(twinData)

# Cut bmi column to form ordinal obesity variables
cutPoints = quantile(twinData[, "bmi1"], probs = c(.2, .5), na.rm = TRUE) # ascending; cut() sorts breaks anyway
obesityLevels = c('normal', 'overweight', 'obese')
twinData$obese1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obese2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 

# Make the ordinal variables into mxFactors (ensure ordered is TRUE, and require levels)
twinData[, c("obese1", "obese2")] = umxFactor(twinData[, c("obese1", "obese2")])
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]
m2 = umxACEv(selDVs = "obese", dzData = dzData, mzData = mzData, sep = '')

# FYI: Show mz, dz, and t1 and t2 have the same levels!
str(mzData)

# ============================================
# = Bivariate continuous and ordinal example =
# ============================================
data(twinData)
# Cut bmi column to form ordinal obesity variables
ordDVs = c("obese1", "obese2")
obesityLevels = c('normal', 'overweight', 'obese')
cutPoints = quantile(twinData[, "bmi1"], probs = c(.2, .5), na.rm = TRUE) # ascending; cut() sorts breaks anyway
twinData$obese1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obese2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 

# Make the ordinal variables into ordered mxFactors
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

# umxACEv can trim out unused variables on its own
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

m1 = umxACEv(selDVs = c("wt", "obese"), dzData = dzData, mzData = mzData, sep = '')
plot(m1)

# =======================================
# = Mixed continuous and binary example =
# =======================================
require(umx)
data(twinData)
# Cut to form category of 20% obese subjects
# and make into mxFactors (ensure ordered is TRUE, and require levels)
cutPoints = quantile(twinData[, "bmi1"], probs = .8, na.rm = TRUE) # top 20% of bmi1 are labeled 'obese'
obesityLevels = c('normal', 'obese')
twinData$obese1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obese2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

selDVs = c("wt", "obese")
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]
m1 = umxACEv(selDVs = selDVs, dzData = dzData, mzData = mzData, sep = '')
umxSummary(m1)

# =====================================
# = Example with covariance data only =
# =====================================

require(umx)
data(twinData)
selDVs = c("wt")
mz = cov(twinData[twinData$zygosity %in% "MZFF", tvars(selDVs, "")], use = "complete")
dz = cov(twinData[twinData$zygosity %in% "DZFF", tvars(selDVs, "")], use = "complete")
m1 = umxACEv(selDVs = selDVs, sep= "", dzData = dz, mzData= mz, numObsDZ= 569, numObsMZ= 351)
umxSummary(m1, std = FALSE)
}
