umxACE: Build and run a 2-group Cholesky twin model (uni-variate or multi-variate)

Description

Implementing a core task in twin modeling, umxACE models the genetic and environmental structure of one or more phenotypes (measured variables) using the Cholesky ACE model (Neale and Cardon, 1996).

Classical twin modeling uses the genetic and environmental differences among pairs of mono-zygotic (MZ) and di-zygotic (DZ) twins reared together.

umxACE implements a 2-group model to capture these data and represent the phenotypic variance as a sum of Additive genetic, unique environmental (E) and, optionally, either common or shared-environment (C) or non-additive genetic effects (D).

The following figure shows how the ACE model appears as a path diagram (for one variable):

Figure: ACE univariate.png

umxACE allows multivariate analyses, and this brings us to the Cholesky part of the model.

This model creates as many latent A C and E variables as there are phenotypes, and, moving from left to right, decomposes the variance in each manifest into successively restricted factors. The following figure shows how the ACE model appears as a path diagram:

Figure: ACE matrix.png

In this model, the variance-covariance matrix of the raw data is recovered as the product of the lower Cholesky and its transform.

This Cholesky or lower-triangle decomposition allows a model which is both sure to be solvable, and also to account for all the variance (with some restrictions) in the data.

This figure also contains the key to understanding how to modify models that umxACE produces. read the "Matrices and Labels in ACE model" section in details below...

NOTE: Scroll down to details for how to use the function, a figure and multiple examples.

Usage

umxACE(
  name = "ACE",
  selDVs,
  selCovs = NULL,
  dzData = NULL,
  mzData = NULL,
  sep = NULL,
  data = NULL,
  zyg = "zygosity",
  type = c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS"),
  numObsDZ = NULL,
  numObsMZ = NULL,
  boundDiag = 0,
  allContinuousMethod = c("cumulants", "marginals"),
  autoRun = getOption("umx_auto_run"),
  intervals = FALSE,
  tryHard = c("no", "yes", "ordinal", "search"),
  optimizer = NULL,
  residualizeContinuousVars = FALSE,
  nSib = 2,
  dzAr = 0.5,
  dzCr = 1,
  weightVar = NULL,
  equateMeans = TRUE,
  addStd = TRUE,
  addCI = TRUE
)

Arguments

name

The name of the model (defaults to"ACE").

selDVs

The variables to include from the data: preferably, just "dep" not c("dep_T1", "dep_T2").

selCovs

(optional) covariates to include from the data (do not include sep in names)

dzData

The DZ dataframe.

mzData

The MZ dataframe.

sep

The separator in twin variable names, often "_T", e.g. "dep_T1". Simplifies selDVs.

data

If provided, dzData and mzData are treated as levels of zyg to select() MZ and DZ data sets (default = NULL)

zyg

If data provided, this column is used to select rows by zygosity (Default = "zygosity")

type

Analysis method one of c("Auto", "FIML", "cov", "cor", "WLS", "DWLS", "ULS")

numObsDZ

Number of DZ twins: Set this if you input covariance data.

numObsMZ

Number of MZ twins: Set this if you input covariance data.

boundDiag

Numeric lbound for diagonal of the a, c, and e matrices. Defaults to 0 since umx version 1.8

allContinuousMethod

"cumulants" or "marginals". Used in all-continuous WLS data to determine if a means model needed.

autoRun

Whether to run the model (default), or just to create it and return without running.

intervals

Whether to run mxCI confidence intervals (default = FALSE)

tryHard

Default ('no') uses normal mxRun. "yes" uses mxTryHard. Other options: "ordinal", "search"

optimizer

Optionally set the optimizer (default NULL does nothing).

residualizeContinuousVars

Not yet implemented.

nSib

Number of siblings in a family (default - 2). "3" = extra sib.

dzAr

The DZ genetic correlation (defaults to .5, vary to examine assortative mating).

dzCr

The DZ "C" correlation (defaults to 1: set to .25 to make an ADE model).

weightVar

If provided, a vector objective will be used to weight the data. (default = NULL).

equateMeans

Whether to equate the means across twins (defaults to TRUE).

addStd

Whether to add the algebras to compute a std model (defaults to TRUE).

addCI

Whether to add intervals to compute CIs (defaults to TRUE).

Value

mxModel() of subclass mxModel.ACE

Details

Covariates umxACE handles covariates by modelling them in the means. On the plus side, there is no distributional assumption for this method. A downside of this approach is that all covariates must be non-NA, thus dropping any rows where one or more covariates are missing. This can waste data. See also umx_residualize()).

Data Input The function flexibly accepts raw data, and also summary covariance data (in which case the user must also supple numbers of observations for the two input data sets).

The type parameter can select how you want the model data treated. "FIML" is the normal treatment. "cov" and "cor" will turn raw data into cor data for analysis, or check that you've provided cor data as input.

Types "WLS", "DWLS", and "ULS" will process raw data into WLS data of these types.

The default, "Auto" will treat data as the type they are provided as.

Ordinal Data In an important capability, the model transparently handles ordinal (binary or multi-level ordered factor data) inputs, and can handle mixtures of continuous, binary, and ordinal data in any combination. An experimental feature is under development to allow Tobit modeling.

The function also supports weighting of individual data rows. In this case, the model is estimated for each row individually, then each row likelihood is multiplied by its weight, and these weighted likelihoods summed to form the model-likelihood, which is to be minimized. This feature is used in the non-linear GxE model functions.

Additional features The umxACE function supports varying the DZ genetic association (defaulting to .5) to allow exploring assortative mating effects, as well as varying the DZ “C” factor from 1 (the default for modeling family-level effects shared 100% by twins in a pair), to .25 to model dominance effects.

Matrices and Labels in ACE model

Matrices 'a', 'c', and 'e' contain the path loadings of the Cholesky ACE factor model.

So, labels relevant to modifying the model are of the form "a_r1c1", "c_r1c1" etc.

Variables are in rows, and factors are in columns. So to drop the influence of factor 2 on variable 3, you would say:

m2 = umxModify(m1, update = "c_r3c2")

Less commonly-modified matrices are the mean matrix expMean. This has 1 row, and the columns are laid out for each variable for twin 1, followed by each variable for twin 2.

So, in a model where the means for twin 1 and twin 2 had been equated (set = to T1), you could make them independent again with this script:

m1$top$expMean$labels[1, 4:6] = c("expMean_r1c4", "expMean_r1c5", "expMean_r1c6")

note: Only one of C or D may be estimated simultaneously. This restriction reflects the lack of degrees of freedom to simultaneously model C and D with only MZ and DZ twin pairs (Eaves et al. 1978, p267).

References

Eaves, L. J., Last, K. A., Young, P. A., & Martin, N. G. (1978). Model-fitting approaches to the analysis of human behaviour. Heredity, 41, 249-320. https://www.nature.com/articles/hdy1978101.pdf

Examples

Run this code

# NOT RUN {
require(umx)
# ============================
# = How heritable is height? =
# ============================

# 1. Height in meters has a tiny variance, and this makes optimising hard.
#    We'll scale it by 10x to make the Optimizer's task easier.
data(twinData) # ?twinData from Australian twins.
twinData[, c("ht1", "ht2")] = twinData[, c("ht1", "ht2")] * 10

# 2. Make mz & dz data.frames (no need to select variables: umx will do this)
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# 3. Built & run the model, controlling for age in the means model
m1 = umxACE(selDVs = "ht", selCovs = "age", sep = "", dzData = dzData, mzData = mzData)

# sidebar: umxACE figures out variable names using sep: 
#    e.g. selVars = "wt" + sep= "_T" -> "wt_T1" "wt_T2"

umxSummary(m1, std = FALSE) # un-standardized

# tip 1: set report = "html" and umxSummary prints the table to your browser!
# tip 2: plot works for umx: Get a figure of the model and parameters
# plot(m1) # Also, look at the options for ?plot.MxModel.

# }
# NOT RUN {
# ============================
# = Model, with 2 covariates =
# ============================

# Create another covariate: cohort
twinData$cohort1 = twinData$cohort2 =twinData$part
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# 1. def var approach
m2 = umxACE(selDVs = "ht", selCovs = c("age", "cohort"), sep = "", dzData = dzData, mzData = mzData)

# 2. Residualized approach: remove height variance accounted-for by age.
FFdata = twinData[twinData$zygosity %in% c("MZFF", "DZFF"), ]
FFdata = umx_residualize("ht", "age", suffixes = 1:2, data = FFdata)
mzData = FFdata[FFdata$zygosity %in% "MZFF", ]
dzData = FFdata[FFdata$zygosity %in% "DZFF", ]
m3 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData)

# =============================================================
# = ADE: Evidence for dominance ? (DZ correlation set to .25) =
# =============================================================
m2 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData, dzCr = .25)
umxCompare(m2, m1) # ADE is better
umxSummary(m2, comparison = m1) 
# nb: Although summary is smart enough to print d, the underlying 
#     matrices are still called a, c & e.

# tip: try umxReduce(m1) to automatically build and compare ACE, ADE, AE, CE
# including conditional probabilities!

# ===================================================
# = WLS example using diagonal weight least squares =
# ===================================================

m3 = umxACE(selDVs = "ht", sep = "", dzData = dzData, mzData = mzData, 
	type = "DWLS", allContinuousMethod='marginals'
)


# ==============================
# = Univariate model of weight =
# ==============================

# Things to note:

# 1. Weight has a large variance, and this makes solution finding very hard.
# Here, we residualize the data for age, which also scales weight and height.

data(twinData)
tmp = umx_residualize(c("wt", "ht"), cov = "age", suffixes= c(1, 2), data = twinData)
mzData = tmp[tmp$zygosity %in% "MZFF", ]
dzData = tmp[tmp$zygosity %in% "DZFF", ]

# tip: You might also want transform variables
# tmp = twinData$wt1[!is.na(twinData$wt1)]
# car::powerTransform(tmp, family="bcPower"); hist(tmp^-0.6848438)
# twinData$wt1 = twinData$wt1^-0.6848438
# twinData$wt2 = twinData$wt2^-0.6848438

# 4. note: the default boundDiag = 0 lower-bounds a, c, and e at 0.
#    Prevents mirror-solutions. If not desired: set boundDiag = NULL.

m2 = umxACE(selDVs = "wt", dzData = dzData, mzData = mzData, sep = "", boundDiag = NULL)

# A short cut (which is even shorter for "_T" twin data with "MZ"/"DZ" data in zygosity column is:
m1 = umxACE(selDVs = "wt", sep = "", data = twinData,
	dzData = c("DZMM", "DZFF", "DZOS"), mzData = c("MZMM", "MZFF"))
# |   |   a1|c1 |   e1|
# |:--|----:|:--|----:|
# |wt | 0.93|.  | 0.38|

# tip: umx_make_twin_data_nice() will make data into this nice format for you!

# ======================
# = MODEL MODIFICATION =
# ======================
# We can modify this model, e.g. test shared environment. 
# Set comparison to modify, and show effect in one step.

m2 = umxModify(m1, update = "c_r1c1", name = "no_C", comparison = TRUE)
#*tip* call umxModify(m1) with no parameters, and it will print the labels available to fix!
# nb: You can see parameters of any model with parameters(m1)

# =========================================================
# = Well done! Now you can make modify twin models in umx =
# =========================================================

# =====================================
# = Bivariate height and weight model =
# =====================================
data(twinData)
# We'll scale height (ht1 and ht2) and weight
twinData = umx_scale_wide_twin_data(data = twinData, varsToScale = c("ht", "wt"), sep = "")
mzData = twinData[twinData$zygosity %in% c("MZFF", "MZMM"),]
dzData = twinData[twinData$zygosity %in% c("DZFF", "DZMM", "DZOS"), ]
m1 = umxACE(selDVs = c("ht", "wt"), sep = '', dzData = dzData, mzData = mzData)
umxSummary(m1)

# ===================
# = Ordinal example =
# ===================

# Prep data
require(umx)
data(twinData)
# Cut BMI column to form ordinal obesity variables
obLevels = c('normal', 'overweight', 'obese')
cuts = quantile(twinData[, "bmi1"], probs = c(.5, .2), na.rm = TRUE)
twinData$obese1=cut(twinData$bmi1, breaks=c(-Inf,cuts,Inf), labels=obLevels)
twinData$obese2=cut(twinData$bmi2, breaks=c(-Inf,cuts,Inf), labels=obLevels)

# Make the ordinal variables into umxFactors
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# Model and summary!
m1 = umxACE(selDVs = "obese", dzData = dzData, mzData = mzData, sep = '')

# And controlling age (otherwise manifests appearance as latent C)
m1 = umxACE(selDVs = "obese", selCov= "age", dzData = dzData, mzData = mzData, sep = '')
# umxSummary(m1)

# ============================================
# = Bivariate continuous and ordinal example =
# ============================================
data(twinData)
twinData= umx_scale_wide_twin_data(data=twinData,varsToScale="wt",sep= "")
# Cut BMI column to form ordinal obesity variables
obLevels   = c('normal', 'overweight', 'obese')
cuts       = quantile(twinData[, "bmi1"], probs = c(.5, .2), na.rm = TRUE)
twinData$obese1=cut(twinData$bmi1,breaks=c(-Inf,cuts,Inf),labels=obLevels)
twinData$obese2=cut(twinData$bmi2,breaks=c(-Inf,cuts,Inf),labels=obLevels)
# Make the ordinal variables into mxFactors
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])
mzData = twinData[twinData$zygosity %in% "MZFF",] 
dzData = twinData[twinData$zygosity %in% "DZFF",]
mzData = mzData[1:80,] # just top 80 so example runs in a couple of secs
dzData = dzData[1:80,]
m1 = umxACE(selDVs= c("wt","obese"), dzData= dzData, mzData= mzData, sep='')

# And controlling age
m1 = umxACE(selDVs = c("wt","obese"), selCov= "age", dzData = dzData, mzData = mzData, sep = '')

# =======================================
# = Mixed continuous and binary example =
# =======================================
require(umx)
data(twinData)
twinData= umx_scale_wide_twin_data(data= twinData,varsToScale= "wt", sep="")
# Cut to form category of 20% obese subjects
# and make into mxFactors (ensure ordered is TRUE, and require levels)
obLevels   = c('normal', 'obese')
cuts       = quantile(twinData[, "bmi1"], probs = .2, na.rm = TRUE)
twinData$obese1= cut(twinData$bmi1, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
twinData$obese2= cut(twinData$bmi2, breaks=c(-Inf,cuts,Inf), labels=obLevels) 
ordDVs = c("obese1", "obese2")
twinData[, ordDVs] = umxFactor(twinData[, ordDVs])

selDVs = c("wt", "obese")
mzData = twinData[twinData$zygosity %in% "MZFF",]
dzData = twinData[twinData$zygosity %in% "DZFF",]
m1 = umxACE(selDVs = selDVs, dzData = dzData, mzData = mzData, sep = '')
umxSummary(m1)

# ===================================
# Example with covariance data only =
# ===================================

require(umx)
data(twinData)
twinData= umx_scale_wide_twin_data(data=twinData, varsToScale= "wt", sep="")
selDVs = c("wt1", "wt2")
mz = cov(twinData[twinData$zygosity %in%  "MZFF", selDVs], use = "complete")
dz = cov(twinData[twinData$zygosity %in%  "DZFF", selDVs], use = "complete")
m1 = umxACE(selDVs=selDVs, dzData=dz, mzData=mz, numObsDZ=569, numObsMZ=351)
umxSummary(m1)
plot(m1)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab