umx_make_TwinData: Simulate twin data with control over A, C, and E parameters, as well as moderation of A.

Description

Makes MZ and DZ twin data, optionally with moderated A. By default, the three variance components must sum to 1.

See examples for how to use this: it is pretty flexible.

If you provide 2 varNames, they will be used for twin 1 and twin 2. If you provide one, it will be expanded to var_T1 and var_T2

You supply the number of pairs of each zygosity that wish to simulate (nMZpairs, nDZpairs), along with the values of AA, CC,and EE.

Note, if you want a power calculator, see power.ACE.test() and mxPower().

Shortcuts

You can omit nDZpairs. You can also give any two of A, C, or E and the function deduces the missing parameter so A+C+E == 1.

Moderation

Univariate GxE Data To simulate data for umxGxE, offer up a list of the average, min and max values for AA, i.e., c(avg = .5, min = 0, max = 1).

umx_make_TwinData will then return moderated heritability, with average value = avg, and swinging down to min and up to max across 3-SDs of the moderator.

Bivariate GxE Data

To simulate data with a moderator that is not shared by both twins. Moderated heritability is specified via the bivariate relationship (AA, CC, EE) and two moderators in each component. AA = list(a11 = .4, a12 = .1, a22 = .15) CC = list(c11 = .2, c12 = .1, c22 = .10) EE = list(e11 = .4, e12 = .3, e22 = .25) Amod = list(Beta_a1 = .025, Beta_a2 = .025) Cmod = list(Beta_c1 = .025, Beta_c2 = .025) Emod = list(Beta_e1 = .025, Beta_e2 = .025)

Usage

umx_make_TwinData(
  nMZpairs,
  nDZpairs = nMZpairs,
  AA = NULL,
  CC = NULL,
  EE = NULL,
  DD = NULL,
  varNames = "var",
  MZr = NULL,
  DZr = MZr,
  nSib = 2,
  dzAr = 0.5,
  scale = FALSE,
  mean = 0,
  sd = 1,
  nThresh = NULL,
  sum2one = TRUE,
  bivAmod = NULL,
  bivCmod = NULL,
  bivEmod = NULL,
  seed = NULL,
  empirical = FALSE
)

Arguments

nMZpairs

Number of MZ pairs to simulate

nDZpairs

Number of DZ pairs to simulate (defaults to nMZpairs)

value for A variance. NOTE: See options for use in GxE and Bivariate GxE

value for C variance.

value for E variance.

varNames

name for variables (defaults to 'var')

MZr

If MZr and DZr are set (default = NULL), the function returns dataframes of the request n and correlation.

DZr

Set to return dataframe using MZr and Dzr (Default NULL)

nSib

Number of siblings in a family (default - 2). "3" = extra sib.

dzAr

DZ Ar (default .5)

scale

Whether to scale output to var=1 mean=0 (Default FALSE)

mean

mean for traits (default = 0) (not applied to moderated cases)

sd of traits (default = 1) (not applied to moderated cases)

nThresh

If supplied, use as thresholds and return mxFactor output? (default is not to)

sum2one

Whether to enforce AA + CC + EE summing the one (default = TRUE)

bivAmod

Used for Bivariate GxE data: list(Beta_a1 = .025, Beta_a2 = .025)

bivCmod

Used for Bivariate GxE data: list(Beta_c1 = .025, Beta_c2 = .025)

bivEmod

Used for Bivariate GxE data: list(Beta_e1 = .025, Beta_e2 = .025)

seed

Allows user to set.seed() if wanting reproducible dataset

empirical

Passed to mvrnorm

Value

list of mzData and dzData dataframes containing T1 and T2 plus, if needed M1 and M2 (moderator values)

References

https://github.com/tbates/umx, https://tbates.github.io

Examples

Run this code

# NOT RUN {
# =====================================================================
# = Basic Example, with all elements of std univariate data specified =
# =====================================================================
tmp = umx_make_TwinData(nMZpairs = 10000, AA = .30, CC = .00, EE = .70)
# Show dataframe with 20,000 rows and 3 variables: var_T1, var_T2, and zygosity
str(tmp)

# ===============================
# = How to consume the datasets =
# ===============================

mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
str(mzData); str(dzData); 
cov(mzData[, c("var_T1", "var_T2")])
cov(dzData[, c("var_T1", "var_T2")])
umxAPA(mzData[, c("var_T1", "var_T2")])

# Prefer to work in path coefficient values? (little a?)
tmp    = umx_make_TwinData(2000, AA = .7^2, CC = .0)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxACE(selDVs="var", sep="_T", mzData= mzData, dzData= dzData)

# Examine correlations
cor(mzData[,c("var_T1","var_T2")])
cor(dzData[,c("var_T1","var_T2")])

# Example with D (left un-modeled in ACE)
tmp = umx_make_TwinData(nMZpairs = 500, AA = .4, DD = .2, CC = .2)
m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ")
# |    |   a1|   c1|   e1|
# |:---|----:|----:|----:|
# |var | 0.86| 0.24| 0.45|

m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ", dzCr=.25)
# |    |  a1|d1 |   e1|
# |:---|---:|:--|----:|
# |var | 0.9|.  | 0.44|


# =============
# = Shortcuts =
# =============

# Omit nDZpairs (equal numbers of both by default)
tmp = umx_make_TwinData(nMZpairs = 100, AA = 0.5, CC = 0.3) # omit any one of A, C, or E (sums to 1)
cov(tmp[tmp$zygosity == "DZ", c("var_T1","var_T2")])

# Not limited to unit variance
tmp = umx_make_TwinData(100, AA = 3, CC = 2, EE = 3, sum2one = FALSE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

# Output can be scaled (mean=0, std=1)
tmp = umx_make_TwinData(100, AA = .7, CC = .1, scale = TRUE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

# }
# NOT RUN {
# ===============
# = GxE Example =
# ===============

AA = c(avg = .5, min = .1, max = .8)
tmp = umx_make_TwinData(nMZpairs = 140, nDZpairs = 240, AA = AA, CC = .35, EE = .65, scale= TRUE)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxGxE(selDVs = "var", selDefs = "M", sep = "_T", mzData = mzData, dzData = dzData)

# =====================
# = Threshold Example =
# =====================
tmp = umx_make_TwinData(100, AA = .6, CC = .2, nThresh = 3)
str(tmp)
umx_polychoric(subset(tmp, zygosity=="MZ", c("var_T1", "var_T2")))$polychorics
# Running model with 7 parameters
#           var_T1    var_T2
# var_T1 1.0000000 0.7435457
# var_T2 0.7435457 1.0000000


# =================================================
# = Just use MZr and DZr (also works with nSib>2) =
# =================================================
tmp = umx_make_TwinData(100, MZr = .86, DZr = .60, nSib= 3, varNames = "IQ")
umxAPA(subset(tmp, zygosity == "MZ", paste0("IQ_T", 1:2)))
umxAPA(subset(tmp, zygosity == "DZ", paste0("IQ_T", 1:2)))
m1 = umxACE(selDVs= "IQ", data = tmp)
m1 = umxACE(selDVs= "IQ", data = tmp, nSib=3)
# TODO tmx_ examples of unmodeled D etc.

# Bivariate GxSES example (see umxGxEbiv)

AA   = list(a11 = .4, a12 = .1, a22 = .15)
CC   = list(c11 = .2, c12 = .1, c22 = .10)
EE   = list(e11 = .4, e12 = .3, e22 = .25)
Amod = list(Beta_a1 = .025, Beta_a2 = .025)
Cmod = list(Beta_c1 = .025, Beta_c2 = .025)
Emod = list(Beta_e1 = .025, Beta_e2 = .025)
tmp = umx_make_TwinData(5000, AA =AA, CC = CC, EE = EE, 
			bivAmod = Amod, bivCmod =Cmod, bivEmod =Emod)
str(tmp)
# 'data.frame':	10000 obs. of  7 variables:
#  $ defM_T1 : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ defM_T2 : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ M_T1    : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ var_T1  : num  0.011 0.1045 0.5861 0.0583 1.0225 ...
#  $ M_T2    : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ var_T2  : num  -0.502 -0.856 -0.154 0.065 -0.268 ...
#  $ zygosity: Factor w/ 2 levels "MZ","DZ": 1 1 1 1 1 1 1 1 1 1 ...

# TODO tmx example showing how moderation of A introduces heteroscedasticity in a regression model:
# More residual variance at one extreme of the x axis (moderator) 
# m1 = lm(var_T1~ M_T1, data = x); 
# x = rbind(tmp[[1]], tmp[[2]])
# plot(residuals(m1)~ x$M_T1, data=x)
# }

Run the code above in your browser using DataLab