umx_long2wide: Take a long twin-data file and make it wide (one family per row)

Description

umx_long2wide merges on famID. Family members are ordered by twinID.

twinID is equivalent to birth order. Up to 10 twinIDs are allowed (family order).

Note: Not all data sets have an order column, but it is essential to rank subjects correctly.

You might start off with a TWID, which is a concatenation of a familyID and a 2-digit twinID.

Generating famID and twinID as used by this function

You can capture the last 2 digits with the modulo operator: twinID = df$TWID %% 100

You can drop the last 2 digits with integer division: famID = df$TWID %/% 100
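As a minimal sketch of the two operations above, the toy data frame below (made-up TWID values, not from any real data set) shows how a numeric TWID splits into famID and twinID. It assumes TWID is stored as a number; a character TWID would need as.numeric() first.

```r
# Hypothetical TWIDs: family number followed by a 2-digit twinID
df = data.frame(TWID = c(101, 102, 250, 251, 1201, 1202))

# Last 2 digits: the twinID
df$twinID = df$TWID %% 100

# Everything before the last 2 digits: the famID
df$famID = df$TWID %/% 100

print(df)
```

The resulting famID and twinID columns can then be named in the famID and twinID arguments.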

Note: The function assumes that if zygosity or any passalong variable is NA in the first family member, it is NA for every member of that family. That is, it does not hunt for values that are present elsewhere in the family to try to self-heal missing data.
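If your data might violate this assumption, one option is to fill in family-level values yourself before reshaping. The following is a sketch in base R, not part of umx: the helper first_non_na is hypothetical, and the toy data are invented. It copies the first non-missing zygosity within each family to every member.

```r
# Toy long-format data: twin 1 of family 1 is missing zygosity
long = data.frame(
  fam      = c(1, 1, 2, 2),
  twinID   = c(1, 2, 1, 2),
  zygosity = c(NA, "MZFF", "DZMM", "DZMM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value in x, or NA if there is none
first_non_na = function(x) {
  v = x[!is.na(x)]
  if (length(v) == 0) NA_character_ else v[1]
}

# Within each family, replace zygosity with that family's first non-NA value
long$zygosity = ave(long$zygosity, long$fam,
  FUN = function(x) rep(first_non_na(x), length(x))
)

print(long)
```

This assumes the variable really is constant within a family; it would silently hide disagreements between members, so it may be worth checking for those first.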

Usage

umx_long2wide(
  data,
  famID = NA,
  twinID = NA,
  zygosity = NA,
  vars2keep = NA,
  passalong = NA,
  twinIDs2keep = NA
)

Arguments

data

The original (long-format) data file

famID

The unique identifier for members of a family

twinID

The twinID. Typically 1, 2, 50, 51, etc.

zygosity

Typically MZFF, DZFF, MZMM, DZMM, DZOS

vars2keep

The variables you wish to analyse (these will be renamed by appending paste0("_T", twinID), e.g. bmi_T1)

passalong

Variables you wish to pass through (keep, even though they are not twin variables)

twinIDs2keep

If NA (the default) all twinIDs are kept, else only those listed here. Useful to drop sibs.

Value

dataframe in wide format
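To illustrate the shape of the result without needing umx installed, here is a rough base-R analogue using stats::reshape. This is not how umx_long2wide is implemented; it is only a sketch on invented data showing one row per family and the "_T" suffix naming.

```r
# Toy long-format data: two families, two twins each
long = data.frame(
  fam      = c(1, 1, 2, 2),
  twinID   = c(1, 2, 1, 2),
  zygosity = c("MZFF", "MZFF", "DZMM", "DZMM"),
  bmi      = c(21.0, 21.4, 23.1, 24.0),
  wt       = c(58, 57, 66, 70)
)

# One row per family; bmi and wt get a "_T<twinID>" suffix
wide = reshape(long,
  idvar     = c("fam", "zygosity"),
  timevar   = "twinID",
  v.names   = c("bmi", "wt"),
  direction = "wide",
  sep       = "_T"
)

names(wide)
```

Here zygosity is treated as a family-level identifier, so it appears once per row rather than once per twin.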

References

https://github.com/tbates/umx, https://tbates.github.io

Examples

# NOT RUN {
# ==============================================
# = First make a long format file for the demo =
# ==============================================
data(twinData)
tmp = twinData[, -2]
tmp$twinID1 = 1; tmp$twinID2 = 2
long = umx_wide2long(data = tmp, sep = "")
str(long)
# 'data.frame':	7616 obs. of  11 variables:
#  $ fam     : int  1 2 3 4 5 6 7 8 9 10 ...
#  $ zyg     : int  1 1 1 1 1 1 1 1 1 1 ...
#  $ part    : int  2 2 2 2 2 2 2 2 2 2 ...
#  $ cohort  : chr  "younger" "younger" "younger" "younger" ...
#  $ zygosity: Factor w/ 5 levels "MZFF","MZMM",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ wt      : int  58 54 55 66 50 60 65 40 60 76 ...
#  $ ht      : num  1.7 1.63 1.65 1.57 1.61 ...
#  $ htwt    : num  20.1 20.3 20.2 26.8 19.3 ...
#  $ bmi     : num  21 21.1 21 23 20.7 ...
#  $ age     : int  21 24 21 21 19 26 23 29 24 28 ...
#  $ twinID  : num  1 1 1 1 1 1 1 1 1 1 ...

# OK. Now to demo long2wide...

# Keeping all columns
wide = umx_long2wide(data= long, famID= "fam", twinID= "twinID", zygosity= "zygosity")
namez(wide) # some vars, like part, should have been passed along instead of made into "part_T1"

# ======================================
# = Demo requesting specific vars2keep =
# ======================================

# Just keep bmi and wt
wide = umx_long2wide(data= long, famID= "fam", twinID= "twinID", 
    zygosity = "zygosity", vars2keep = c("bmi", "wt")
)

namez(wide)
# "fam" "twinID" "zygosity" "bmi_T1" "wt_T1" "bmi_T2" "wt_T2"

# ==================
# = Demo passalong =
# ==================
# Keep bmi and wt, and pass through 'cohort'
wide = umx_long2wide(data= long, famID= "fam", twinID= "twinID", zygosity= "zygosity", 
	vars2keep = c("bmi", "wt"), passalong = "cohort"
)
namez(wide)

# }