RFGLS (version 1.1)

FSV.frompedi: Family-Structure Variables from Pedigree File

Description

This function creates the family-structure variables "FTYPE" (family-type) and "INDIV" (individual code) from available information in a pedigree file. Note that FSV.frompedi() is called internally by gls.batch() and gls.batch.get() when their argument input.mode is set to 3.

Usage

FSV.frompedi(pedi.dat,phen.dat)

Arguments

pedi.dat
A pedigree file, as a data frame with named columns. Typically, it will contain at least the following five named columns (which correspond to the default for argument pedicolname to gls.batch()): "FAMID" (family IDs), "ID" (unique individual IDs), "PID" (paternal ID), "MID" (maternal ID), and "SEX" (coded 1 for male, 2 for female). The paternal and maternal IDs of founders must be either 0 or NA.

Argument pedi.dat may also contain any/all of the following three named columns, the effects of which are described below under "Details": "ZYGOSITY", "ADOPTED", and "INDEP". The "ZYGOSITY" column must contain a value of 1 for each MZ twin, and a value of 2 for each DZ twin. The "ADOPTED" column must be a dummy variable for adoptive status, i.e. with value 1 for adoptees and value 0 otherwise (NA's are treated as 0). The "INDEP" column must be a dummy variable for whether the individual should be treated as an "independent observation" (family-type 6), with 1 for "yes" and 0 for "no" (NA's are treated as 0).
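As an illustration of the layout described above, the following sketch builds a small pedigree data frame by hand. The IDs and values are invented for this example; only the column names and codings are taken from the argument description.

```r
# Hypothetical data: one four-person family with DZ twins (FAMID 1)
# and one unrelated individual flagged as independent (FAMID 2).
pedi.example <- data.frame(
  FAMID    = c(1, 1, 1, 1, 2),
  ID       = c(101, 102, 103, 104, 201),
  PID      = c(104, 104, 0, 0, 0),   # paternal ID; 0 marks a founder
  MID      = c(103, 103, 0, 0, 0),   # maternal ID; 0 marks a founder
  SEX      = c(1, 2, 2, 1, 1),       # 1 = male, 2 = female
  ZYGOSITY = c(2, 2, NA, NA, NA),    # 1 = MZ twin, 2 = DZ twin
  ADOPTED  = c(0, 0, 0, 0, 0),       # 1 = adoptee, 0 otherwise
  INDEP    = c(0, 0, 0, 0, 1)        # 1 = independent observation
)
str(pedi.example)
```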

phen.dat
A phenotype file, as a data frame with named columns. At the bare minimum, it must contain a column of unique individual IDs, named "ID". The value returned by FSV.frompedi is this same data frame, with columns named "FTYPE" and "INDIV" appended thereto, unless columns with those names were already present, in which case their contents will be overwritten with new values. Any other named columns in phen.dat are ignored.

Value

A data frame, containing the same columns as phen.dat, with the addition of "FTYPE" and "INDIV". Usually, this data frame will simply be phen.dat with "FTYPE" and "INDIV" appended thereto. However, if phen.dat contained columns named "FTYPE" or "INDIV", the values in these columns will be overwritten with the new values produced by FSV.frompedi().
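The append-or-overwrite behavior can be pictured with ordinary base R column assignment. This is only a sketch of the resulting column layout, using invented values; it does not call FSV.frompedi() and is not taken from the package's source.

```r
# Hypothetical phenotype file that already has an "FTYPE" column.
phen.example <- data.frame(ID = c(101, 102),
                           FTYPE = c(9, 9),
                           Zscore = c(0.3, -1.2))

# Invented stand-ins for the values the function would compute.
new.ftype <- c(2, 2)
new.indiv <- c(1, 2)

out <- phen.example
out$FTYPE <- new.ftype   # column already present: contents overwritten
out$INDIV <- new.indiv   # column absent: appended after existing columns
names(out)
```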

Details

RFGLS recognizes six family types, which are distinguished primarily by how the offspring in the family are related to one another:
  • FTYPE=1, containing MZ twins;
  • FTYPE=2, containing DZ twins;
  • FTYPE=3, containing adoptees;
  • FTYPE=4, containing non-twin full siblings;
  • FTYPE=5, "mixed" families containing one biological offspring and one adoptee;
  • FTYPE=6, containing "independent observations" who do not fit into a four-person nuclear family.

It is assumed that all offspring except adoptees are biological children of the parents in the family. The four individual codes are:

  • INDIV=1 is for "Offspring #1;"
  • INDIV=2 is for "Offspring #2;"
  • INDIV=3 is for mothers;
  • INDIV=4 is for fathers.

The distinction between "Offspring #1" and "#2" is mostly arbitrary, except that in "mixed" families (FTYPE=5), the biological offspring MUST have INDIV=1, and the adopted offspring, INDIV=2.
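For reading output, it can help to attach labels to the numeric codes. The sketch below does this with base R factors; the label strings and the small data frame are invented for illustration, and the package itself works with the numeric codes.

```r
# Labels paraphrased from the lists above (wording is this example's own).
ftype.labels <- c("MZ twins", "DZ twins", "adoptees",
                  "non-twin full siblings",
                  "mixed (one biological, one adopted)",
                  "independent observation")
indiv.labels <- c("Offspring #1", "Offspring #2", "mother", "father")

# Hypothetical coded output for four participants.
coded <- data.frame(FTYPE = c(1, 5, 5, 6), INDIV = c(3, 1, 2, 1))
coded$FTYPE.label <- factor(coded$FTYPE, levels = 1:6, labels = ftype.labels)
coded$INDIV.label <- factor(coded$INDIV, levels = 1:4, labels = indiv.labels)
print(coded)
```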

The way that FSV.frompedi() assigns family-types and individual codes to participants depends upon the presence/absence of eight named columns in pedi.dat: "ID", "FAMID", "PID", "MID", "SEX", "ZYGOSITY", "ADOPTED", "INDEP". If any of the first five of these are absent, all participants are assigned FTYPE=6 and INDIV=1, with a warning. Assuming that those first five columns are present, what FSV.frompedi() does depends upon the presence/absence of the other three columns, as follows.
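The fallback for an incomplete pedigree file might look roughly like the following. The helper name fallback.if.incomplete() and the warning text are hypothetical; this is a sketch of the behavior described above, not the package's code.

```r
required.cols <- c("ID", "FAMID", "PID", "MID", "SEX")

# Hypothetical helper: if any required column is missing from pedi.dat,
# every participant in phen.dat becomes FTYPE=6, INDIV=1, with a warning.
fallback.if.incomplete <- function(pedi.dat, phen.dat) {
  missing.cols <- setdiff(required.cols, names(pedi.dat))
  if (length(missing.cols) > 0) {
    warning("pedigree file lacks column(s): ",
            paste(missing.cols, collapse = ", "))
    phen.dat$FTYPE <- 6
    phen.dat$INDIV <- 1
  }
  phen.dat
}

# Invented example: pedigree with only "ID" and "FAMID".
incomplete.pedi <- data.frame(ID = c(1, 2), FAMID = c(1, 1))
result <- suppressWarnings(
  fallback.if.incomplete(incomplete.pedi, data.frame(ID = c(1, 2)))
)
print(result)
```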

If "INDEP" is present, then FSV.frompedi() assigns FTYPE=6, INDIV=1 to participants with INDEP=1. These participants are then disregarded for the rest of the job. Like the other functions in this package, FSV.frompedi() treats participants with FTYPE=6 as the sole members of their own family units, and not as part of the family corresponding to their family ID.
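A minimal sketch of the "INDEP" step, assuming the rule that NA is treated as 0. The helper is.indep() and the data are invented for illustration.

```r
# Hypothetical pedigree fragment; NA in "INDEP" counts as 0.
pedi.example <- data.frame(ID = c(101, 102, 103), INDEP = c(0, NA, 1))

# Hypothetical helper: TRUE only where INDEP is exactly 1.
is.indep <- function(pedi.dat) {
  if (!"INDEP" %in% names(pedi.dat)) return(rep(FALSE, nrow(pedi.dat)))
  !is.na(pedi.dat$INDEP) & pedi.dat$INDEP == 1
}

indep <- is.indep(pedi.example)
ftype <- rep(NA_real_, nrow(pedi.example))
indiv <- rep(NA_real_, nrow(pedi.example))
ftype[indep] <- 6
indiv[indep] <- 1

# Flagged participants take no further part in family assignment.
remaining <- pedi.example[!indep, ]
print(remaining$ID)
```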

If "ZYGOSITY" and "ADOPTED" are both absent, then (after first checking for "INDEP", as above), all participants are assigned FTYPE=4. Non-founders are identified as offspring, and participants whose IDs appear in "MID" or "PID" are assigned INDIV=3 or INDIV=4, respectively. Offspring individual codes are adjusted so that each family has only one instance each of INDIV=1 and INDIV=2. If more than two offspring are identified in a family, or if more than one mother or more than one father are identified in a family, these participants are forced to FTYPE=6, INDIV=1. Also, any participant not otherwise assigned an individual code is given FTYPE=6, INDIV=1.
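The simplest case can be sketched in base R as follows. The data are invented, and numbering offspring in ID order within each family is an assumption made here for the example. The sketch covers only a family with exactly two offspring, one mother, and one father, plus one participant who receives no individual code.

```r
# Hypothetical pedigree with neither "ZYGOSITY" nor "ADOPTED".
pedi <- data.frame(FAMID = c(1, 1, 1, 1, 2),
                   ID    = c(101, 102, 103, 104, 201),
                   PID   = c(104, 104, 0, 0, NA),
                   MID   = c(103, 103, 0, 0, NA),
                   SEX   = c(1, 2, 2, 1, 1))

# Founders have paternal and maternal IDs of 0 or NA.
is.founder <- (is.na(pedi$PID) | pedi$PID == 0) &
              (is.na(pedi$MID) | pedi$MID == 0)

ftype <- rep(4, nrow(pedi))
indiv <- rep(NA_real_, nrow(pedi))
indiv[pedi$ID %in% pedi$MID] <- 3   # appears as someone's mother
indiv[pedi$ID %in% pedi$PID] <- 4   # appears as someone's father

# Number offspring 1, 2 within each family (ID order: an assumption).
offspring <- !is.founder
indiv[offspring] <- ave(pedi$ID[offspring], pedi$FAMID[offspring], FUN = rank)

# Anyone still without an individual code becomes FTYPE=6, INDIV=1.
unassigned <- is.na(indiv)
ftype[unassigned] <- 6
indiv[unassigned] <- 1
print(data.frame(ID = pedi$ID, FTYPE = ftype, INDIV = indiv))
```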

If "ZYGOSITY" is present but "ADOPTED" is absent, then FSV.frompedi() behaves similarly, except that (after first checking for "INDEP", as above) known twins are identified as offspring, and participants belonging to a family containing at least one twin are assigned FTYPE=1 (for MZ) or FTYPE=2 (for DZ), as the case may be. Members of families with no twins are assigned FTYPE=4. The program then proceeds as described in the immediately preceding paragraph.
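The family-level twin assignment can be sketched like this, with invented data. A family containing both MZ and DZ twins is not covered by the description above, and this sketch makes no attempt to handle one.

```r
# Hypothetical pedigree fragment: family 1 has MZ twins, family 2 has
# DZ twins, family 3 has no twins. NA marks non-twins.
pedi <- data.frame(FAMID    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   ID       = 1:9,
                   ZYGOSITY = c(1, 1, NA, 2, 2, NA, NA, NA, NA))

is.mz <- !is.na(pedi$ZYGOSITY) & pedi$ZYGOSITY == 1
is.dz <- !is.na(pedi$ZYGOSITY) & pedi$ZYGOSITY == 2

# Every member of a family with at least one twin gets that family-type.
ftype <- rep(4, nrow(pedi))
ftype[pedi$FAMID %in% pedi$FAMID[is.dz]] <- 2
ftype[pedi$FAMID %in% pedi$FAMID[is.mz]] <- 1
print(table(FAMID = pedi$FAMID, FTYPE = ftype))
```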

If "ADOPTED" is present, FSV.frompedi() first makes some simple family-type assignments: if "ZYGOSITY" is present, to FTYPE=1 and FTYPE=2 as appropriate (see above), and then if "INDEP" is present, to FTYPE=6, INDIV=1 as appropriate (see above). Then, within each family, the program resolves each member in order of ID, from least to greatest. The first non-founder is assigned INDIV=1, the second, INDIV=2, and any thereafter, FTYPE=6, INDIV=1. The first adoptee is assigned INDIV=2, the second, INDIV=1, and any thereafter, FTYPE=6, INDIV=1. The first female non-adoptee founder is assigned INDIV=3, and any others are assigned FTYPE=6, INDIV=1. The first male non-adoptee founder is assigned INDIV=4, and any others are assigned FTYPE=6, INDIV=1. If family-type has not yet been assigned, then it is resolved as FTYPE=3 if there are two adoptees, FTYPE=5 if there is one adoptee and one biological offspring, and as FTYPE=4 otherwise.
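The final family-type rule in the paragraph above reduces to a small decision on the number of adoptees and biological offspring in a family. The function name resolve.ftype() is hypothetical, and the sketch applies only to families whose type was not already set by the "ZYGOSITY" or "INDEP" steps.

```r
# Hypothetical helper implementing the last sentence above.
resolve.ftype <- function(n.adopted, n.biological) {
  if (n.adopted == 2) {
    3   # two adoptees
  } else if (n.adopted == 1 && n.biological == 1) {
    5   # "mixed": one adoptee, one biological offspring
  } else {
    4   # everything else
  }
}

resolve.ftype(n.adopted = 2, n.biological = 0)  # 3
resolve.ftype(n.adopted = 1, n.biological = 1)  # 5
resolve.ftype(n.adopted = 0, n.biological = 2)  # 4
```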

Function FSV.frompedi() produces a warning whenever it forces a non-founder to FTYPE=6, INDIV=1.

Note that ambiguous cases are resolved somewhat arbitrarily: FSV.frompedi() sorts the pedigree file by family ID, and by ID within the same family, and then scans through it from top to bottom. So for example, if two participants in the same family are both provisionally assigned INDIV=3, then the apparent mother with the smaller ID retains INDIV=3, and the other is forced to FTYPE=6, INDIV=1.
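The two-mothers example can be reproduced with a sort followed by duplicated(). This is a sketch under invented data and provisional codes, not the package's actual scanning code.

```r
# Hypothetical family in which two participants were provisionally
# coded as mothers (INDIV=3); rows are deliberately out of order.
pedi <- data.frame(FAMID = c(1, 1, 1),
                   ID    = c(105, 103, 101),
                   FTYPE = c(4, 4, 4),
                   INDIV = c(3, 3, 1))

# Sort by family ID, then by ID within family.
pedi <- pedi[order(pedi$FAMID, pedi$ID), ]

# Scanning top to bottom, the first INDIV=3 in a family is kept;
# any later one is forced to FTYPE=6, INDIV=1.
extra.mother <- pedi$INDIV == 3 & duplicated(pedi[, c("FAMID", "INDIV")])
pedi$FTYPE[extra.mother] <- 6
pedi$INDIV[extra.mother] <- 1
print(pedi)
```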

See Also

gls.batch, gls.batch.get

Examples

data(pheno)
data(pedigree)
table(pheno$FTYPE) ##<--Frequencies of correct family types.


fsvtest1 <- FSV.frompedi(pedi.dat=pedigree,
  phen.dat=data.frame(ID=pheno[,2])) ##<--Bare minimum phenotype file.
table(fsvtest1$FTYPE) ##<--Not correct, because pedigree file
                      ##doesn't have enough additional info
                      ##to recover the actual family-types
                      ##and individual codes.

#Create "ZYGOSITY" column:
pedigree$ZYGOSITY <- NA
pedigree$ZYGOSITY[pheno$FTYPE==1 & pheno$INDIV<3] <- 1
pedigree$ZYGOSITY[pheno$FTYPE==2 & pheno$INDIV<3] <- 2

fsvtest2 <- FSV.frompedi(pedi.dat=pedigree,phen.dat=data.frame(ID=pheno[,2]))
table(fsvtest2$FTYPE) ##<--Still not right, because pedigree file
                      ##lacks info about adoptees.
                      
#Create "ADOPTED" column:
pedigree$ADOPTED <- 0
pedigree$ADOPTED[pheno$FTYPE==3 & pheno$INDIV<3] <- 1
pedigree$ADOPTED[pheno$FTYPE==5 & pheno$INDIV==2] <- 1
fsvtest3 <- FSV.frompedi(pedi.dat=pedigree,phen.dat=data.frame(ID=pheno[,2]))
table(fsvtest3$FTYPE) ##<--Almost there.

#Create "INDEP" column:
pedigree$INDEP <- 0
pedigree$INDEP[pheno$FTYPE==6] <- 1
fsvtest4 <- FSV.frompedi(pedi.dat=pedigree,phen.dat=data.frame(ID=pheno[,2]))
table(fsvtest4$FTYPE) ##<--Correct family types have been recovered.
table(pheno$FTYPE) ##<--Compare.
all(pheno$FTYPE==fsvtest4$FTYPE) ##<--TRUE.
