Please first read the documentation for Links97Pair. That dataset contains the same pairs/rows, but only a subset of the variables/columns.
For variables
that are measured separately for both subjects (eg, Gender), the subjects' variable name will have an _S1
or _S2
appended to it. For instance, the variables LastSurvey_S1
and LastSurvey_S2
correspond to the last surveys completed
by the pair's first and second subject, respectively. Similarly, the functions CreatePairLinksDoubleEntered()
and
CreatePairLinksSingleEntered()
by default append _S1
and _S2
. However this can be
modified using the 'subject1Qualifier' and 'subject2Qualifier' parameters.
A data frame with 11,075 observations on the following 22 variables. There is one row per unique pair of subjects, irrespective of order.
ExtendedID see the variable of the same name in Links97Pair
SubjectTag_S1 see the variable of the same name in Links97Pair
SubjectTag_S2 see the variable of the same name in Links97Pair
R see the variable of the same name in Links97Pair
RFull This is a superset of R
. This includes all the R values we estimated, while R
(i.e., the variable above) excludes values like R=0.
RelationshipPath see the variable of the same name in Links97Pair
EverSharedHouse Indicate if the pair likely live in the same house. This is TRUE
for all pairs in this NLSY97 dataset.
IsMz Indicates if the pair is from the same zygote (ie, they are identical twins/triplets). This variable is a factor, with levels No
=0, Yes
=1, DoNotKnow
=255.
LastSurvey_S1 The year of Subject1's most recently completed survey. This may be different that the survey's administration date.
LastSurvey_S2 The year of Subject2's most recently completed survey. This may be different that the survey's administration date.
RPass1 The pair's estimated R coefficient, using both implicit and explicit information. Interpolation was NOT used. The variable R
is identically constructed, but it did use interpolation.
SubjectID_S1 The ID value assigned by NLS to the first subject. For Gen1 Subjects, this is their "CaseID" (ie, R00001.00). For Gen2 subjects, this is their "CID" (ie, C00001.00).
SubjectID_S2 The ID value assigned by NLS to the second subject.
Will Beasley
Specifies the relatedness coefficient (ie, 'R') between subjects in the same extended family. Each row represents a unique relationship pair. An extended family with \(k\) subjects will have \(k\)(\(k\)-1)/2 rows. Typically, Subject1 is older while Subject2 is younger.
The specific steps to determine the R coefficient will be described in an upcoming publication. The following information may influence the decisions of an applied researcher.
Download CSV If you're using the NlsyLinks package in R, the dataset automatically is available. To use it in a different environment, download the csv, which is readable by all statistical software. links-metadata-2017-97.yml documents the dataset version information.
library(NlsyLinks) # Load the package into the current R session.
hist(Links97PairExpanded$R) # Declare a concise variable name.
# write.csv(
# Links97PairExpanded,
# file ='~/NlsyLinksStaging/Links97PairExpanded.csv',
# row.names = FALSE
# )
Run the code above in your browser using DataLab