The RLdata tables contain artificial personal data for the evaluation of
Record Linkage procedures. Some records have been duplicated with randomly
generated errors. RLdata500 contains fifty duplicates, RLdata10000 contains
one thousand duplicates.
RLdata500
RLdata10000
identity.RLdata500
identity.RLdata10000
RLdata500 and RLdata10000 are character matrices with 500 and 10000 records,
respectively. Each row represents one record, with the following columns:
First name, first component
First name, second component
Last name, first component
Last name, second component
Year of birth
Month of birth
Day of birth
identity.RLdata500 and identity.RLdata10000 are integer vectors representing
the true record IDs of the two data sets. Two records are duplicates if and
only if their corresponding values in the identity vector agree.
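
The duplicate rule above can be illustrated with a minimal sketch in base R. It assumes a small, made-up identity vector (identity_toy is a hypothetical name with invented values, not part of the package data) and lists every pair of records whose identity values agree.

```r
# Toy identity vector (hypothetical values, not taken from the package data).
# Records 1 and 4 share ID 10, records 2 and 5 share ID 20.
identity_toy <- c(10L, 20L, 30L, 10L, 20L, 40L)

# All unordered pairs of record indices; each column of `pairs` is one pair.
pairs <- combn(length(identity_toy), 2)

# A pair is a true duplicate exactly when the two identity values agree.
is_match <- identity_toy[pairs[1, ]] == identity_toy[pairs[2, ]]
true_pairs <- t(pairs[, is_match, drop = FALSE])
colnames(true_pairs) <- c("id1", "id2")

# Number of records that repeat an identity value seen earlier in the vector.
n_duplicates <- sum(duplicated(identity_toy))

print(true_pairs)
print(n_duplicates)
```

Once the data sets are loaded, the same steps should carry over by substituting identity.RLdata500 for identity_toy. Enumerating all pairs grows quadratically with the number of records, so this brute-force approach is likely impractical for identity.RLdata10000; it is meant only to make the definition concrete.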
Andreas Borg, Murat Sariyar