RecordLinkage (version 0.4-12.4)

RLdata: Test data for Record Linkage

Description

The RLdata tables contain artificial personal data for evaluating record linkage procedures. Some records have been duplicated with randomly generated errors. RLdata500 contains 50 duplicates and RLdata10000 contains 1000 duplicates.

Usage

RLdata500 
RLdata10000
identity.RLdata500 
identity.RLdata10000

Format

RLdata500 and RLdata10000 are character matrices with 500 and 10000 records. Each row represents one record, with the following columns:

fname_c1: First name, first component

fname_c2: First name, second component

lname_c1: Last name, first component

lname_c2: Last name, second component

by: Year of birth

bm: Month of birth

bd: Day of birth
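
The columns listed above can be checked directly once the data are loaded. The following is a minimal sketch, assuming the RecordLinkage package is installed; the requireNamespace() guard is an addition for illustration so that the snippet does nothing when the package is missing.

```r
# Assumes the RecordLinkage package is installed; skipped otherwise.
if (requireNamespace("RecordLinkage", quietly = TRUE)) {
  # Loads RLdata500 together with identity.RLdata500.
  data("RLdata500", package = "RecordLinkage")

  # Number of records and number of columns.
  print(dim(RLdata500))

  # Column names, for comparison with the list above.
  print(colnames(RLdata500))

  # First few records.
  print(head(RLdata500, 3))
}
```

The same calls should work for RLdata10000 after replacing the data set name.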

identity.RLdata500 and identity.RLdata10000 are integer vectors holding the true record IDs of the two data sets. Two records are duplicates if and only if their corresponding values in the identity vector are equal.
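
To illustrate how an identity vector encodes duplicates, here is a small sketch in base R. The vector used is a hypothetical toy example with six records, not an excerpt from the package data, and the variable names are chosen for illustration only.

```r
# Hypothetical identity vector for six records (not taken from RLdata500).
identity <- c(1L, 2L, 3L, 2L, 4L, 1L)

# All unordered pairs of record indices, one pair per column.
pairs <- combn(seq_along(identity), 2)

# A pair is a duplicate exactly when both identity values agree.
is_dup <- identity[pairs[1, ]] == identity[pairs[2, ]]

# Keep the duplicate pairs, one pair per row.
dup_pairs <- t(pairs[, is_dup, drop = FALSE])
colnames(dup_pairs) <- c("id1", "id2")
print(dup_pairs)
```

Here records 1 and 6 share an ID, as do records 2 and 4, so these two pairs are reported. The same idea applies to identity.RLdata500; for identity.RLdata10000, enumerating all pairs with combn() is likely to be memory-intensive, so a grouping approach such as split() on the identity vector may be preferable.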

Author

Andreas Borg, Murat Sariyar