mStats (version 3.2.2)

duplicates: Report and tag duplicated observations

Description

duplicates() reports duplicated observations and tags each observation with an index of its copies.

Usage

duplicates(data, ..., drop = FALSE)

Arguments

data

Dataset

...

Variables used to identify duplicates. If not specified, all variables are used.

drop

If TRUE, duplicated records are deleted and the unique ones are kept. Defaults to FALSE.

Value

The input dataset with an additional variable, dup.

Details

The specified variables are used to search for duplicates. If none are specified, all variables are used.

The values of these variables are pasted together into a single character vector for speed; duplicates are then extracted from that vector and a report is produced.

The returned dataset gains a new variable, dup, for further use.
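The paste-then-compare idea described above can be sketched in base R. This is only an illustration of the technique, not the mStats source: the helper make_key(), the toy data frame df and the "\r" separator are all assumptions made for the example.

```r
# Hypothetical helper (not part of mStats): paste the chosen columns
# row-wise into one character key per observation.
make_key <- function(data, vars = names(data)) {
  # "\r" is an arbitrary separator that is unlikely to occur in the data
  do.call(paste, c(unname(as.list(data[vars])), sep = "\r"))
}

# Toy data, invented for the example
df <- data.frame(
  id  = c(1, 2, 2, 3, 3, 3),
  grp = c("a", "b", "b", "c", "c", "c")
)

key <- make_key(df)

# TRUE for every row whose key occurs more than once
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# One way to drop duplicates: keep the first copy of each key.
# Whether drop = TRUE in duplicates() behaves exactly like this is an assumption.
deduped <- df[!duplicated(key), , drop = FALSE]
```

Comparing one character key per row avoids comparing several columns at once, which is presumably the speed benefit the text refers to.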

ANNOTATIONS:

Copies - Number of copies (n)

Observations - Number of corresponding observations

Surplus - Number of surplus observations

dup - indicates copies within the dataset:

0 = unique observations

2 = duplicated two times

3 = duplicated three times and so on ...
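As a rough illustration of how these annotations relate to each other, the base R sketch below builds a dup tag and a Copies / Observations / Surplus table from a toy key vector. It assumes that dup holds the total number of copies of each observation (0 when unique) and that Surplus counts the observations beyond the first copy of each distinct record; both readings are assumptions, and the mStats output may differ.

```r
# Toy key vector, invented for the example: one key per observation
key <- c("a", "b", "b", "c", "c", "c")

# Number of copies of each observation's key
n_copies <- as.integer(ave(seq_along(key), key, FUN = length))

# Assumed coding of dup: 0 for unique rows, otherwise the number of copies
dup <- ifelse(n_copies == 1L, 0L, n_copies)

# Report: how many observations fall into each copy count
obs <- table(n_copies)
report <- data.frame(
  Copies       = as.integer(names(obs)),
  Observations = as.integer(obs)
)

# Assumed definition: observations in excess of one per distinct record
report$Surplus <- report$Observations - report$Observations / report$Copies

print(report)
```

Under these assumptions, a row with Copies = 3 and Observations = 3 means one record appears three times, two of which are surplus.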

Examples

# NOT RUN {
## load mStats, which provides duplicates() and codebook()
library(mStats)

## use infert data
data(infert)
codebook(infert)

## find duplicates by pooled.stratum
duplicates(infert, pooled.stratum)

## find duplicates by stratum and pooled.stratum
duplicates(infert, stratum, pooled.stratum)

## find and remove duplicates by pooled.stratum
duplicates(infert, pooled.stratum, drop = TRUE)

# }