synthpop (version 1.9-0)

sdc: Tools for statistical disclosure control (sdc)

Description

Labeling, top and bottom coding, smoothing numeric data, and removing different types of unique records, defined by keys, from synthetic data. The function calls replicated.uniques to identify the rows to be excluded from the synthetic data set(s).

Usage

sdc(object, data, keys = NULL, prefix = NULL, suffix = NULL, label = NULL,
    rm.uniques.in.orig = FALSE, rm.replicated.uniques = FALSE,
    recode.vars = NULL, bottom.top.coding = NULL,
    recode.exclude = NULL, smooth.vars = NULL)

Value

The object supplied as the object argument, adjusted according to the values of the other arguments.

Arguments

object

an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).

data

the original (observed) data set.

keys

Variables to be used as quasi-identifiers to check for unique combinations. Passed to replicated.uniques to exclude rows in the synthetic data.

prefix

A character string to be added as a prefix to all variable names in the synthetic data set(s).

suffix

A character string to be added as a suffix to all variable names in the synthetic data set(s).

label

a single string with a label to be added to the synthetic data sets as a new variable to make it clear that the data are synthetic/fake.

rm.uniques.in.orig

a logical value indicating whether unique replicates of key variables that are present in the original data set should be removed from synthetic data set(s).

rm.replicated.uniques

a logical value indicating whether unique replicates of key variables that are also unique in the original data set should be removed.

recode.vars

a single string or a vector of strings with name(s) of variable(s) to be bottom- and/or top-coded.

bottom.top.coding

a list of two-element vectors specifying bottom and top codes for each variable in recode.vars. If either bottom or top coding is not needed, NA should be used in its place. If only one variable is to be recoded, codes can be given as a two-element vector.

recode.exclude

a list specifying, for each variable in recode.vars, values to be excluded from recoding, e.g. missing data codes. If all non-missing values should be considered for recoding, NA should be used when missing values are present. If only one variable is to be recoded, code(s) can be given as a single number or a vector.

smooth.vars

a single string or a vector of strings with name(s) of numeric variable(s) to be smoothed (smooth.spline function is used).
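
The interplay of recode.vars, bottom.top.coding and recode.exclude can be illustrated with a minimal base R sketch. The helper bottom_top_code() below is hypothetical and is not part of synthpop; it only assumes that bottom/top coding means clamping values to the given limits while leaving NA and any excluded codes untouched. The internal implementation of sdc() may differ.

```r
# Hypothetical helper (not a synthpop function): clamp x to [bottom, top],
# leaving NA values and any codes listed in `exclude` unchanged.
# NA for `bottom` or `top` means "no coding on that side".
bottom_top_code <- function(x, bottom = NA, top = NA, exclude = NA) {
  recodable <- !is.na(x) & !(x %in% exclude)
  if (!is.na(bottom)) x[recodable & x < bottom] <- bottom
  if (!is.na(top))    x[recodable & x > top]    <- top
  x
}

age    <- c(16, 25, 93, NA, 80)
income <- c(-8, 1500, 3200, NA, 2000)   # -8 plays the role of a missing data code

# Bottom-code at 20 and top-code at 80
age_coded <- bottom_top_code(age, bottom = 20, top = 80)

# Bottom-code at 0 and top-code at 2000, but keep the -8 code as it is
income_coded <- bottom_top_code(income, bottom = 0, top = 2000,
                                exclude = c(NA, -8))
```

Without the exclude argument, the -8 code in income would be treated as an ordinary value and raised to the bottom code, which is why missing data codes are listed in recode.exclude.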

See Also

replicated.uniques

Examples

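library(synthpop)  # provides syn(), sdc() and the SD2011 example data

ods <- SD2011[1:1000, c("sex", "age", "region", "edu", "marital", "income")]
s1 <- syn(ods, m = 2)  # two synthetic data sets
# Suffix variable names, add a label variable, remove uniques on the keys,
# and bottom/top-code age and income (-8 is excluded from recoding)
s1.sdc <- sdc(s1, ods, keys = c("sex", "age", "region"),
              suffix = "_synthetic", label = "false_data",
              rm.uniques.in.orig = TRUE,
              recode.vars = c("age", "income"),
              bottom.top.coding = list(c(20, 80), c(NA, 2000)),
              recode.exclude = list(NA, c(NA, -8)))
head(s1.sdc$syn[[2]])  # first rows of the second synthetic data set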
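
To make the idea behind keys and rm.replicated.uniques more concrete, here is a small base R sketch on toy data. It assumes that a "replicated unique" is a synthetic record whose key combination is unique in the synthetic data and also matches a key combination that is unique in the original data. The helpers key_string() and is_unique() are hypothetical and only illustrate the concept; replicated.uniques may work differently internally.

```r
# Toy original and synthetic data (made up for illustration)
orig  <- data.frame(sex = c("F", "M", "M", "F"), age = c(30, 41, 41, 52))
synth <- data.frame(sex = c("F", "F", "M", "F"), age = c(30, 52, 41, 52))
keys  <- c("sex", "age")

# Hypothetical helpers: one string per row built from the key variables,
# and a flag for values that occur exactly once
key_string <- function(d, keys) do.call(paste, c(d[keys], sep = "|"))
is_unique  <- function(k) !(duplicated(k) | duplicated(k, fromLast = TRUE))

ko <- key_string(orig, keys)
ks <- key_string(synth, keys)

# Unique in the synthetic data AND equal to a unique record in the original
replicated <- is_unique(ks) & ks %in% ko[is_unique(ko)]

synth_safe <- synth[!replicated, ]  # synthetic data without replicated uniques
```

Here only the first synthetic row (F, 30) is flagged: (M, 41) is unique in the synthetic data but occurs twice in the original, and (F, 52) is unique in the original but occurs twice in the synthetic data.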
