Calculates, prints and plots tables of disclosure measures for a set of
target variables, using a fixed set of keys that are combined to form quasi-identifiers.
The calculations of disclosure measures are done by the function disclosure for each target.
This function can also be used with synthetic data NOT created by syn(), or even with data made anonymous by other methods such as sampling.
More details of the measures calculated can be found in the package vignette
"Disclosure measures for Synthetic Data".
# S3 method for synds
multi.disclosure(object, data,
  keys, targets = NULL, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE,
  not.targetslev = NULL,
  usetargetsNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_targets = NULL, ngroups_keys = NULL,
  ident.meas = "repU", attrib.meas = "DiSCO",
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, plot = TRUE, ...)
# S3 method for data.frame
multi.disclosure(object, data, cont.na = NULL,
  keys, targets = NULL, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE,
  not.targetslev = NULL,
  usetargetsNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_targets = NULL, ngroups_keys = NULL,
  ident.meas = "repU", attrib.meas = "DiSCO",
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, plot = TRUE, compare.synorig = TRUE, ...)
# S3 method for list
multi.disclosure(object, data, cont.na = NULL,
  keys, targets = NULL, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE,
  not.targetslev = NULL,
  usetargetsNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_targets = NULL, ngroups_keys = NULL,
  ident.meas = "repU", attrib.meas = "DiSCO",
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, plot = TRUE, compare.synorig = TRUE, ...)
# S3 method for multi.disclosure
print(x, digits = NULL, plot = NULL, to.print = c("ident", "attrib"),
  ...)
An object of class multi.disclosure, which is a list with the following components:
a table with the selected attribute disclosure measure (attrib.meas) for the synthetic data and the corresponding measure for the original data: "CAPd" if attrib.meas is "DCAP", and "DiO" otherwise.
plot of attrib.table with labels indicating where large denominators suggest checking.
see above.
value of identity disclosure UiO from the original data, see the help file for disclosure.
value of identity disclosure ident.meas from the synthetic data, see the help file for disclosure.
Number of records in data.
see above.
see above.
see above.
see above.
see above.
see above.
see above.
see above.
see above.
A named list with a component for each target, where each component is the output from the function disclosure for that target. This allows check_1way and check_2way to be examined for each target.
R call used to create the object
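A minimal sketch for exploring a returned object, using only base R inspection functions so that no component names need to be assumed. It reuses the data and keys from this page's examples; the object name `md` and the seed value are illustrative choices, not part of the package.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)  # seed chosen arbitrarily

# 'md' is an illustrative name for the returned object.
md <- multi.disclosure(s1, ods, keys = c("sex", "age", "edu"),
                       print.flag = FALSE, plot = FALSE)

class(md)               # class of the returned object
names(md)               # top-level components of the list
str(md, max.level = 1)  # one-line summary of each component
```

Listing the components first makes it easier to locate the per-target output from disclosure without guessing at names.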
an object of class synds, which stands for 'synthesised data set'. It is typically created by the function syn() and it includes object$m synthesised data set(s) as object$syn. This is a single data set when object$m = 1 or a list of length object$m when object$m > 1. Alternatively, when data are synthesised not using syn(), it can be a data frame with a synthetic data set or a list of data frames with synthetic data sets, all created from the same original data with the same variables and the same method.
the original (observed) data set.
For data NOT supplied as a synthetic data object created by synthpop, this gives special values for continuous variables as described in the documentation for the function syn.
a vector of strings with the names of variables to be used in combination to form a quasi-identifier.
a vector of strings with the names of variables to be used as targets for the disclosure measures. Defaults to all variables in both original and synthetic data that are not in keys.
an integer that sets the limit above which a warning is given to check the two-way relationships for potential prior disclosure information.
TRUE/FALSE according to whether disclosive groups with denominators > denom_lim should be excluded from disclosure measures.
Vector of the same length as targets giving the level of each target to be excluded from calculating disclosure measures. Set elements for unaffected targets as blanks.
TRUE/FALSE to print out a line as disclosure for each member of targets is calculated.
A logical vector of the same length as targets that determines if NA values of each target are to be considered disclosive. Defaults to TRUE for all targets.
A logical vector of the same length as keys that determines if NA values of each key are to be considered disclosive. Defaults to TRUE for all keys.
A list of the same length as targets giving the keys for two-way exclusions for the ith target. For details see the documentation in disclosure.
A list of the same length as targets giving the levels of keys for two-way exclusions for the ith target. For details see the documentation in disclosure.
A list of the same length as targets giving the levels of the target for two-way exclusions for the ith target. For details see the documentation in disclosure.
Unless set to NULL (the default), numeric target variables will be grouped into ngroups_targets categories. If ngroups_targets is of length 1, all numeric targets will have the same number of groups. Otherwise ngroups_targets needs to be a vector of the same length as targets and will give the number of groups for each target. If an element of ngroups_targets is zero, no grouping will be done.
Unless set to NULL (the default), any numeric key variable will be grouped into categories. If ngroups_keys is of length 1, all numeric keys will have the same number of groups. Otherwise ngroups_keys needs to be the same length as keys and will give the number of groups for each key. If an element of ngroups_keys is zero, no grouping will be done.
Choice of statistic to use as a measure of identity disclosure. Must be a selection from: "repU" or "UiSiO". See disclosure for explanations of the measures.
Choice of statistic to use as a measure of attribute disclosure. Must be a selection from: "DiSCO" or "DiSDiO". See disclosure for explanations of the measures.
A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of all disclosive records, and the second is the percentage of all records for this level of the target. Default is c(50, 90), meaning a group of more than 50 disclosive records for this level of the target where they make up over 90% of all disclosive records.
A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of all disclosive records for this key-target combination and the second is the percentage of all disclosive records for this combination. Default is c(4, 80), meaning a group of more than 4 records where over 80% of all the original values with this key have this level of the target.
number of digits to print for the disclosure measures.
determines if plot will be produced when the result is printed.
logical value that determines if a summary of results is to be printed.
a logical value to determine if the function synorig.compare() should be used to check that the data sets can be compared. The default is FALSE, except when the synthetic data are supplied as a data.frame or a list, when it is TRUE.
Vector of items to be printed including "ident", "attrib", both or NULL
additional parameters
an object of class multi.disclosure.
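Several of the arguments above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: it reuses the data and keys from this page's examples, restricts the targets to two variables, and selects the alternative measures listed above. The object names and the seed are illustrative assumptions.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)  # seed chosen arbitrarily

md2 <- multi.disclosure(
  s1, ods,
  keys        = c("sex", "age", "edu"),
  targets     = c("marital", "income"),  # restrict to two targets
  ident.meas  = "UiSiO",                 # alternative identity measure
  attrib.meas = "DiSDiO",                # alternative attribute measure
  thresh_1way = c(50, 90),               # default from the usage, written out
  print.flag  = FALSE,                   # suppress progress lines
  plot        = FALSE
)
```

Leaving targets at its default of NULL would instead use every variable not named in keys.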
Calculates measures of identity and attribute disclosure from the keys specified in keys with the function disclosure. For attribute disclosure a table with one line for each target can be printed or plotted. Details are in the help file for disclosure.
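The idea of combining keys into a quasi-identifier can be illustrated in base R without synthpop. This is only a sketch of the concept on made-up toy data; it does not reproduce how disclosure computes its measures, and the data frame and variable names are hypothetical.

```r
# Toy data; the values are made up for illustration.
toy <- data.frame(
  sex = c("F", "F", "M", "M", "F"),
  age = c(30, 30, 41, 52, 30),
  edu = c("sec", "sec", "uni", "uni", "uni")
)
keys <- c("sex", "age", "edu")

# Combine the key columns into one quasi-identifier string per record.
qi <- do.call(paste, c(toy[keys], sep = " | "))

# Count how many records share each key combination.
qi_counts <- table(qi)

# Combinations held by exactly one record are unique on the keys.
n_unique <- sum(qi_counts == 1)
```

Records that are unique on the keys are the ones an intruder who knows the key values could single out, which is why the choice of keys matters for the measures.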
to follow link to vignette
disclosure
ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods)
### synthetic data provided as a 'data.frame' object
t1 <- multi.disclosure(s1$syn, ods,
                       keys = c("sex", "age", "edu"))
### synthetic data provided as a 'synds' object
t1 <- multi.disclosure(s1, ods,
                       keys = c("sex", "age", "edu"))
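The list method can be exercised in the same way. The sketch below assumes that a list of synthetic data frames taken from a synds object with m = 2 is an acceptable stand-in for data synthesised outside syn(). It then prints only the attribute disclosure table via the to.print argument. The object names and the seed are illustrative.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]

# With m > 1, s2$syn is a list of m synthetic data frames.
s2 <- syn(ods, m = 2, seed = 2024, print.flag = FALSE)

# Pass the plain list, as if the data sets had been synthesised outside syn().
t2 <- multi.disclosure(s2$syn, ods,
                       keys = c("sex", "age", "edu"),
                       print.flag = FALSE, plot = FALSE)

# Print only the attribute disclosure table, without the plot.
print(t2, to.print = "attrib", plot = FALSE)
```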