multi.disclosure: Disclosure measures for multiple target variables.

Description

Calculates, prints and plots tables of disclosure measures for a set of target variables, using a fixed set of keys that are combined to form a quasi-identifier. The disclosure measures for each target are calculated by the function disclosure.

This function can also be used with synthetic data NOT created by syn(), or even with data made anonymous by other methods such as sampling. More details of the measures calculated can be found in the package vignette "Disclosure measures for Synthetic Data".
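
To make the role of keys concrete, the base R sketch below shows one way a set of key variables could be pasted together into a single quasi-identifier, and how records that are unique on it could be counted. The toy data frame and the helper name make_qi are hypothetical, and the sketch illustrates the idea only; it is not the code that synthpop runs internally.

```r
# Toy data (hypothetical values, not taken from SD2011)
toy <- data.frame(
  sex = c("F", "F", "M", "M", "F"),
  age = c(30, 30, 45, 45, 52),
  edu = c("HIGH", "HIGH", "LOW", "HIGH", "LOW"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine the key columns into one string per record
make_qi <- function(df, keys) {
  do.call(paste, c(df[keys], sep = " | "))
}

qi <- make_qi(toy, keys = c("sex", "age"))
qi_counts <- table(qi)

# Records whose key combination occurs exactly once are unique
# on the quasi-identifier
n_unique <- sum(qi_counts == 1)
pct_unique <- 100 * n_unique / nrow(toy)
```

Adding more keys (for example "edu") generally makes more records unique on the quasi-identifier, which is why the choice of keys matters for every target.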

Usage

# S3 method for synds
multi.disclosure(object, data,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, ...)

# S3 method for data.frame
multi.disclosure(object, data, cont.na = NULL,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, compare.synorig = TRUE, ...)

# S3 method for list
multi.disclosure(object, data, cont.na = NULL,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, compare.synorig = TRUE, ...)

# S3 method for multi.disclosure
print(x, digits = NULL, plot = NULL, to.print = c("ident", "attrib"),
      ...)

Value

An object of class multi.disclosure which is a list with the following components:

attrib.table: a table with the selected attribute disclosure measure (attrib.meas) for the synthetic data and the corresponding measure for the original data: "CAPd" if attrib.meas is "DCAP", and "DiO" otherwise.
attrib.plot: plot of attrib.table with labels indicating where large denominators suggest checking.
keys: see above.
ident.orig: value of identity disclosure UiO from the original data, see help file for disclosure.
ident.syn: value of identity disclosure ident.meas from the synthetic data, see help file for disclosure.
Norig: Number of records in data.
denom_lim: see above.
exclude_ov_denom_lim: see above.
digits: see above.
usetargetsNA: see above.
usekeysNA: see above.
ident.meas: see above.
attrib.meas: see above.
m: number of synthesised data sets (object$m), see object below.
plot: see above.
output.list: A named list with a component for each target where each component is the output from the function disclosure for that target. This allows check_1way and check_2way to be examined for each target.
call: R call used to create the object
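
As an illustration of how the returned list might be inspected, the sketch below reuses the call from the Examples section and then reads some of the components listed above. The object name res and the seed value are arbitrary choices, and the component name check_1way inside each element of output.list is an assumption based on the description of output.list; see the help file for disclosure for the authoritative structure. The guard only skips the sketch when a version of synthpop that exports multi.disclosure is not installed.

```r
# Run only if an installed synthpop exports multi.disclosure
has_md <- requireNamespace("synthpop", quietly = TRUE) &&
  "multi.disclosure" %in% getNamespaceExports("synthpop")

if (has_md) {
  library(synthpop)

  ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
  s1 <- syn(ods, seed = 123, print.flag = FALSE)  # seed chosen arbitrarily

  res <- multi.disclosure(s1, ods, keys = c("sex", "age", "edu"),
                          print.flag = FALSE)

  # Table of attribute disclosure measures, one line per target
  print(res$attrib.table)

  # Identity disclosure for the original and the synthetic data
  print(res$ident.orig)
  print(res$ident.syn)

  # One element per target, each the output of disclosure() for that target
  print(names(res$output.list))

  # Assumed component name; returns NULL if it is not present
  print(res$output.list[["marital"]]$check_1way)
}
```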

Arguments

object: an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s) as object$syn. This is a single data set when object$m = 1 or a list of length object$m when object$m > 1. Alternatively, when data are synthesised without using syn(), it can be a data frame with a synthetic data set or a list of data frames with synthetic data sets, all created from the same original data with the same variables and the same method.
data: the original (observed) data set.
cont.na: For data NOT supplied as a synthetic data object created by synthpop, this gives special values for continuous variables as described in the documentation for the function syn.
keys: a vector of strings with the names of variables to be used in combination to form a quasi identifier.
targets: a vector of strings with the names of variables to be used as targets for the disclosure measures. Defaults to all variables in both original and synthetic data that are not in keys.
denom_lim: an integer that determines the limit above which a warning is given to check the two way relationships for potential prior disclosure information.
exclude_ov_denom_lim: TRUE/FALSE according to whether disclosive groups with denominators > denom_lim should be excluded from disclosure measures.
not.targetslev: Vector of same length as targets giving level of each target to be excluded from calculating disclosure measures. Set elements for unaffected targets as blanks.
print.flag: TRUE/FALSE to print out line as disclosure for each member of targets is calculated.
usetargetsNA: A logical vector of the same length as targets that determines if NA values of each target are to be considered disclosive. Defaults to TRUE for all targets.
usekeysNA: A logical vector of the same length as keys that determines if NA values of each key are to be considered disclosive. Defaults to TRUE for all keys.
exclude.keys: A list of the same length as targets giving the keys for two way exclusions for the ith target. For details see the documentation for disclosure.
exclude.keylevs: A list of the same length as targets giving the levels of keys for two way exclusions for the ith target. For details see the documentation for disclosure.
exclude.targetlevs: A list of the same length as targets giving the levels of target for two way exclusions for the ith target. For details see the documentation for disclosure.
ngroups_targets: Unless set to NULL (the default), numeric target variables will be grouped into ngroups_targets categories. If ngroups_targets is of length 1, all numeric targets will have the same number of groups. Otherwise ngroups_targets needs to be a vector of the same length as targets and will give the number of groups for each target. If an element of ngroups_targets is zero, no grouping will be done.
ngroups_keys: Unless set to NULL (the default), any numeric key variable will be grouped into categories. If ngroups_keys is of length 1, all numeric keys will have the same number of groups. Otherwise ngroups_keys needs to be the same length as keys and will give the number of groups for each key. If an element of ngroups_keys is zero, no grouping will be done.
ident.meas: Choice of statistics to use as a measure of identity disclosure. Must be a selection from: "repU" or "UiSiO". See disclosure for explanations of measures.
attrib.meas: Choice of statistics to use as a measure of attribute disclosure. Must be a selection from: "DiSCO" or "DiSDiO". See disclosure for explanations of measures.
thresh_1way: A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of all disclosive records, and the second is the % of all records for this level of the target. Default is c(50, 90), meaning a group of more than 50 disclosive records for this level of the target where they make up over 90% of all disclosive records.
thresh_2way: A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of all disclosive records for this key-target combination and the second is the percentage of all disclosive records for this combination. Default is c(4, 80), meaning a group of more than 4 records where over 80% of all the original values with this key have this level of the target.
digits: number of digits to print for the disclosure measures.
plot: determines if plot will be produced when the result is printed.
print: logical value that determines if a summary of results is to be printed.
compare.synorig: a logical value to determine if the function synorig.compare() should be used to check that the data sets can be compared. The default is FALSE, except when the synthetic data are supplied as a data.frame or a list, in which case it is TRUE.
to.print: Vector of items to be printed including "ident", "attrib", both or NULL
...: additional parameters
x: an object of class multi.disclosure.
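
The two-element threshold arguments thresh_1way and thresh_2way are described as needing both values to be exceeded before a warning is raised. The base R sketch below illustrates that "both must be exceeded" rule on made-up counts. The helper exceeds_both is hypothetical, and the use of strict inequalities is an assumption; the exact comparison used inside disclosure may differ.

```r
# Hypothetical helper: TRUE only when BOTH thresholds are exceeded
exceeds_both <- function(count, pct, thresh) {
  stopifnot(length(thresh) == 2)
  count > thresh[1] && pct > thresh[2]
}

thresh_1way <- c(50, 90)  # default shown in the usage section
thresh_2way <- c(4, 80)   # default shown in the usage section

# Made-up counts for one level of a target
n_level <- 120  # disclosive records with this level of the target
n_all   <- 125  # all disclosive records
pct_level <- 100 * n_level / n_all  # 96

# Large count and dominant share: would be flagged
flag_1way <- exceeds_both(n_level, pct_level, thresh_1way)

# Large count but a modest share: would not be flagged
flag_1way_low <- exceeds_both(200, 60, thresh_1way)

# Two-way check for one key-target combination (made-up values)
flag_2way <- exceeds_both(6, 85, thresh_2way)
```

Requiring both conditions keeps small groups with a high percentage, and large groups with a modest percentage, from triggering a warning on their own.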

Details

Calculates measures of identity and attribute disclosure from the keys specified in keys with the function disclosure. For attribute disclosure a table with one line for each target can be printed or plotted. Details are in the help file for disclosure.
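
The print method's to.print, digits and plot arguments control what is shown when the result is printed. The sketch below, which assumes the arguments behave as described in the Arguments section, restricts the calculation to two targets and then prints the attribute and identity results separately without plotting. The object name res, the seed and the choice of targets are arbitrary, and the guard only skips the sketch when a version of synthpop that exports multi.disclosure is not installed.

```r
# Run only if an installed synthpop exports multi.disclosure
has_md <- requireNamespace("synthpop", quietly = TRUE) &&
  "multi.disclosure" %in% getNamespaceExports("synthpop")

if (has_md) {
  library(synthpop)

  ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
  s1 <- syn(ods, seed = 123, print.flag = FALSE)  # seed chosen arbitrarily

  # Restrict the targets instead of using all non-key variables
  res <- multi.disclosure(s1, ods,
                          keys = c("sex", "age", "edu"),
                          targets = c("marital", "region"),
                          print.flag = FALSE)

  # Attribute disclosure table only, one line per target, no plot
  print(res, to.print = "attrib", plot = FALSE)

  # Identity disclosure only, rounded to one digit, no plot
  print(res, to.print = "ident", digits = 1, plot = FALSE)
}
```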

References

to follow link to vignette

Examples


library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods)

### synthetic data provided as a 'data.frame' object
t1 <- multi.disclosure(s1$syn, ods,
                       keys = c("sex", "age", "edu"))

### synthetic data provided as a 'synds' object
t1 <- multi.disclosure(s1, ods,
                       keys = c("sex", "age", "edu"))
