Learn R Programming

sdcMicro (version 5.6.1)

localSuppression: Local Suppression to obtain k-anonymity

Description

Algorithm to achieve k-anonymity by performing local suppression.

Usage

localSuppression(obj, k = 2, importance = NULL, combs = NULL, ...)

kAnon(obj, k = 2, importance = NULL, combs = NULL, ...)

Value

Manipulated data set with suppressions that has k-anonymity with respect to specified key-variables or the manipulated data stored in the sdcMicroObj-class.

Arguments

obj

a sdcMicroObj-class-object or a data.frame

k

threshold for k-anonymity

importance

numeric vector of numbers between 1 and n (n=length of vector keyVars). This vector represents the "importance" of variables that should be used for local suppression in order to obtain k-anonymity. key-variables with importance=1 will - if possible - not suppressed, key-variables with importance=n will be used whenever possible.

combs

numeric vector. if specified, the algorithm will provide k-anonymity for each combination of n key variables (with n being the value of the ith element of this parameter. For example, if combs=c(4,3), the algorithm will provide k-anonymity to all combinations of 4 key variables and then k-anonymity to all combinations of 3 key variables. It is possible to apply different k to these subsets by specifying k as a vector. If k has only one element, the same value of k will be used for all subgroups.

...

see arguments below

  • keyVars: names (or indices) of categorical key variables (for data-frame method)

  • strataVars: name (or index) of variable which is used for stratification purposes, used in the data.frame method. This means that k-anonymity is provided within each category of the specified variable.

  • alpha: numeric value between 0 and 1 specifying how much keys that contain missing values (NAs) should contribute to the calculation of fk and Fk. For the default value of 1, nothing changes with respect to the implementation in prior versions. Each wildcard-match would be counted while for alpha=0 keys with missing values would be basically ignored. Used in the data-frame method only because in the method for sdcMicroObj-class-objects, this value is extracted from slot options.

Author

Bernhard Meindl, Matthias Templ

Details

The algorithm provides a k-anonymized data set by suppressing values in key variables. The algorithm tries to find an optimal solution to suppress as few values as possible and considers the specified importance vector. If not specified, the importance vector is constructed in a way such that key variables with a high number of characteristics are considered less important than key variables with a low number of characteristics.

The implementation provides k-anonymity per strata, if slot 'strataVar' has been set in sdcMicroObj-class or if parameter 'strataVar' is used when appying the data.frame method. For details, have a look at the examples provided.

References

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. tools:::Rd_expr_doi("10.1007/978-3-319-50272-4")

Templ, M. and Kowarik, A. and Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67 (4), 1--36, 2015. tools:::Rd_expr_doi("10.18637/jss.v067.i04")

Examples

Run this code
data(francdat)
## Local Suppression
localS <- localSuppression(francdat, keyVar=c(4,5,6))
localS
plot(localS)
if (FALSE) {
## for objects of class sdcMicro, no stratification
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- localSuppression(sdc)

## for objects of class sdcMicro, with stratification
testdata2$ageG <- cut(testdata2$age, 5, labels=paste0("AG",1:5))
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight',
  strataVar='ageG')
sdc <- localSuppression(sdc)

## it is also possible to provide k-anonymity for subsets of key-variables
## with different parameter k!
## in this case we want to provide 10-anonymity for all combinations
## of 5 key variables, 20-anonymity for all combinations with 4 key variables
## and 30-anonymity for all combinations of 3 key variables.
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
combs <- 5:3
k <- c(10,20,30)
sdc <- localSuppression(sdc, k=k, combs=combs)

## data.frame method (no stratification)
keyVars <- c("urbrur","roof","walls","water","electcon","relat","sex")
strataVars <- c("ageG")
inp <- testdata2[,c(keyVars, strataVars)]
ls <- localSuppression(inp, keyVars=1:7)
print(ls)
plot(ls)

## data.frame method (with stratification)
ls <- kAnon(inp, keyVars=1:7, strataVars=8)
print(ls)
plot(ls, showTotalSupps=TRUE)
}

Run the code above in your browser using DataLab