sdcMicro (version 5.6.1)

riskyCells: riskyCells

Description

Computes risky (unweighted) combinations of key variables, either for all combinations up to a specified dimension or using identification levels. This mimics the approach taken in mu-argus.
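As a rough illustration of what a risky unweighted combination is, the base R sketch below tabulates two key variables and counts the cells with few observations. The toy data are invented, and treating a cell as unsafe when its count is at or below the threshold is an assumption; the exact comparison used by riskyCells() is not stated here.

```r
# Toy data with two categorical key variables (invented values)
toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "C"),
  sex    = c("m", "m", "f", "m", "f", "f")
)

# Unweighted frequency of every combination of the key variables,
# keeping only combinations that actually occur in the data
counts <- as.data.frame(table(toy), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Assumption: a cell is "unsafe" if its count is at or below the threshold
threshold <- 1
unsafe <- counts[counts$Freq <= threshold, ]
nrow(unsafe)  # 4 of the 5 observed cells contain a single record
```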

Usage

riskyCells(obj, useIdentificationLevel = FALSE, threshold, ...)

Value

a data.table showing the number of unsafe cells and the thresholds for each combination of the key variables. If the input is an sdcMicroObj-class object and modifications have already been applied to the categorical key variables, the output contains the number of unsafe cells for both the original and the modified data.

Arguments

obj

a data.frame, data.table or an object of class sdcMicroObj-class

useIdentificationLevel

(logical) specifies whether tabulation should be done up to a specific dimension (useIdentificationLevel=FALSE, using argument maxDim) or by taking identification levels into account (useIdentificationLevel=TRUE, using argument level).

threshold

a numeric vector specifying the thresholds at which cells are considered to be unsafe. If tabulation is done up to a specific dimension (useIdentificationLevel=FALSE), the thresholds may be specified differently for each dimension. Otherwise, the same threshold is used for all tables.

...

see possible arguments below

  • keyVars: indices or variable names within obj that should be used for tabulation. If obj is of class sdcMicroObj-class, this argument is ignored and the pre-defined key variables are used.

  • level: if useIdentificationLevel=TRUE, this numeric vector specifies the importance of the key variables. The construction of output tables follows the implementation in mu-argus, see e.g. https://github.com/sdcTools/manuals/raw/master/mu-argus/MUmanual5.1.pdf. The length of this numeric vector must match the number of key variables.

  • maxDim: if useIdentificationLevel=FALSE, this number specifies the maximum number of variables to tabulate.
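The interplay of keyVars, maxDim and a per-dimension threshold can be sketched in base R as follows. The helper count_unsafe_cells() and its output columns are hypothetical and not part of sdcMicro. It assumes one threshold per dimension (or a single recycled value) and treats a cell as unsafe when its count is at or below the threshold. It does not cover the identification-level variant.

```r
# Hypothetical helper, not part of sdcMicro: counts low-frequency cells for
# every combination of key variables involving up to max_dim variables.
count_unsafe_cells <- function(df, key_vars, threshold, max_dim) {
  # Recycle a single threshold; otherwise expect one value per dimension
  if (length(threshold) == 1) threshold <- rep(threshold, max_dim)
  stopifnot(length(threshold) == max_dim)

  res <- list()
  for (d in seq_len(max_dim)) {
    # All sets of d key variables
    for (vars in combn(key_vars, d, simplify = FALSE)) {
      freq <- table(df[, vars, drop = FALSE])
      freq <- freq[freq > 0]  # ignore combinations that never occur
      res[[length(res) + 1]] <- data.frame(
        dim = d,
        vars = paste(vars, collapse = " x "),
        threshold = threshold[d],
        # Assumption: unsafe means count <= threshold
        unsafe_cells = sum(freq <= threshold[d])
      )
    }
  }
  do.call(rbind, res)
}

# Invented toy data with three categorical key variables
toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "C"),
  sex    = c("m", "m", "f", "m", "f", "f"),
  age    = c("young", "old", "old", "young", "young", "old")
)

# Threshold 2 for one-way tables, threshold 1 for two-way tables
out <- count_unsafe_cells(toy, c("region", "sex", "age"),
                          threshold = c(2, 1), max_dim = 2)
print(out)
```

Each row of the result corresponds to one table: three one-way tables and three two-way tables for this toy input.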

Author

Bernhard Meindl

Examples

if (FALSE) {
library(sdcMicro)
data(testdata2)

## data.frame method / all combinations up to maxDim
riskyCells(testdata2, keyVars=c(1:5), threshold=c(50,25,10,5),
  useIdentificationLevel=FALSE, maxDim=4)
riskyCells(testdata2, keyVars=c(1:5), threshold=10,
  useIdentificationLevel=FALSE, maxDim=3)

## data.frame method / using identification levels
riskyCells(testdata2, keyVars=c(1:6), threshold=20,
  useIdentificationLevel=TRUE, level=c(1,1,2,3,3,5))
riskyCells(testdata2, keyVars=c(1,3,4,6), threshold=10,
  useIdentificationLevel=TRUE, level=c(1,2,2,4))

## sdcMicroObj-method / all combinations up to maxDim
## convert the first six columns to factors
testdata2[1:6] <- lapply(testdata2[1:6], as.factor)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')

r0 <- riskyCells(sdc, useIdentificationLevel=FALSE, threshold=c(20,10,5), maxDim=3)
## in case key-variables have been modified, we get counts for original and modified data
sdc <- groupAndRename(sdc, var="roof", before=c("5","6","9"), after=c("5+"))
r1 <- riskyCells(sdc, useIdentificationLevel=FALSE, threshold=c(10,5,3), maxDim=3)

## sdcMicroObj-method / using identification levels
riskyCells(sdc, useIdentificationLevel=TRUE, threshold=10, level=c(1,1,3,4,5,5,5))
}
