Learn R Programming

sdcMicro (version 5.6.1)

dRiskRMD: RMD based disclosure risk

Description

Distance-based disclosure risk estimation via robust Mahalanobis Distances.

Usage

dRiskRMD(obj, ...)

Value

The disclosure risk or the modified sdcMicroObj-class

risk1

percentage of sensitive observations according to method RMDID1.

risk2

standardized version of risk1

wrisk1

amount of sensitive observations according to RMDID1 weighted by their corresponding robust Mahalanobis distances.

wrisk2

RMDID2 measure

indexRisk1

index of observations with high risk according to risk1 measure

indexRisk2

index of observations with high risk according to wrisk2 measure

Arguments

obj

an sdcMicroObj-class-object or a data.frame

...

see possible arguments below

  • xm masked data

  • kweight for adjusting the influence of the robust Mahalanobis distances, i.e. to increase or decrease each of the disclosure risk intervals.

  • k2parameter for method RMDID2 to choose a small interval around each masked observation.

Author

Matthias Templ

Details

This method is an extension of method SDID because it accounts for the “outlyingness” of each observations. This is a quite natural approach since outliers do have a higher risk of re-identification and therefore these outliers should have larger disclosure risk intervals as observations in the center of the data cloud.

The algorithm works as follows:

1. Robust Mahalanobis distances are estimated in order to get a robust multivariate distance for each observation.

2. Intervals are estimated for each observation around every data point of the original data points where the length of the interval is defined/weighted by the squared robust Mahalanobis distance and the parameter $k$. The higher the RMD of an observation the larger the interval.

3. Check if the corresponding masked values fall into the intervals around the original values or not. If the value of the corresponding observation is within such an interval the whole observation is considered unsafe. So, we get a whole vector indicating which observation is save or not, and we are finished already when using method RMDID1).

4. For method RMDID1w: we return the weighted (via RMD) vector of disclosure risk.

5. For method RMDID2: whenever an observation is considered unsafe it is checked if $m$ other observations from the masked data are very close (defined by a parameter $k2$ for the length of the intervals as for SDID or RSDID) to such an unsafe observation from the masked data, using Euclidean distances. If more than $m$ points are in such a small interval, we conclude that this observation is ``save''.

References

Templ, M. and Meindl, B., Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking, Lecture Notes in Computer Science, Privacy in Statistical Databases, vol. 5262, pp. 113-126, 2008.

Templ, M. New Developments in Statistical Disclosure Control and Imputation: Robust Statistics Applied to Official Statistics, Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280, 264 pages.

See Also

dRisk

Examples

Run this code
data(Tarragona)
x <- Tarragona[, 5:7]
y <- addNoise(x)$xm
dRiskRMD(x, xm=y)
dRisk(x, xm=y)

data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
## this is already made internally:
## sdc <- dRiskRMD(sdc)
## and already stored in sdc

Run the code above in your browser using DataLab