synthpop (version 1.9-0)

synthpop-package: Generating synthetic versions of sensitive microdata for statistical disclosure control

Description

Generate synthetic versions of a data set using parametric or CART methods.

Author

Beata Nowok, Gillian M Raab, and Chris Dibben

Details

Package: synthpop
Type:    Package
Version: 1.9-0
Date:    2024-12-20
License: GPL-2 | GPL-3

Synthetic data are generated from the original (observed) data by the function syn. The package also includes tools to compare synthetic data with the observed data (compare.synds), to fit linear and generalized linear models to synthetic data (lm.synds, glm.synds), and to compare the resulting estimates with those for the observed data (compare.fit.synds). More extensive documentation on how to create synthetic data, with illustrative examples, is provided in the package vignette synthpop. Since that vignette was written, more methods have been added to synthpop, including methods for categorical variables based on log-linear models that can be made differentially private.
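The workflow described above can be sketched as follows. This is a minimal illustration, not a recommended analysis. It assumes the example data set SD2011 that ships with synthpop; the choice of variables, the seed and the model formula are arbitrary and used only for illustration.

```r
library(synthpop)

# SD2011 is an example survey data set distributed with synthpop.
# A small subset of variables (an arbitrary choice) keeps the run fast.
ods <- SD2011[, c("sex", "age", "edu", "income")]

# Generate one synthetic data set with the CART method.
# The seed is fixed only so that the sketch is reproducible.
sds <- syn(ods, method = "cart", m = 1, seed = 2024, print.flag = FALSE)

# Compare the distribution of one variable in the synthetic and observed data
# (dispatches to compare.synds).
compare(sds, ods, vars = "age")

# Fit a linear model to the synthetic data ...
fit_syn <- lm.synds(income ~ sex + age + edu, data = sds)

# ... and compare its estimates with those from the observed data
# (dispatches to compare.fit.synds).
compare(fit_syn, ods)
```

The synthesised data frame itself is stored in sds$syn. A generalized linear model can be fitted in the same way with glm.synds, supplying a family argument.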

The package now also includes functions to evaluate the utility and disclosure risk of synthetic data. For details see the vignettes utility and disclosure. You can access all the vignettes via the index link at the bottom of this help page (synthpop-package).
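As a rough sketch of the utility side only, the functions utility.gen and utility.tab can be applied to a synthetic data object and the observed data. The data subset, the seed and the variables passed to utility.tab are assumptions made for illustration. Default settings are used throughout; the utility vignette describes the available options, and the disclosure vignette covers the disclosure-risk functions, which are not shown here.

```r
library(synthpop)

# Observed data: an arbitrary subset of the SD2011 example data set.
ods <- SD2011[, c("sex", "age", "edu", "income")]

# One synthetic data set; the seed is fixed only for reproducibility.
sds <- syn(ods, m = 1, seed = 2024, print.flag = FALSE)

# General utility measure based on a propensity-score model that tries
# to distinguish synthetic from observed records.
u_gen <- utility.gen(sds, ods)
print(u_gen)

# Utility measures based on a cross-tabulation of two categorical variables.
u_tab <- utility.tab(sds, ods, vars = c("sex", "edu"))
print(u_tab)
```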

References

Elliot, M. (2014) Final report on the disclosure risk associated with the synthetic data produced by the SYLLS team. Report 2015-2, Cathie Marsh Centre for Census and Survey Research (CCSR).

Nowok, B. (2015) Utility of synthetic microdata generated using tree-based methods. Paper presented at the Privacy in Statistical Databases Conference 2016; Dubrovnik, Croatia, 14-16 September 2016.

Nowok, B., Raab, G.M. and Dibben, C. (2016) synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.

Raab, G.M., Nowok, B. and Dibben, C. (2016) Practical data synthesis for large samples. Journal of Privacy and Confidentiality, 7(3), 67-97. doi:10.29012/jpc.v7i3.407.

Raab, G.M., Nowok, B. and Dibben, C. (2016) Guidelines for producing useful synthetic data. doi:10.48550/arXiv.1712.04078. An earlier version was presented at the Privacy in Statistical Databases Conference 2016; Dubrovnik, Croatia, 14-16 September 2016.

Nowok, B., Raab, G.M. and Dibben, C. (2017) Providing bespoke synthetic data for the UK Longitudinal Studies and other sensitive data with the synthpop package for R. Statistical Journal of the IAOS, 33(3), 785-796. doi:10.3233/SJI-150153.

Raab, G.M., Nowok, B. and Dibben, C. (2021) Assessing, visualizing and improving the utility of synthetic data. Available at doi:10.48550/arXiv.2109.12717. An earlier version was presented at the Joint UNECE/Eurostat expert meeting on statistical data confidentiality; Poznan, Poland, 1-3 December 2021.

Raab, G.M. (2022) Utility and disclosure risk for differentially private synthetic categorical data. Chapter in Privacy in Statistical Databases 2022. Published in the Springer series Lecture Notes in Computer Science. Also available at doi:10.48550/arXiv.2206.01362.

Raab, G.M., Nowok, B. and Dibben, C. (2024) Practical privacy metrics for synthetic data. Vignette in the synthpop package. Also available at doi:10.48550/arXiv.2406.16826.

Raab, G.M. (2024) Privacy risk from synthetic data: practical proposals. Chapter in Privacy in Statistical Databases 2024. Published in the Springer series Lecture Notes in Computer Science. Also available at doi:10.48550/arXiv.2409.04257.