Generate synthetic versions of a data set using parametric or CART methods.
Beata Nowok, Gillian M Raab, and Chris Dibben
Package: | synthpop |
Type: | Package |
Version: | 1.9-0 |
Date: | 2024-12-20 |
License: | GPL-2 | GPL-3 |
Synthetic data are generated from the original (observed) data by the function
syn
. The package includes also tools to compare synthetic data with the
observed data (compare.synds
) and to fit (generalized) linear model to
synthetic data (lm.synds
, glm.synds
) and compare the estimates
with those for the observed data (compare.fit.synds
). More extensive
documentation on how to create synthetic data, with illustrative examples, is provided
in the package vignette synthpop. Since that vignette was written more
methods have been added to synthpop, including mthods for categorical variables
based on log-linear models that can be made differentially private.
Now the package also includes functions to eavaluate the utility and
disclosure risk of synthetic data. For details see the vignettes
utility and disclosure. You can access all the vignettes via the index link at the bottom of
this help page (synthpop-package
)
Elliot, M. (2014) Final report on the disclosure risk associated with the synthetic data produced by the SYLLS team. Report 2015-2, Cathie Marsh Centre for Census and Survey Research (CCSR).
Nowok, B. Utility of synthetic microdata generated using tree-based methods (2015) Paper presented at the Privacy in Statistical Databases Conference 2016; Dubrovnik, Croatia, 14-16 September 2016 .
Nowok, B., Raab, G.M and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. tools:::Rd_expr_doi("10.18637/jss.v074.i11").
Raab, G.M., Nowok, B., and Dibben, C. (2016) Practical data synthesis for large samples Journal of Privacy and Confidentiality, 7(3):67-97. tools:::Rd_expr_doi("10.29012/jpc.v7i3.407").
Raab, G.M., Nowok, B., and Dibben, C. (2016) Guidelines for producing useful synthetic data tools:::Rd_expr_doi("10.48550/arXiv.1712.04078") An earlier version was presented at the Privacy in Statistical Databases Conference 2016; Dubrovnik, Croatia, 14-16 September 2016
Nowok, B., Raab, G.M. and Dibben, C. (2017) Providing bespoke synthetic data for the UK Longitudinal Studies and other sensitive data with the synthpop package for R Statistical Journal of the IAOS, 33(3):785-796. tools:::Rd_expr_doi("10.3233/SJI-150153").
Raab, G.M., Nowok, B., and Dibben, C. (2021) Assessing, visualizing and improving the utility of synthetic data. Available attools:::Rd_expr_doi("10.48550/arXiv.2109.12717"). An earlier version was presented at the Joint UNECE/Eurostat expert meeting on statistical data confidentiality; Poznan, Poland, 1-3 December 2021.
Raab, G.M. (2022) Utility and Disclosure Risk for Differentially Private Synthetic Categorical Data, Chapter in Privacy in Statistical Databases 2022. Published in Springer Series Lecture notes in Computer Science. Also available at tools:::Rd_expr_doi("10.48550/arXiv.2206.01362").
Raab, G.M., Nowok, B., and Dibben, C. (2024) Practical privacy metrics for synthetic data, Vignette in synthpop package. Also available at tools:::Rd_expr_doi("10.48550/arXiv.2406.16826").
Raab, G.M. (2024) Privacy risk from synthetic data: practical proposals. Chapter in Privacy in Statistical databases 2024. published in Springer Series Lecture notes in Computer Science. Also available at tools:::Rd_expr_doi("10.48550/arXiv.2409.04257").