Combines the results of models fitted to each of the m
synthetic data sets.
Usage
# S3 method for fit.synds
summary(object, population.inference = FALSE, msel = NULL,
        real.varcov = NULL, incomplete = NULL, ...)

# S3 method for summary.fit.synds
print(x, ...)
Value
An object of class summary.fit.synds, which is a list with the following components:
call: the original call to glm.synds or lm.synds.
proper: a logical value indicating whether the synthetic data were generated using proper synthesis.
population.inference: a logical value indicating whether inference is made to population coefficients or to the results that would be expected from an analysis of the original data (see the argument population.inference).
incomplete: a logical value indicating whether the dependent variable in the model was not synthesised. It is derived in the synthpop implementations of the fitting functions (lm.synds, glm.synds, multinom.synds and polr.synds) and saved with the fitted object. When TRUE, inference with population.inference = TRUE uses the method proposed by Reiter (2003) for what he terms partially synthetic data. This method requires multiple syntheses (m > 1). If m = 1, incomplete = TRUE and population.inference = TRUE, the results are still calculated but returned with a warning; they will usually have standard errors that are larger than they should be. Because the method can also be used for complete synthesis, it can be forced by setting the argument incomplete = TRUE.
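As a rough illustration of the Reiter (2003) rule for partially synthetic data, the base R sketch below combines one coefficient's estimates and standard errors from the m fits: the point estimate is their mean, and the total variance is the average within-synthesis variance plus the between-synthesis variance divided by m. The function reiter_combine is hypothetical and not part of synthpop; it also shows why m > 1 is needed, since the between-synthesis variance cannot be estimated from a single synthesis.

```r
# Hypothetical helper (not synthpop code): Reiter (2003) combining rule for
# partially synthetic data, applied to a single coefficient.
# est: estimates from each of the m synthetic data sets
# se:  the corresponding standard errors
reiter_combine <- function(est, se) {
  m <- length(est)
  if (m < 2) stop("the between-synthesis variance needs m > 1")
  q_bar <- mean(est)          # combined point estimate
  b_m   <- var(est)           # between-synthesis variance
  u_bar <- mean(se^2)         # average within-synthesis variance
  t_p   <- u_bar + b_m / m    # total variance for partially synthetic data
  c(estimate = q_bar, se = sqrt(t_p), z = q_bar / sqrt(t_p))
}

reiter_combine(est = c(0.52, 0.48, 0.50), se = c(0.10, 0.10, 0.10))
```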
fitting.method: the function used to fit the model.
m: the number of synthetic versions of the original (observed) data.
coefficients: a matrix of combined estimates. If inference is to be made to the results that would be obtained from an analysis of the original data (population.inference = FALSE), the coefficients are given by xpct(Beta), their standard errors by xpct(se.Beta) and the corresponding Z-statistics by xpct(Z). If the synthetic data are used to make inferences to population quantities (population.inference = TRUE), the coefficients are given by Beta.syn, their standard errors by se.Beta.syn and the Z-statistics by Z.syn (see the vignette on inference for more details).
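For population.inference = FALSE, the combination described above amounts to averaging the per-synthesis estimates, standard errors and Z-statistics over the m fits. The sketch below assumes the estimates and standard errors have already been collected into m x p matrices (one row per synthetic data set); combine_expected is a hypothetical name, the column labels simply mirror the notation in the text, and no adjustment is made for k differing from n.

```r
# Hypothetical sketch (not synthpop code): combine m fits for inference to
# the results expected from the original data by averaging Beta, se.Beta
# and Z across syntheses.
# beta, se: m x p matrices, one row per synthetic data set.
combine_expected <- function(beta, se) {
  z <- beta / se                          # Z-statistic within each synthesis
  cbind(`xpct(Beta)`    = colMeans(beta),
        `xpct(se.Beta)` = colMeans(se),
        `xpct(Z)`       = colMeans(z))
}

# Illustrative numbers for m = 2 syntheses and two coefficients.
beta <- rbind(c(1.0, -0.5), c(1.2, -0.3))
se   <- rbind(c(0.5,  0.1), c(0.4,  0.1))
colnames(beta) <- colnames(se) <- c("(Intercept)", "age")
combine_expected(beta, se)
```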
n: the number of cases in the original data.
k: the number of cases in the synthesised data. Note that if k and n are not equal and population.inference = FALSE (the default), the standard errors produced estimate what would be expected from an analysis of the original data set of size n.
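The adjustment for k differing from n can be sketched as below, under the assumption (not stated on this page) that standard errors scale as one over the square root of the sample size. The helper rescale_se is hypothetical, and synthpop's own calculation may differ in detail.

```r
# Hypothetical helper: rescale a standard error estimated from a synthetic
# data set of size k to an analysis of the size-n original data, assuming
# standard errors shrink like 1 / sqrt(sample size).
rescale_se <- function(se_k, k, n) {
  se_k * sqrt(k / n)
}

# A synthetic data set twice the size of the original gives standard errors
# that are too small by a factor of sqrt(2) for the original analysis.
rescale_se(se_k = 0.1, k = 2000, n = 1000)
```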
analyses: the summary.glm or summary.lm object (for glm.synds or lm.synds fits, respectively), or a list of m such objects.
msel: index or indices of the synthetic data copies for which summaries of fitted models are produced. If NULL, only a summary of combined estimates is produced.
Arguments
object: an object of class fit.synds created by fitting a model to a synthesised data set using glm.synds, lm.synds, multinom.synds or polr.synds.
population.inference: a logical value indicating whether inference should be made to population quantities. If FALSE, inference is made to the results that would be expected from an analysis of the original data. This option should be selected if the synthetic data are being used for exploratory analysis but the final published results will be obtained by running code on the original confidential data. If population.inference = TRUE, the results allow population inference to be made from the synthetic data. In both cases the inference depends on the synthesising model being correct, which can be checked by running the same analysis on the real data; see compare.fit.synds.
msel: index or indices of the synthetic data sets (1, ..., m) for which summaries of fitted models are to be produced. If NULL (default), only the summary of combined estimates is produced.
real.varcov: the estimated variance-covariance matrix of the fit of the model to the original data. This argument is used by compare.fit.synds, which takes the original data as one of its arguments.
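To show what such a matrix carries, the sketch below takes standard errors from the diagonal of an observed-data variance-covariance matrix and expresses differences between synthetic and observed coefficients in those units. This is an illustrative assumption, not the implementation of compare.fit.synds, which reports its own comparison statistics; std_coef_diff is a hypothetical name, mtcars stands in for real data, and the "synthetic" coefficients are an arbitrary perturbation chosen for illustration.

```r
# Hypothetical sketch: use an observed-data variance-covariance matrix to
# express synthetic-minus-observed coefficient differences in units of the
# observed standard errors.
std_coef_diff <- function(beta_syn, beta_obs, real_varcov) {
  se_obs <- sqrt(diag(real_varcov))   # standard errors from the real fit
  (beta_syn - beta_obs) / se_obs      # differences in units of real SEs
}

fit_obs  <- lm(mpg ~ wt + hp, data = mtcars)          # stand-in real fit
beta_syn <- coef(fit_obs) + c(0.5, -0.1, 0.002)       # arbitrary perturbation
std_coef_diff(beta_syn, coef(fit_obs), vcov(fit_obs))
```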
incomplete: a logical value indicating whether population inference for incomplete synthesis is to be used. If left at NULL, it is determined by whether the dependent variable has been synthesised. See also the incomplete component of the returned object.
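The defaulting rule for incomplete, together with the warning for m = 1 described under the returned incomplete component, can be sketched as follows. The function resolve_incomplete, its arguments and the warning text are hypothetical illustrations of the documented behaviour, not synthpop's internals.

```r
# Hypothetical sketch of the documented rule for `incomplete`: when NULL,
# incomplete synthesis is assumed exactly when the dependent variable was
# not synthesised. A warning is raised if the Reiter (2003) method would be
# used with a single synthesis.
resolve_incomplete <- function(incomplete, y_synthesised, m,
                               population.inference) {
  if (is.null(incomplete)) incomplete <- !y_synthesised
  if (incomplete && population.inference && m == 1) {
    warning("Reiter (2003) rules need m > 1; standard errors may be too large")
  }
  incomplete
}

# Dependent variable left unsynthesised, so incomplete synthesis is assumed.
resolve_incomplete(NULL, y_synthesised = FALSE, m = 5,
                   population.inference = TRUE)
```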
...: additional arguments.
x: an object of class summary.fit.synds.
Details
The mean of the estimates from each of the m synthetic data sets yields asymptotically unbiased estimates of the coefficients if the observed data conform to the distribution used for synthesis. The standard errors are estimated differently depending on whether inference is made to the results we would expect to obtain from the observed data or to the parameters of the population from which the observed data are assumed to be sampled. The standard errors also differ according to whether the synthetic data were produced using simple or proper synthesis (for details see Raab et al. (2017)).
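Why simple and proper synthesis give different population standard errors can be seen from a back-of-envelope, large-sample argument that assumes a correct synthesising model and complete synthesis. The error of the combined estimate around the population value splits into the error of the original-data estimate (variance roughly u_bar * k / n, where u_bar is the average squared standard error in the synthetic fits) plus the average synthesis error (variance u_bar / m for simple synthesis). In proper synthesis the parameters are redrawn for each synthetic data set, adding roughly u_bar * k / n per synthesis. The sketch below encodes that approximation; pop_var is a hypothetical name, and Raab et al. (2017) and the synthpop inference vignette should be treated as authoritative for the formulas the package actually uses.

```r
# Hypothetical large-sample sketch (not synthpop code): approximate variance
# of the mean of m synthetic estimates around the population parameter, for
# complete synthesis with synthetic size k and original size n.
# u_bar: average within-synthesis variance (mean of squared SEs).
pop_var <- function(u_bar, m, n, k, proper = FALSE) {
  r <- k / n                         # u_bar * r approximates original-data variance
  if (proper) {
    u_bar * (r + (1 + r) / m)        # parameter redraws add spread per synthesis
  } else {
    u_bar * (r + 1 / m)              # only synthetic sampling noise is averaged
  }
}

# With k = n these reduce to u_bar * (1 + 1/m) and u_bar * (1 + 2/m).
pop_var(u_bar = 0.01, m = 5, n = 1000, k = 1000)
pop_var(u_bar = 0.01, m = 5, n = 1000, k = 1000, proper = TRUE)
```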
References
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.
Raab, G.M., Nowok, B. and Dibben, C. (2017). Practical data synthesis for large samples. Journal of Privacy and Confidentiality, 7(3), 67-97. Available at: https://journalprivacyconfidentiality.org/index.php/jpc/article/view/407
Reiter, J.P. (2003). Inference for partially synthetic, public use microdata sets. Survey Methodology, 29, 181-188.
Examples
library(synthpop)  # provides syn(), glm.synds() and the SD2011 data
ods <- SD2011[1:1000, c("sex", "age", "edu", "ls", "smoke")]
### simple synthesis
s1 <- syn(ods, m = 5)
f1 <- glm.synds(smoke ~ sex + age + edu + ls, data = s1, family = "binomial")
summary(f1)                                # inference to original-data results
summary(f1, population.inference = TRUE)   # inference to population quantities
### proper synthesis
s2 <- syn(ods, m = 5, method = "parametric", proper = TRUE)
f2 <- glm.synds(smoke ~ sex + age + edu + ls, data = s2, family = "binomial")
summary(f2)
summary(f2, population.inference = TRUE)