Calculates and plots tables of utility measures. The utility measures are
calculated by the function utility.tab.
Options are all one-way tables, all two-way tables, or three-way tables
for a specified third variable along with all pairs of the other variables.
This function can also be used with synthetic data NOT created by syn(),
but then the additional parameters not.synthesised and cont.na
might need to be provided.
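As an illustration of the case where the synthetic data were NOT created by syn(), the sketch below passes a plain data frame together with cont.na. Everything about the toy data is an assumption made for this example: the variables, the use of -8 as a missing-data code for income, and the bootstrap resample that stands in for a data set synthesised by another tool. The call to utility.tables() is only run if synthpop is installed.

```r
set.seed(1)
n <- 300

# Toy "original" data; income uses -8 as a missing-data code (an assumption
# made for this illustration only).
orig <- data.frame(
  sex    = factor(sample(c("FEMALE", "MALE"), n, replace = TRUE)),
  age    = sample(18:80, n, replace = TRUE),
  income = round(rlnorm(n, meanlog = 7, sdlog = 0.5))
)
orig$income[sample(n, 30)] <- -8

# Stand-in for synthetic data produced by some other tool: a bootstrap
# resample of the rows, used here only to keep the example self-contained.
synth <- orig[sample(n, replace = TRUE), ]
rownames(synth) <- NULL

# Named list: element names are variable names, values are the codes that
# mark missing values in those continuous variables.
cont_na <- list(income = -8)
stopifnot(all(names(cont_na) %in% names(orig)))

# Compare synthetic and original data only if synthpop is available.
if (requireNamespace("synthpop", quietly = TRUE)) {
  res <- synthpop::utility.tables(synth, orig, cont.na = cont_na,
                                  plot = FALSE, print.flag = FALSE)
}
```

If some variables had been copied unchanged into the synthetic data, their names would additionally be passed as not.synthesised.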
# S3 method for synds
utility.tables(object, data,
tables = "twoway", maxtables = 5e4,
vars = NULL, third.var = NULL,
useNA = TRUE, ngroups = 5,
tab.stats = c("pMSE", "S_pMSE", "df"),
plot.stat = "S_pMSE", plot = TRUE, max.table = 1e07,
print.tabs = FALSE, digits.tabs = 4,
max.scale = NULL, min.scale = 0, plot.title = NULL,
nworst = 5, ntabstoprint = 0, k.syn = FALSE,
low = "grey92", high = "#E41A1C",
n.breaks = NULL, breaks = NULL, print.flag = TRUE, ...)
# S3 method for data.frame
utility.tables(object, data,
cont.na = NULL, not.synthesised = NULL,
tables = "twoway", maxtables = 5e4,
vars = NULL, third.var = NULL,
useNA = TRUE, ngroups = 5,
tab.stats = c("pMSE", "S_pMSE", "df"),
plot.stat = "S_pMSE", plot = TRUE, max.table = 1e07,
print.tabs = FALSE, digits.tabs = 4,
max.scale = NULL, min.scale = 0, plot.title = NULL,
nworst = 5, ntabstoprint = 0, k.syn = FALSE,
low = "grey92", high = "#E41A1C",
n.breaks = NULL, breaks = NULL,
compare.synorig = TRUE, print.flag = TRUE, ...)
# S3 method for list
utility.tables(object, data,
cont.na = NULL, not.synthesised = NULL,
tables = "twoway", maxtables = 5e4,
vars = NULL, third.var = NULL,
useNA = TRUE, ngroups = 5,
tab.stats = c("pMSE", "S_pMSE", "df"),
plot.stat = "S_pMSE", plot = TRUE, max.table = 1e07,
print.tabs = FALSE, digits.tabs = 4,
max.scale = NULL, min.scale = 0, plot.title = NULL,
nworst = 5, ntabstoprint = 0, k.syn = FALSE,
low = "grey92", high = "#E41A1C",
n.breaks = NULL, breaks = NULL,
compare.synorig = TRUE, print.flag = TRUE, ...)
# S3 method for utility.tables
print(x, print.tabs = NULL, digits.tabs = NULL,
plot = NULL, plot.title = NULL, max.scale = NULL, min.scale = NULL,
nworst = NULL, ntabstoprint = NULL, ...)
An object of class utility.tables, which is a list with the following
components:
a table with all the selected measures for all combinations of
variables defined by tables, third.var, and vars.
measure used in mat and toplot.
see above.
see above.
plot of the selected utility measure.
an average of utility scores for all combinations with other variables.
see above.
see above.
see above.
see above.
see above.
see above.
see above.
see above.
variable combinations with the nworst worst utility scores.
observed and synthetic cross-tabulations for worstn.
object: an object of class synds, which stands for 'synthesised data set'. It is typically created by the function syn() and includes object$m synthesised data set(s) as object$syn. This is a single data set when object$m = 1 or a list of length object$m when object$m > 1. Alternatively, when data are synthesised without using syn(), it can be a data frame with a synthetic data set or a list of data frames with synthetic data sets, all created from the same original data with the same variables and the same method.
data: the original (observed) data set.
cont.na: a named list of codes for missing values for continuous variables if different from the R missing data code NA. The names of the list elements must correspond to the names of the variables for which the missing data codes need to be specified.
not.synthesised: a vector of variable names for any variables that have been left unchanged in the synthetic data.
tables: defines the type of tables to produce. Options are "oneway", "twoway" (default) or "threeway". If set to "oneway" or "twoway", all possible tables from vars are produced. For "threeway", third.var may be specified and all three-way tables between this variable and all pairs of the other variables are produced. If a third variable is not specified, the function chooses the variable with the largest median utility measure over all three-way tables it contributes to.
maxtables: maximum number of tables that will be produced. If the number of tables is larger, then utility is only measured for a sample of size maxtables. Plots of two-way or three-way tables cannot be produced from sampled tables.
vars: a vector of strings with the names of the variables to be used to form the tables, or a vector of variable numbers in the original data. Defaults to all variables in both original and synthetic data.
third.var: when tables is "threeway", the variable to use as the third variable, tabulated with all pairs of the other variables.
useNA: determines if NA values are to be included in tables. Only applies for method "tab".
ngroups: numerical (non-factor) variables included with method = "tab" will be classified into this number of groups to form tables. Classification is performed using the classIntervals() function with n = ngroups. By default, style = "quantile", to get appropriate groups for skewed data. Problems for variables with a small number of unique values are handled by selecting only unique values of breaks. Arguments of classIntervals() may, however, be specified in the call to utility.tables().
tab.stats: statistics to include in the table of results. Must be a selection from: "VW", "FT", "JSD", "SPECKS", "WMabsDD", "U", "G", "pMSE", "PO50", "MabsDD", "dBhatt", "S_VW", "S_FT", "S_JSD", "S_WMabsDD", "S_G", "S_pMSE", "df", "dfG". If tab.stats = "all", all of these will be included. See utility.tab for explanations of the measures.
plot.stat: statistic to plot. Choice is "VW", "FT", "JSD", "SPECKS", "WMabsDD", "U", "G", "pMSE", "PO50", "MabsDD", "dBhatt", "S_VW", "S_FT", "S_JSD", "S_WMabsDD", "S_G", "S_pMSE". See utility.tab for explanations of the measures.
plot: determines if a plot will be produced when the result is printed.
max.table: maximum number of cells allowed in a table by the function utility.tab.
print.tabs: logical value that determines if the table of results is to be printed.
digits.tabs: number of digits to print for the table, except for p-values, which are always printed to 4 places.
max.scale: a numeric value for the maximum value used in calculating the shading of the plots. If it is NULL then the maximum value will be replaced by the maximum value in the data.
min.scale: a numeric value for the minimum value used in calculating the shading of the plots. If it is NULL then the minimum value will be replaced by zero.
plot.title: title for the plot.
nworst: the number of variable combinations with the worst utility scores to be printed.
ntabstoprint: the number of tables to print for observed and synthetic data with the worst utility.
k.syn: a logical indicator as to whether the sample size itself has been synthesised.
low: colour for the low end of the gradient.
high: colour for the high end of the gradient.
n.breaks: the number of break points to create if breaks are not given directly.
breaks: breaks for a two-colour binned gradient.
compare.synorig: a logical value to determine if the function synorig.compare() should be used to check that the data sets can be compared. Used when the synthetic data are supplied as a data.frame or a list. Default is TRUE.
print.flag: allows printing of a message as metrics are calculated for each element of the table. Default is TRUE.
...: additional parameters.
x: an object of class utility.tables.
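The quantile grouping described for ngroups can be approximated in base R. This is only a sketch of the idea, not the package's implementation: it uses quantile() and cut() in place of classIntervals(), so the exact break points may differ, and the example vectors are made up for illustration.

```r
set.seed(42)
ngroups <- 5

# A skewed continuous variable (hypothetical data).
x <- rexp(500, rate = 1 / 1000)

# Quantile-based break points; unique() drops repeated breaks.
probs  <- seq(0, 1, length.out = ngroups + 1)
brks_x <- unique(quantile(x, probs = probs, na.rm = TRUE))
grp_x  <- cut(x, breaks = brks_x, include.lowest = TRUE)
table(grp_x)

# A variable with few unique values: several quantiles coincide, so
# keeping only the unique breaks yields fewer than ngroups groups.
y      <- rep(c(0, 1, 2), times = c(300, 150, 50))
brks_y <- unique(quantile(y, probs = probs, na.rm = TRUE))
grp_y  <- cut(y, breaks = brks_y, include.lowest = TRUE)
table(grp_y)
```

Without the unique() step, cut() would stop with an error for y because the break points are not distinct, which is the kind of problem the ngroups description refers to.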
Calculates tables of observed and synthesised values for the variables
specified in vars with the function utility.tab and produces
tables and plots of one-way, two-way or three-way utility measures formed
from vars. Several options for utility measures can be selected for printing
or plotting. Details are in the help file for utility.tab.
The tables and variables with the worst utility scores are identified. Visualisations of the matrices of utility scores are plotted. For three-way tables a third variable can be defined to select all tables involving that variable for plotting. If it is not specified, the variable with tables giving the worst utility is selected as the third variable.
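To see how many tables each setting of tables implies, the variable combinations can be enumerated in base R. This is a sketch of the counting only, using the six variables from the examples on this page and assuming third.var = "sex"; it does not call synthpop and does not show how the package builds the tables internally.

```r
vars      <- c("sex", "age", "edu", "marital", "region", "income")
third.var <- "sex"

# One-way: one table per variable.
oneway <- as.list(vars)

# Two-way: every unordered pair of variables.
twoway <- combn(vars, 2, simplify = FALSE)

# Three-way: the third variable combined with every pair of the others.
others   <- setdiff(vars, third.var)
threeway <- lapply(combn(others, 2, simplify = FALSE),
                   function(pair) c(third.var, pair))

n_tables <- c(oneway   = length(oneway),
              twoway   = length(twoway),
              threeway = length(threeway))
print(n_tables)
```

For p variables this gives p one-way tables, choose(p, 2) two-way tables and choose(p - 1, 2) three-way tables for a fixed third variable, so the default maxtables = 5e4 is first exceeded by two-way tables at 317 variables.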
Read, T.R.C. and Cressie, N.A.C. (1988) Goodness-of-Fit Statistics for Discrete Multivariate Data, Springer-Verlag, New York.
Voas, D. and Williamson, P. (2001) Evaluating goodness-of-fit measures for synthetic microdata. Geographical and Environmental Modelling, 5(2), 177-200.
utility.tab
library(synthpop)  # provides syn(), utility.tables() and the SD2011 data
ods <- SD2011[1:1000, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods)
### synthetic data provided as a 'synds' object
(t1 <- utility.tables(s1, ods, tab.stats = "all", print.tabs = TRUE))
### synthetic data provided as a 'data.frame' object
(t1 <- utility.tables(s1$syn, ods, tab.stats = "all", print.tabs = TRUE))
t2 <- utility.tables(s1, ods, tables = "twoway")
print(t2, max.scale = 3)
(t3 <- utility.tables(s1, ods, tab.stats = "all", tables = "threeway",
                      third.var = "sex", print.tabs = TRUE))
(t4 <- utility.tables(s1, ods, tab.stats = "all", tables = "threeway",
                      third.var = "sex", useNA = FALSE, print.tabs = TRUE))
(t5 <- utility.tables(s1, ods, tab.stats = "all",
                      print.tabs = TRUE))