Simulate population data for the European Statistics on Income and Living Conditions (EU-SILC).
simEUSILC(
dataS,
hid = "db030",
wh = "db090",
wp = "rb050",
hsize = NULL,
strata = "db040",
pid = NULL,
age = "age",
gender = "rb090",
categorizeAge = TRUE,
breaksAge = NULL,
categorical = c("pl030", "pb220a"),
income = "netIncome",
method = c("multinom", "twostep"),
breaks = NULL,
lower = NULL,
upper = NULL,
equidist = TRUE,
probs = NULL,
gpd = TRUE,
threshold = NULL,
est = "moments",
const = NULL,
alpha = 0.01,
residuals = TRUE,
components = c("py010n", "py050n", "py090n", "py100n", "py110n", "py120n", "py130n",
"py140n"),
conditional = c(getCatName(income), "pl030"),
keep = TRUE,
maxit = 500,
MaxNWts = 1500,
tol = .Machine$double.eps^0.5,
nr_cpus = NULL,
seed
)
An object of class simPopObj containing the simulated EU-SILC population data as well as the underlying sample.
dataS: a data.frame containing EU-SILC survey data.
hid: a character string specifying the column of dataS that contains the household ID.
wh: a character string specifying the column of dataS that contains the household sample weights.
wp: a character string specifying the column of dataS that contains the personal sample weights.
hsize: an optional character string specifying a column of dataS that contains the household size. If NULL, the household sizes are computed.
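When hsize is NULL, the household sizes are computed from the data. One plausible way to do this from the household ID column is sketched below in base R. The toy data frame and the column name hsize_toy are made up for illustration, and this is not taken from the package's internal code.

```r
# Toy sample: one row per person, db030 is the household ID (made-up values).
toy <- data.frame(db030 = c(1, 1, 1, 2, 3, 3))

# Count the persons in each household and attach that count to every person.
# hsize_toy is a hypothetical column name used only in this sketch.
toy$hsize_toy <- as.integer(ave(toy$db030, toy$db030, FUN = length))
toy$hsize_toy
```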
strata: a character string specifying the column of dataS that defines the strata. Note that this is currently a required argument and only one stratification variable is supported.
pid: an optional character string specifying a column of dataS that contains the personal ID.
age: a character string specifying the column of dataS that contains the age of the persons (to be used for setting up the household structure).
gender: a character string specifying the column of dataS that contains the gender of the persons (to be used for setting up the household structure).
categorizeAge: a logical indicating whether age categories should be used for simulating additional categorical and continuous variables to decrease computation time.
breaksAge: numeric; if categorizeAge is TRUE, an optional vector of two or more break points for constructing age categories, otherwise ignored.
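To illustrate what break points for age categories look like, here is a small base R sketch using cut(). The ages, the break points and the left-closed intervals are assumptions chosen for the example; the package may construct its categories differently.

```r
# Made-up ages and hypothetical break points (both are assumptions).
age <- c(3, 17, 25, 44, 67, 81)
breaks_age <- c(0, 16, 40, 65, Inf)

# Left-closed intervals [0,16), [16,40), [40,65), [65,Inf).
age_cat <- cut(age, breaks = breaks_age, right = FALSE)
table(age_cat)
```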
categorical: a character vector specifying additional categorical variables of dataS that should be simulated for the population data.
income: a character string specifying the variable of dataS that contains the personal income (to be simulated for the population data).
method: a character string specifying the method to be used for simulating personal income. Accepted values are "multinom" (for using multinomial log-linear models combined with random draws from the resulting categories) and "twostep" (for using two-step regression models combined with random error terms).
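The idea behind "multinom" (predict an income category, then draw a value within it) can be sketched in base R as follows. The break points and category probabilities are made up and stand in for the output of a fitted multinomial model; the sketch uses uniform draws within each category and does not reproduce the package's implementation.

```r
set.seed(7)

# Made-up income break points and predicted category probabilities
# (in practice the probabilities would come from a fitted model).
breaks <- c(0, 10000, 20000, 40000)
probs  <- c(0.5, 0.3, 0.2)
n <- 10

# Step 1: draw an income category for each simulated person.
k <- sample(seq_along(probs), size = n, replace = TRUE, prob = probs)

# Step 2: draw an income value uniformly within the chosen category.
income_sim <- runif(n, min = breaks[k], max = breaks[k + 1])
income_sim
```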
breaks: if method is "multinom", an optional numeric vector of two or more break points for categorizing the personal income. If missing, break points are computed using weighted quantiles.
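As a rough illustration of break points taken from weighted quantiles, the base R sketch below inverts the weighted empirical distribution function. The helper weighted_quantile() is hypothetical and written only for this example; it is not the routine the package uses, and the incomes and weights are made up.

```r
# Hypothetical helper: weighted quantiles from the weighted empirical CDF.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w) / sum(w)  # cumulative share of the total weight
  # Smallest x whose cumulative weight share reaches each probability.
  vapply(probs, function(p) x[which(cum_w >= p)[1]], numeric(1))
}

# Made-up incomes and sample weights.
income  <- c(100, 200, 300, 400, 1000)
weights <- c(1, 1, 1, 1, 4)

bp <- weighted_quantile(income, weights, probs = c(0.25, 0.5, 0.75))
bp
```

Because the last observation carries half of the total weight, the upper break point moves to that observation, which an unweighted quantile would not do.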
lower, upper: numeric values; if method is "multinom" and breaks is NULL, these can be used to specify lower and upper bounds other than minimum and maximum, respectively. Note that if gpd is TRUE (see below), upper defaults to Inf.
equidist: logical; if method is "multinom" and breaks is NULL, this indicates whether the (positive) default break points should be equidistant or whether there should be refinements in the lower and upper tail (see getBreaks).
probs: numeric vector with values in [0, 1]; if method is "multinom" and breaks is NULL, this gives probabilities for quantiles to be used as (positive) break points. If supplied, this is preferred over equidist.
gpd: logical; if method is "multinom", this indicates whether the upper tail of the personal income should be simulated by random draws from a (truncated) generalized Pareto distribution rather than a uniform distribution.
threshold: a numeric value; if method is "multinom", values for categories above threshold are drawn from a (truncated) generalized Pareto distribution.
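Draws from a truncated generalized Pareto distribution above a threshold can be sketched by inverting its distribution function, as in the base R example below. The function rgpd_trunc() and all parameter values are hypothetical, the sketch assumes a positive shape parameter (a heavy upper tail), and it is not the package's own sampler.

```r
# Hypothetical sampler for a GPD above `threshold`, truncated at `upper`.
# Assumes shape > 0; scale and shape would normally be estimated from data.
rgpd_trunc <- function(n, threshold, scale, shape, upper = Inf) {
  stopifnot(shape > 0, scale > 0, upper > threshold)
  # Probability mass of the untruncated GPD below the upper bound.
  p_max <- if (is.finite(upper)) {
    1 - (1 + shape * (upper - threshold) / scale)^(-1 / shape)
  } else {
    1
  }
  # Inverse-CDF sampling restricted to [0, p_max].
  u <- runif(n, min = 0, max = p_max)
  threshold + scale / shape * ((1 - u)^(-shape) - 1)
}

set.seed(1)
x <- rgpd_trunc(1000, threshold = 50000, scale = 20000, shape = 0.3,
                upper = 200000)
range(x)
```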
est: a character string; if method is "multinom", the estimator to be used to fit the generalized Pareto distribution.
const: numeric; if method is "twostep", this gives a constant to be added before log transformation.
alpha: numeric; if method is "twostep", this gives trimming parameters for the sample data. Trimming is thereby done with respect to the variable specified by additional. If a numeric vector of length two is supplied, the first element gives the trimming proportion for the lower part and the second element the trimming proportion for the upper part. If a single numeric is supplied, it is used for both. With NULL, trimming is suppressed.
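The three ways of specifying the trimming proportions (a single value, a vector of length two, or NULL) can be illustrated with the base R sketch below. The helper trim_sample() is hypothetical and uses unweighted quantiles for simplicity; how the package determines the trimming bounds internally is not shown here.

```r
# Hypothetical helper illustrating the alpha conventions described above.
trim_sample <- function(x, alpha = 0.01) {
  if (is.null(alpha)) return(x)        # NULL suppresses trimming
  alpha <- rep(alpha, length.out = 2)  # a single value applies to both tails
  # Unweighted quantiles are an assumption made to keep the sketch short.
  bounds <- quantile(x, probs = c(alpha[1], 1 - alpha[2]), names = FALSE)
  x[x >= bounds[1] & x <= bounds[2]]
}

# Made-up data with two extreme values in the upper tail.
x <- c(1:98, 500, 10000)
x_trimmed <- trim_sample(x, alpha = c(0, 0.02))  # trim the upper 2% only
length(x_trimmed)
```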
residuals: logical; if method is "twostep", this indicates whether the random error terms should be obtained by draws from the residuals. If FALSE, they are drawn from a normal distribution (median and MAD of the residuals are used as parameters).
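The two ways of generating error terms can be contrasted in a few lines of base R. The residual values are made up. Note that stats::mad() rescales by 1.4826 by default so that it is consistent with the standard deviation under normality; whether the package applies the same scaling is an assumption of this sketch.

```r
set.seed(42)

# Made-up regression residuals, including one outlier.
res <- c(-0.8, -0.3, -0.1, 0, 0.2, 0.4, 2.5)
n <- 5

# Analogue of residuals = TRUE: resample the observed residuals.
err_resampled <- sample(res, size = n, replace = TRUE)

# Analogue of residuals = FALSE: normal draws with robust location and scale.
err_normal <- rnorm(n, mean = median(res), sd = mad(res))
```

Using the median and MAD instead of the mean and standard deviation keeps the outlier from inflating the spread of the simulated errors.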
components: a character vector specifying the income components in dataS (to be simulated for the population data).
conditional: an optional character vector specifying categorical conditioning variables for resampling of the income components. The fractions occurring in dataS are then drawn from the respective subsets defined by these variables.
keep: a logical indicating whether variables computed internally in the procedure (such as the original IDs of the corresponding households in the underlying sample, age categories or income categories) should be stored in the resulting population data.
maxit, MaxNWts: control parameters to be passed to multinom and nnet. See the help file for nnet.
tol: if method is "twostep", a small positive numeric value or NULL (see simContinuous).
nr_cpus: if specified, an integer defining the number of CPUs that should be used for parallel processing.
seed: optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored.
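The difference between seeding with a single integer and restoring a saved generator state can be shown in base R. This sketch only demonstrates the general mechanism using set.seed() and .Random.seed; it is an assumption that the function handles its seed argument in a comparable way.

```r
# Seed the generator with a single integer.
set.seed(2024)

# Save the full generator state (an integer vector) before drawing.
state <- get(".Random.seed", envir = globalenv())
a <- runif(3)

# Restore the saved state; the same draws are reproduced.
assign(".Random.seed", state, envir = globalenv())
b <- runif(3)

identical(a, b)
```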
Andreas Alfons and Stefan Kraft and Bernhard Meindl
simStructure, simCategorical, simContinuous, simComponents
data(eusilcS) # load sample data
if (FALSE) {
## long computation time
# multinomial model with random draws
eusilcM <- simEUSILC(eusilcS, upper = 200000, equidist = FALSE, nr_cpus = 1)
summary(eusilcM)
# two-step regression
eusilcT <- simEUSILC(eusilcS, method = "twostep", nr_cpus = 1)
summary(eusilcT)
}