This function shapes data for use in a dgirt or dgmrp model. Most arguments give the name or names of key variables in the data. These arguments end in _name or _names and should be character vectors.
shape(item_data = NULL, item_names = NULL, time_name, geo_name,
group_names = NULL, id_vars = NULL, time_filter = NULL,
geo_filter = NULL, min_t_filter = 1L, min_survey_filter = 1L,
survey_name = NULL, modifier_data = NULL, modifier_names = NULL,
t1_modifier_names = NULL, standardize = TRUE, target_data = NULL,
raking = NULL, max_raked_weight = NULL, weight_name = NULL,
proportion_name = "proportion", aggregate_data = NULL,
aggregate_item_names = NULL, constant_item = TRUE, ...)
item_data: A table in which items appear in columns and each row represents an individual's responses in some time period and local geographic area.
item_names: Item response variables.
time_name: A time variable with numeric values.
geo_name: A geographic variable representing local areas.
group_names: Discrete grouping variables, usually demographic. Using numeric variables is allowed but not recommended.
id_vars: Additional variables that should be included in the result, other than those specified elsewhere.
time_filter: A numeric vector giving possible values of the time variable. Observed and unobserved time periods can be given. Defaults to observed values.
geo_filter: A character vector giving values of the geographic variable. Defaults to observed values.
min_t_filter: An integer minimum of time period appearances for included items.
min_survey_filter: An integer minimum of survey appearances for included items.
survey_name: A survey identifier.
modifier_data: Table giving characteristics of local geographic areas in time periods. See details below.
modifier_names: Variables giving modifiers of geographic hierarchical parameters in modifier_data.
t1_modifier_names: Variables to be used instead of those in modifier_names, only in the first period.
standardize: Whether to standardize the variables given by modifier_names and t1_modifier_names to be zero-mean and unit-variance for performance gains. (For discussion see the Stan Language Reference section "Standardizing Predictors and Outputs.")
target_data: A table giving population proportions for groups by local geographic area and time period. See details below.
raking: A formula or list of formulas specifying the variables on which to rake survey weights.
max_raked_weight: A maximum over which raked weights will be trimmed. Only applied after raking. To trim unraked weights, manipulate the input data directly.
weight_name: A variable giving survey weights.
proportion_name: The variable giving population proportions for strata in target_data.
aggregate_data: A table of trial and success counts by group and item. See details below.
aggregate_item_names: A subset of values of the item variable in aggregate_data, for restricting the aggregate data.
constant_item: Whether item difficulty parameters should be constant over time.
...: Further arguments.
Individual-level data giving item responses is expected as argument item_data. Required arguments time_name and geo_name give the names of variables in item_data that indicate time period and local geographic area. Optional argument group_names gives other respondent characteristics to be modeled. item_data is optional if argument aggregate_data is used. Note that the dgirt() model assumes consistent coding of the polarity of item responses for identification.
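To make the polarity requirement concrete, here is a minimal base R sketch of recoding one binary item so that a value of 1 points in the same direction on every item. The data frame `responses` and the columns `item_a` and `item_b` are hypothetical and are not part of the package.

```r
# Hypothetical binary items: 1 is the liberal response on item_a,
# but the conservative response on item_b.
responses <- data.frame(
  item_a = c(1, 0, 1),
  item_b = c(0, 1, 0)
)

# Reverse item_b so that 1 has the same polarity on both items.
responses$item_b <- 1 - responses$item_b
```

Recoding of this kind would be done on the input data before it is passed as item_data.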
Data for modeling geographic hierarchical parameters can be given with argument modifier_data, in which case argument modifier_names is required and arguments t1_modifier_names and standardize are optional.
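As a rough illustration of what standardizing a modifier means, the base R sketch below rescales one column to zero mean and unit variance. The data frame `modifiers` and its `income` column are hypothetical, and this is not necessarily how shape() carries out the step internally.

```r
# Hypothetical modifier data: one row per state-year with one modifier.
modifiers <- data.frame(
  state  = c("AK", "AL", "AR", "AZ"),
  year   = c(2006, 2006, 2006, 2006),
  income = c(61.2, 42.7, 40.1, 50.4)
)

# Center on the mean and divide by the standard deviation.
modifiers$income_std <- (modifiers$income - mean(modifiers$income)) /
  sd(modifiers$income)

mean(modifiers$income_std)  # approximately 0
sd(modifiers$income_std)    # approximately 1
```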
shape() aggregates the individual-level item response data given as item_data for modeling. Data already aggregated to the group level can be provided with argument aggregate_data.
The data given by aggregate_data must be in a long table of trial and success counts indexed by item, group, and time period. The variable names given by arguments group_names, geo_name, and time_name should exist in aggregate_data. Three fixed variable names must also appear in aggregate_data: item giving item identifiers, n_grp giving counts of item-response trials, and s_grp giving counts of item-response successes. These counts should be adjusted consistently with the transformations applied during the aggregation by shape() of the individual item_data.
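The layout described above can be sketched in base R by counting trials and successes from a small table of individual responses. The data frame `responses` and its values are made up for illustration, and the counts here are raw: the sketch does not apply any of the adjustments that shape() makes when it aggregates item_data.

```r
# Hypothetical individual-level responses (1 = success, 0 = failure).
responses <- data.frame(
  year     = c(2006, 2006, 2006, 2008, 2008),
  state    = c("AK", "AK", "AL", "AK", "AL"),
  race3    = c("white", "white", "black", "white", "black"),
  abortion = c(1, 0, 1, 1, 0)
)

# Each row is one trial; a response of 1 is one success.
responses$n_grp <- 1L
responses$s_grp <- responses$abortion

# Sum trials and successes by group, local area, and time period.
agg <- aggregate(cbind(n_grp, s_grp) ~ race3 + state + year,
                 data = responses, FUN = sum)

# Add the fixed item identifier column and order the columns.
agg$item <- "abortion"
agg <- agg[, c("item", "race3", "state", "year", "n_grp", "s_grp")]
```

The resulting `agg` has one row per item, group, area, and period, with the three fixed column names item, n_grp, and s_grp.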
Use argument target_data to adjust the weighting of groups toward population targets via raking, using an adaptation of rake. To adjust existing survey weights in item_data, provide argument weight_name. Otherwise, observations in item_data will be assigned equal starting weights.
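The idea of moving weights toward population targets can be sketched in base R with a single adjustment over one variable. This is only an illustration of the principle, not the raking routine that shape() adapts; the data, the `target` proportions, and the variable names are hypothetical.

```r
# Hypothetical respondents with equal starting weights.
item_data <- data.frame(race3 = c("white", "white", "white", "black"))
item_data$weight <- 1

# Hypothetical population proportions for each group.
target <- c(white = 0.6, black = 0.4)

# Weighted share of each respondent's group in the sample.
group_total    <- ave(item_data$weight, item_data$race3, FUN = sum)
observed_share <- group_total / sum(item_data$weight)

# Scale each weight by the ratio of target share to observed share.
item_data$weight <- item_data$weight *
  unname(target[item_data$race3]) / observed_share
```

After the adjustment, the weighted share of each group equals its target proportion. Raking repeats adjustments of this kind across several variables in turn.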
Argument raking defines strata. If you pass it a list of formulas like list(~ x, ~ y), raking is first over x, then over y. Given an additive formula like ~ x + y, raking is over the combinations of x and y. So, list(~ x, ~ y + z) is first over x, then over y-z pairs. Argument proportion_name is optional.
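To see which strata a raking specification defines, the base R sketch below pulls the variable names out of each formula and crosses them in a small data frame. The data frame `d` and its values are hypothetical; only the formula syntax comes from the description above.

```r
# A raking specification: first over x, then over y-z combinations.
raking <- list(~ x, ~ y + z)

# Variables named in each raking step.
margins <- lapply(raking, all.vars)

# Hypothetical data, used only to show the strata each step defines.
d <- data.frame(
  x = c("a", "a", "b", "b"),
  y = c("m", "f", "m", "f"),
  z = c("young", "old", "old", "young")
)

# Cross the variables in each step to label the observed strata.
strata <- lapply(margins, function(vars) interaction(d[vars], drop = TRUE))

# Number of observed strata in each step.
vapply(strata, nlevels, integer(1))
```

The first step has one stratum per value of x, and the second has one stratum per observed combination of y and z.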
For convenience, data in item_data, modifier_data, aggregate_data, and target_data can be restricted (subsetted) row-wise to the time periods given by argument time_filter and the local geographic areas given by argument geo_filter.
Data can also be filtered column-wise to retain item variables that appear in a minimum of time periods, using argument min_t_filter, or a minimum of surveys, with argument min_survey_filter. Argument survey_name is required when filtering by survey.
If both row-wise and column-wise restrictions are specified, shape() iterates over them until they leave the data unchanged.
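The interaction between the two kinds of restriction can be sketched in base R as a loop that applies both filters until nothing changes. This is a simplified illustration under assumed data, not the package's implementation; the data frame `d` and the item columns are hypothetical, and NA stands for an item that was not asked.

```r
# Hypothetical wide data: NA means the item was not asked.
d <- data.frame(
  year  = c(2006, 2006, 2008, 2008, 2010),
  item1 = c(1, 0, 1, NA, NA),
  item2 = c(NA, NA, 1, 0, 1),
  item3 = c(NA, NA, NA, NA, 1)
)
items        <- c("item1", "item2", "item3")
time_filter  <- c(2006, 2008)  # row-wise restriction
min_t_filter <- 2              # column-wise restriction

repeat {
  before <- c(nrow(d), length(items))

  # Row-wise: keep only the requested time periods.
  d <- d[d$year %in% time_filter, , drop = FALSE]

  # Column-wise: keep items observed in enough distinct periods.
  n_periods <- vapply(items, function(i) {
    length(unique(d$year[!is.na(d[[i]])]))
  }, integer(1))
  items <- items[n_periods >= min_t_filter]
  d <- d[, c("year", items), drop = FALSE]

  # Stop once a full pass leaves the data unchanged.
  if (identical(before, c(nrow(d), length(items)))) break
}
```

Here dropping 2010 leaves item2 with a single period, so it no longer meets the minimum and is removed together with item3; a second pass confirms that nothing further changes.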
# NOT RUN {
# model individual item responses
shaped_responses <- shape(opinion, item_names = "abortion",
                          time_name = "year", geo_name = "state",
                          group_names = "race3")
# summarize result
summary(shaped_responses)
# check sparseness of data to be modeled
get_item_n(shaped_responses, by = "year")
# }