Prepare rank or preference data for further analyses.
setup_rank_data(
rankings = NULL,
preferences = NULL,
user_ids = numeric(),
observation_frequency = NULL,
validate_rankings = TRUE,
na_action = c("augment", "fail", "omit"),
cl = NULL,
max_topological_sorts = 1,
timepoint = NULL,
n_items = NULL
)
An object of class "BayesMallowsData", to be provided in the data argument to compute_mallows().
rankings
A matrix of ranked items, of size n_assessors x n_items. See create_ranking() if you have an ordered set of items that need to be converted to rankings. If preferences is provided, rankings is an optional initial value of the rankings. If rankings has column names, these are assumed to be the names of the items. NA values in rankings are treated as missing data and automatically augmented; to change this behavior, see the na_action argument. A vector of length n_items is silently converted to a 1 x n_items matrix, and its names (if any) are used as column names.
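The difference between an ordering (items listed from best to worst) and a ranking (the rank of each item), which is the conversion create_ranking() exists for, can be sketched in base R. This is an illustration only, assuming items are indexed 1 to n_items; the helper ordering_to_ranking() is hypothetical and is not the package's implementation. The second part mimics the documented conversion of a named vector into a 1 x n_items matrix.

```r
# Hypothetical helper: convert an ordering (o[k] = item placed at position k)
# into a ranking (r[i] = position of item i). order() returns the inverse
# permutation, which is exactly this conversion.
ordering_to_ranking <- function(ordering) order(ordering)

# Item 3 is ranked first, item 1 second, and item 2 third
ranking <- ordering_to_ranking(c(3, 1, 2))
ranking  # c(2, 3, 1)

# A named vector of length n_items turned into a 1 x n_items matrix whose
# column names are the item names (mirrors the behavior described above)
v <- c(apple = 2, banana = 1, cherry = 3)
rankings <- matrix(v, nrow = 1, dimnames = list(NULL, names(v)))
```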
preferences
A data frame with one row per pairwise comparison, and columns assessor, top_item, and bottom_item. Each column contains the following:
assessor
is a numeric vector containing the assessor index.
bottom_item
is a numeric vector containing the index of the item that
was disfavored in each pairwise comparison.
top_item
is a numeric vector containing the index of the item that was
preferred in each pairwise comparison.
So if we have two assessors and five items, and assessor 1 prefers item 1 to item 2 and item 1 to item 5, while assessor 2 prefers item 3 to item 5, we have the following preferences data frame:
assessor | bottom_item | top_item
       1 |           2 |        1
       1 |           5 |        1
       2 |           5 |        3
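The example above can be built as a plain data frame and checked against candidate rankings. The helper is_consistent() is hypothetical and only illustrates what it means for a ranking to agree with an assessor's pairwise preferences: every top_item must have a smaller rank (be placed higher) than the corresponding bottom_item.

```r
# Preferences from the example: two assessors, five items
preferences <- data.frame(
  assessor = c(1, 1, 2),
  bottom_item = c(2, 5, 5),
  top_item = c(1, 1, 3)
)

# Hypothetical helper: TRUE if `ranking` (rank of each item) agrees with
# every pairwise preference stated by `assessor_id`
is_consistent <- function(ranking, preferences, assessor_id) {
  p <- preferences[preferences$assessor == assessor_id, ]
  all(ranking[p$top_item] < ranking[p$bottom_item])
}

is_consistent(c(1, 2, 3, 4, 5), preferences, assessor_id = 1)  # TRUE
is_consistent(c(2, 1, 3, 4, 5), preferences, assessor_id = 1)  # FALSE: item 2 above item 1
is_consistent(c(5, 4, 3, 2, 1), preferences, assessor_id = 2)  # FALSE: item 5 above item 3
```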
user_ids
Optional numeric vector of user IDs. Only used by update_mallows(). If provided, new data can consist of updated partial rankings from users already in the dataset, as described in Section 6 of Stein (2023).
observation_frequency
A vector of observation frequencies (weights) to apply to each row in rankings. This can speed up computation if a large number of assessors share the same rank pattern. Defaults to NULL, which means that each row of rankings is given weight 1. If provided, observation_frequency must have the same number of elements as there are rows in rankings, and rankings cannot be NULL. See compute_observation_frequency() for a convenience function for computing it.
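The weighting idea can be sketched in base R: collapse identical rows of rankings and count how often each rank pattern occurs. The code does not call compute_observation_frequency() and is not its implementation; it only illustrates the kind of result it is meant to produce, namely a matrix of unique rows plus one weight per row.

```r
# Four assessors, but only two distinct rank patterns
rankings <- rbind(
  c(1, 2, 3),
  c(1, 2, 3),
  c(2, 1, 3),
  c(1, 2, 3)
)

# Use a string key per row to identify identical rank patterns
key <- apply(rankings, 1, paste, collapse = ",")
unique_rankings <- rankings[!duplicated(key), , drop = FALSE]

# Count each pattern, in the same order as the rows of unique_rankings
counts <- table(key)
observation_frequency <- as.vector(counts[key[!duplicated(key)]])

unique_rankings        # rows (1, 2, 3) and (2, 1, 3)
observation_frequency  # c(3, 1)
```

The two unique rows with weights 3 and 1 carry the same information as the four original rows, which is where the speed-up comes from when many assessors agree.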
validate_rankings
Logical specifying whether the rankings provided (or generated from preferences) should be validated. Defaults to TRUE. Turning off this check will reduce computing time with a large number of items or assessors.
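The exact checks performed are not listed here, so the following is only a sketch of what validating a rankings matrix might involve. The hypothetical helper is_valid_ranking() assumes that a valid row contains distinct whole numbers between 1 and n_items, with NA allowed for missing ranks. Because such a check runs once per row and scales with the number of items, skipping it saves time on large data.

```r
# Hypothetical helper: a row is taken to be valid if its observed (non-NA)
# entries are distinct whole numbers between 1 and the number of items
is_valid_ranking <- function(r) {
  n <- length(r)
  obs <- r[!is.na(r)]
  all(obs == round(obs)) && all(obs >= 1 & obs <= n) && !anyDuplicated(obs)
}

rankings <- rbind(
  c(2, 1, 3),   # valid
  c(NA, 1, 3),  # valid, with one missing rank
  c(1, 1, 3),   # invalid: rank 1 used twice
  c(1, 2, 4)    # invalid: rank 4 with only three items
)
apply(rankings, 1, is_valid_ranking)  # TRUE TRUE FALSE FALSE
```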
na_action
Character specifying how to deal with NA values in the rankings matrix, if provided. Defaults to "augment", which means that missing values are automatically filled in using the Bayesian data augmentation scheme described in Vitelli et al. (2018). The other options for this argument are "fail", which means that an error message is printed and the algorithm stops if there are NAs in rankings, and "omit", which simply deletes rows with NAs in them.
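The "fail" and "omit" options can be mimicked in base R as below. The helper apply_na_action() is hypothetical and is not the package's code. Under "augment" it returns the matrix unchanged, on the assumption that the imputation itself happens later, during sampling.

```r
# Hypothetical helper mimicking the three na_action options
apply_na_action <- function(rankings, na_action = c("augment", "fail", "omit")) {
  na_action <- match.arg(na_action)
  has_na <- rowSums(is.na(rankings)) > 0
  if (na_action == "fail" && any(has_na)) {
    stop("rankings contains NA values")
  }
  if (na_action == "omit") {
    # Drop every row containing at least one NA
    rankings <- rankings[!has_na, , drop = FALSE]
  }
  # "augment": keep the NAs; they are assumed to be imputed later
  rankings
}

rankings <- rbind(c(1, 2, 3), c(NA, 1, 2))
nrow(apply_na_action(rankings, "omit"))  # 1
nrow(apply_na_action(rankings))          # 2
```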
cl
Optional computing cluster used for parallelization when generating the transitive closure based on preferences, returned from parallel::makeCluster(). Defaults to NULL.
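The transitive closure adds every preference implied by transitivity: if an assessor prefers item 1 to item 2 and item 2 to item 3, the closure also contains item 1 over item 3. Below is a base-R sketch using Warshall's algorithm on a Boolean reachability matrix. The package provides get_transitive_closure() for this, but the helper transitive_closure() here is hypothetical and is not its implementation. Closures for different assessors can be computed independently, which is presumably what makes a cluster useful here.

```r
# Hypothetical helper: transitive closure of one assessor's preferences
transitive_closure <- function(top, bottom, n_items) {
  # reach[i, j] is TRUE when item i is known to be preferred to item j
  reach <- matrix(FALSE, n_items, n_items)
  reach[cbind(top, bottom)] <- TRUE
  for (k in seq_len(n_items)) {
    # Warshall's update: i over k and k over j imply i over j
    reach <- reach | outer(reach[, k], reach[k, ], "&")
  }
  idx <- which(reach, arr.ind = TRUE)
  data.frame(top_item = idx[, "row"], bottom_item = idx[, "col"])
}

# Item 1 over item 2, and item 2 over item 3
closure <- transitive_closure(top = c(1, 2), bottom = c(2, 3), n_items = 3)
closure  # contains the implied pair: item 1 over item 3
```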
max_topological_sorts
When preference data are provided, multiple rankings will generally be consistent with the preferences stated by each user. These rankings are the topological sorts of the directed acyclic graph corresponding to the transitive closure of the preferences. max_topological_sorts is the maximum number of such rankings to generate. It defaults to 1, which means that the algorithm stops when it finds a single initial ranking that is compatible with the preferences stated by the user. By increasing this number, multiple rankings compatible with the pairwise preferences will be generated, and one initial value will be sampled from this set.
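For a handful of items, the rankings compatible with one assessor's preferences can be listed by brute force: generate every ordering, convert it to a ranking, and keep those that respect all pairwise preferences. These are exactly the topological sorts (linear extensions) of the preference graph. The sketch uses assessor 1 from the preferences example (item 1 over items 2 and 5). The permutations() helper is hypothetical, and enumerating all n! orderings is only feasible for very small n, so this is not how the package searches. Taking a single compatible ranking loosely corresponds to max_topological_sorts = 1, while sampling from a larger set corresponds to increasing it.

```r
# Hypothetical helper: all permutations of x, by recursion (small x only)
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in permutations(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# Assessor 1 prefers item 1 to item 2 and item 1 to item 5
top_item <- c(1, 1)
bottom_item <- c(2, 5)
n_items <- 5

# Convert each ordering to a ranking with order(), then keep the rankings
# in which every top_item has a smaller rank than its bottom_item
all_rankings <- lapply(permutations(seq_len(n_items)), order)
compatible <- Filter(
  function(r) all(r[top_item] < r[bottom_item]),
  all_rankings
)
length(compatible)  # 40 of the 120 possible rankings

# One initial value sampled from the compatible set
set.seed(1)
initial_ranking <- compatible[[sample(length(compatible), 1)]]
```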
timepoint
Integer vector specifying the timepoint. Defaults to NULL, which means that a vector of ones, one for each observation, is generated. Used by update_mallows() to identify data with a given iteration of the sequential Monte Carlo algorithm. If not NULL, must contain one integer for each row in rankings.
n_items
Integer specifying the number of items. Defaults to NULL, which means that the number of items is inferred from rankings or from preferences. Setting n_items manually can be useful with pairwise preference data in the SMC algorithm, i.e., when rankings is NULL and preferences is non-NULL and contains a small number of pairwise preferences for a subset of users and items.
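Why an explicit n_items can matter is easiest to see with a small batch of preferences. Assuming, for illustration only, that the number of items is inferred from the largest item index that appears (the exact rule is not stated here), a batch that mentions only items 1 to 4 would suggest four items even if the full item set is larger. The data below are made up.

```r
# A small, made-up batch of new pairwise preferences
new_preferences <- data.frame(
  assessor = c(1, 2),
  bottom_item = c(2, 4),
  top_item = c(1, 3)
)

# Assumed inference rule: largest item index present in the data
inferred_n_items <- max(new_preferences$top_item, new_preferences$bottom_item)
inferred_n_items  # 4

# Suppose the full item set actually has 10 items
n_items <- 10
```

With the package, this corresponds to passing n_items = 10 to setup_rank_data() together with preferences, instead of relying on inference from the batch.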
Other preprocessing: get_transitive_closure(), set_compute_options(), set_initial_values(), set_model_options(), set_priors(), set_progress_report(), set_smc_options()