
MSmix (version 2.0.0)

data_censoring: Censoring of full rankings

Description

Convert full rankings into either top-k rankings or partial rankings with missing data in arbitrary positions.

Usage

data_censoring(
  rankings,
  topk = TRUE,
  nranked = NULL,
  probs = rep(1, ncol(rankings) - 1)
)

Value

A list with two named components:

part_rankings

Integer \(N\times n\) matrix with partial (censored) rankings in each row. Missing positions are coded as NA.

nranked

Integer vector of length \(N\) with the actual number of items ranked in each partial sequence after censoring.
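To make the two components concrete, the base R sketch below builds a small hand-made censored matrix and wraps it in a list with the same two names. The matrix is hypothetical, it assumes rankings are stored in ranking format (entry j is the rank of item j), and it assumes that nranked equals the number of non-missing entries in each row; it is not output produced by data_censoring().

```r
# Hypothetical 3 x 4 censored matrix in ranking format (entry j = rank of
# item j); NA marks positions removed by censoring
part_rankings <- rbind(c(1L, NA, 2L, NA),
                       c(NA, 1L, NA, NA),
                       c(2L, 3L, 1L, 4L))
# A list mirroring the two documented components, assuming nranked equals
# the number of non-missing entries in each row
out <- list(part_rankings = part_rankings,
            nranked = as.integer(rowSums(!is.na(part_rankings))))
str(out)
```

Here the first row is a top-2 ranking, the second a top-1 ranking and the third a full ranking, so nranked is 2, 1 and 4.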

Arguments

rankings

Integer \(N\times n\) matrix or data frame with full rankings in each row.

topk

Logical: whether the full rankings must be converted into top-k rankings (TRUE) or into partial rankings with missing data in arbitrary positions (FALSE). Defaults to TRUE.

nranked

Integer vector of length \(N\) with the desired number of positions to be retained in each partial sequence after censoring. If nranked = NULL (default), the number of positions is randomly generated according to the probabilities in the probs argument.

probs

Numeric vector of length \((n-1)\) with the probabilities used to randomly generate the number of positions retained in each partial sequence after censoring (the probabilities need not be normalized). Used only if nranked = NULL. Defaults to equal probabilities.
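When nranked = NULL, the number of retained positions is drawn according to probs. The base R sketch below shows one way such a draw could be carried out, assuming sampling with replacement from \(1, \ldots, (n-1)\); it is an illustration, not the package's internal code. Base R's sample() rescales its prob weights itself, which is why unnormalized weights such as 1:(n-1) are acceptable.

```r
# Hypothetical illustration, not MSmix's internal code: drawing the number of
# positions to retain for N rankings of n items from unnormalized weights
n <- 5
N <- 1000
probs <- 1:(n - 1)  # weights for retaining 1, 2, ..., n-1 positions
set.seed(123)
# seq_len(n - 1) gives the candidate values 1, ..., n-1
k <- sample(seq_len(n - 1), size = N, replace = TRUE, prob = probs)
# Empirical proportions should be close to the normalized weights
round(prop.table(table(k)), 2)
round(probs / sum(probs), 2)
```

Comparing the two printed vectors mirrors the check done at the end of Example 2 with prop.table().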

Details

Both forms of partial rankings can be obtained in two ways: (i) by specifying, in the nranked argument, the number of positions to be retained in each partial ranking; (ii) by setting nranked = NULL (default) and specifying, in the probs argument, the probabilities of retaining \(1, 2, \ldots, (n-1)\) positions, respectively (recall that a partial sequence with \((n-1)\) observed entries corresponds to a full ranking, because the only unranked item must occupy the only free position).
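With topk = TRUE, retaining k positions amounts to blanking every rank larger than k. The base R sketch below illustrates this with a hypothetical helper censor_topk(); it assumes rankings are stored in ranking format (entry j is the rank of item j) and, following the remark on \((n-1)\) observed entries, keeps the full row when k is at least \(n-1\). It is not the MSmix implementation.

```r
# Hypothetical helper (not MSmix's implementation): top-k censoring of
# full rankings stored in ranking format (entry j = rank of item j)
censor_topk <- function(rankings, nranked) {
  rankings <- as.matrix(rankings)
  n <- ncol(rankings)
  out <- rankings
  for (i in seq_len(nrow(rankings))) {
    k <- nranked[i]
    # With n-1 observed ranks the last one is implied, so keep the full row
    if (k < n - 1) out[i, rankings[i, ] > k] <- NA
  }
  out
}

r <- rbind(c(2, 1, 4, 3),
           c(4, 3, 2, 1))
censor_topk(r, nranked = c(2, 1))
```

The first row keeps the two items ranked 1 and 2, and the second row keeps only the item ranked first.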

When topk = FALSE, the positions retained in each partial sequence after censoring are selected uniformly at random, regardless of how the nranked argument is specified.
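Censoring in arbitrary positions can be sketched in the same spirit: for each row, k of the n rank positions are drawn uniformly at random and only the items occupying them keep their rank. The helper censor_arbitrary() below is hypothetical, assumes ranking format (entry j is the rank of item j) and keeps the full row when k is at least \(n-1\); it is not the MSmix implementation.

```r
# Hypothetical helper (not MSmix's implementation): censoring in arbitrary
# positions. For each row, k rank positions are drawn uniformly at random
# and only the items occupying them keep their rank.
censor_arbitrary <- function(rankings, nranked) {
  rankings <- as.matrix(rankings)
  n <- ncol(rankings)
  out <- rankings
  for (i in seq_len(nrow(rankings))) {
    k <- nranked[i]
    if (k < n - 1) {
      kept <- sample.int(n, k)  # uniform choice of k positions out of n
      out[i, !(rankings[i, ] %in% kept)] <- NA
    }
  }
  out
}

r <- rbind(c(2, 1, 4, 3),
           c(4, 3, 2, 1))
set.seed(1)
censor_arbitrary(r, nranked = c(2, 1))
```

Because each ranking is a one-to-one map between items and positions, drawing positions uniformly is equivalent in distribution to drawing the retained items uniformly.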

Examples


library(MSmix)

## Example 1. Censoring the Antifragility dataset into partial top rankings
# Top-3 censoring (assigned number of top positions to be retained)
n <- 7
r_antifrag <- ranks_antifragility[, 1:n]
data_censoring(r_antifrag, topk = TRUE, nranked = rep(3, nrow(r_antifrag)))
# Random top-k censoring with assigned probabilities
set.seed(12345)
data_censoring(r_antifrag, topk = TRUE, probs = 1:(n-1))

## Example 2. Simulate full rankings from a basic Mallows model with Spearman distance
n <- 10
N <- 100
set.seed(12345)
rankings <- rMSmix(sample_size = N, n_items = n)$samples
# Censoring in arbitrary positions with assigned number of ranks to be retained
set.seed(12345)
nranked <- round(runif(N, 0.5, 1) * n)
set.seed(12345)
arbitr_ranks1 <- data_censoring(rankings, topk = FALSE, nranked = nranked)
arbitr_ranks1
identical(arbitr_ranks1$nranked, nranked)
# Censoring in arbitrary positions with random number of ranks to be retained
set.seed(12345)
probs <- runif(n-1, 0, 0.5)
set.seed(12345)
arbitr_ranks2 <- data_censoring(rankings, topk = FALSE, probs = probs)
arbitr_ranks2
prop.table(table(arbitr_ranks2$nranked))
round(prop.table(probs), 2)
