scoreScale: Flexible function to score a single PRO or other psychometric scale

Description

scoreScale is a flexible function that can be used to calculate a single scale score from a set of items.

Usage

scoreScale(
  df,
  items = NULL,
  revitems = FALSE,
  minmax = NULL,
  okmiss = 0.5,
  type = c("pomp", "100", "sum", "mean"),
  scalename = "scoredScale",
  keepNvalid = FALSE
)

Value

A data frame with a variable containing the scale score. Optionally, the data frame can additionally have a variable containing the number of valid item responses for each respondent.

Arguments

df: A data frame containing the items you wish to score. It can contain only the items, or the items plus other non-scored variables. If it contains non-scored variables, then you must use the items argument to let the function know how to find your items in df.
items: (optional) A character vector with the item names, or a numeric vector indicating the column numbers of the items in df. If items is omitted, then scoreScale will assume that df contains only the items to be scored and no non-scored variables.
revitems: (optional) either TRUE, FALSE, or a vector indicating which items in df should be reverse coded before scoring. If omitted or FALSE (the default), no items are reverse coded. If TRUE, all items are reverse coded before scoring. If only some of the items should be reverse coded, provide either a character vector with names of the items or a numeric vector with column numbers of the items in df that should be reverse coded before scoring. If this argument is anything but FALSE, then the minmax argument is required.
minmax: (optional) A vector of 2 integers of the format c(itemMin, itemMax), indicating the minimum and maximum possible item responses, e.g., c(0, 4). This argument is required if type equals "pomp" (the default type) or "100". It is also required if revitems is used and not set to FALSE. This function assumes that all items have the same response range. If this is not the case, then manually reverse code your items in df before using this function, and omit the revitems and minmax arguments.
okmiss: The maximum proportion of items that a respondent is allowed to have missing and still have their non-missing items scored (and prorated). If the proportion of missing items for a respondent is greater than okmiss, then the respondent will be assigned a value of NA for their scale score. The default value is 0.50.
type: The type of score that scoreScale should produce. Must be one of "sum" (for the sum of the item scores), "mean" (for the mean of the item scores), "100" (for the score transformed to range from 0 to 100), or "pomp" (for a score representing the "Percent Of the Maximum Possible", which is exactly the same as "100" but with a better name). The default is "pomp".
scalename: The quoted variable name you want the function to give your scored scale. If this argument is omitted, the scale will be named "scoredScale" by default.
keepNvalid: Logical value indicating whether a variable containing the number of valid, non-missing items for each respondent should be returned in the data frame with the scale score. The default is FALSE. Set to TRUE to return this variable, which will be named "scalename_N" (with whatever name you gave to the scalename argument). Most users should probably omit this argument entirely. This argument might be removed from future versions of the package, so please let me know if you think this argument is useful and would rather it remain a part of the function.
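
As a rough illustration of how okmiss and prorating interact, the base R sketch below applies the rule described above to made-up data. It is not the package's source code: the data frame resp and all variable names are hypothetical, and the prorated sum (the mean of the non-missing items multiplied by the number of items) is an assumption about what "prorated" means for sum scores.

```r
# Hypothetical responses: 3 respondents, 4 items scored 0 to 4
resp <- data.frame(
  q1 = c(4, 3, NA),
  q2 = c(2, NA, NA),
  q3 = c(3, 1, NA),
  q4 = c(1, 2, 0)
)
okmiss <- 0.5

n_items   <- ncol(resp)
n_valid   <- rowSums(!is.na(resp))   # number of non-missing items per respondent
prop_miss <- 1 - n_valid / n_items   # proportion of missing items per respondent

# Assumed prorating rule: mean of the non-missing items,
# scaled back up to the full number of items
prorated_sum <- rowMeans(resp, na.rm = TRUE) * n_items

# Respondents missing more than okmiss of their items get NA
prorated_sum[prop_miss > okmiss] <- NA
prorated_sum
```

In this sketch the second respondent is missing 1 of 4 items (0.25, within the limit) and still receives a score, while the third respondent is missing 3 of 4 items (0.75, over the limit) and receives NA.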

Further Explanation of Arguments

The scoreScale function technically has only 1 required argument, df. If none of your items need to be reverse coded before scoring, your items are in a data frame named myData, and myData contains ONLY the items to be scored and no non-scored variables, then scoreScale(myData) is sufficient to score your items.

In most real-world situations, however, you will likely have a data frame containing a mix of items and other variables. In this case, you should additionally use the items argument to indicate which variables in your data frame are the items to be scored. For example, assume that myData contains an ID variable named "ID", followed by three items named "Q1", "Q2", and "Q3", none of which need to be reverse coded. You can score the scale by providing the items argument with either (1) a numeric vector with the column indexes of the items, like scoreScale(myData, items = 2:4) or scoreScale(myData, items = c(2, 3, 4)), or (2) a character vector with the item names, like scoreScale(myData, items = c("Q1", "Q2", "Q3")).
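
The two ways of identifying items can be checked against each other on a small data set. The sketch below builds a hypothetical myData with the layout described above (the values are made up) and confirms in base R that the column indexes and the item names pick out the same columns. The calls to scoreScale run only if PROscorerTools is installed, and type = "sum" is chosen here so that minmax is not needed.

```r
# Hypothetical data: an ID column followed by three items
myData <- data.frame(
  ID = 1:3,
  Q1 = c(0, 2, 4),
  Q2 = c(1, 2, 3),
  Q3 = c(4, 4, 0)
)

# Column indexes 2:4 and the item names refer to the same columns
by_index <- myData[, 2:4]
by_name  <- myData[, c("Q1", "Q2", "Q3")]
identical(by_index, by_name)

# Score the items both ways, only if the package is available
if (requireNamespace("PROscorerTools", quietly = TRUE)) {
  print(PROscorerTools::scoreScale(myData, items = 2:4, type = "sum"))
  print(PROscorerTools::scoreScale(myData, items = c("Q1", "Q2", "Q3"), type = "sum"))
}
```

Item names are usually the safer choice, since column positions change whenever variables are added to or removed from the data frame.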

Details

The scoreScale function is the workhorse of the PROscorerTools package, and it is intended to be the building block of other, more complex scoring functions tailored to specific PRO measures. It can handle items that need to be reverse coded before scoring, and it has options for handling missing item responses. It can use three different methods to score the items: (1) sum scoring (the sum of the item scores), (2) mean scoring (the mean of the item scores), and (3) 0-100 scoring (like sum or mean scoring, except that the scores are rescaled to range from 0 to 100). The third method is also called "POMP" scoring (Percent Of the Maximum Possible), and it is the default scoring method of scoreScale because it has numerous advantages over other scoring methods (see References).
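
The 0-100 rescaling can be written as a one-line formula, following the POMP definition in Cohen et al. (1999): the observed score minus the minimum possible score, divided by the possible range, times 100. The base R sketch below is an illustration only, not the package's code. The helper name pomp is hypothetical, and applying the formula to a respondent's mean item score is an assumption.

```r
# Hypothetical helper: Percent Of the Maximum Possible for a mean item score
pomp <- function(mean_score, item_min, item_max) {
  100 * (mean_score - item_min) / (item_max - item_min)
}

# One respondent's answers to four items scored 0 to 4 (made-up values)
item_mean <- mean(c(3, 4, 2, 3))

pomp(item_mean, item_min = 0, item_max = 4)

# The same mean of 3 on items scored 1 to 5 sits at the scale midpoint
pomp(3, item_min = 1, item_max = 5)
```

Because the result is expressed relative to the possible range, scores from scales with different numbers of items or different response ranges land on the same 0 to 100 metric.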

This function assumes that all items have the same numeric response range. It can still be used to score scales composed of items with different response ranges, with two caveats:

First, if your items have different ranges of possible response values AND some need to be reverse coded before scoring, you should not use this function's revitems plus minmax arguments to reverse your items. Instead, you should manually reverse code your items (see revcode) before using scoreScale, and omit the revitems and minmax arguments.
Second, depending on how the different item response options are numerically coded, some items might contribute more/less to the scale score than others. For example, consider a questionnaire where the first item has responses coded as "0 = No, 1 = Yes" and the rest of the items are coded as "0 = Never, 1 = Sometimes, 2 = Always". The first item will contribute relatively less weight to the scale score than the other items because its maximum value is only 1, compared to 2 for the other items. This state of affairs is not ideal, and you might want to reconsider including items with different response ranges in one scale score (if you have that option).
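
Both caveats can be seen in a small base R sketch. The data and variable names below are hypothetical. The reverse coding uses the common formula (item minimum + item maximum - response) applied with each item's own range, which is an assumption standing in for the package's revcode function rather than a call to it.

```r
# Hypothetical responses: item "a" is coded 0-1, items "b" and "c" are coded 0-2
resp <- data.frame(
  a = c(1, 0),
  b = c(0, 2),
  c = c(0, 0)
)

# Caveat 1: reverse code manually, using the item's own range (0 to 2 for "b")
b_reversed <- (0 + 2) - resp$b
b_reversed

# Caveat 2: unequal weights in a raw sum score.
# Respondent 1 gives the maximum response on "a" only,
# respondent 2 gives the maximum response on "b" only.
raw_sum <- rowSums(resp)
raw_sum
```

Each respondent endorsed exactly one item at its maximum, yet their raw sums differ (1 versus 2) because item "a" can contribute at most 1 point.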

References

Cohen, P, Cohen, J, Aiken, LS, & West, SG (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34(3), 315-346.

Examples

# Make a data frame using default settings of makeFakeData() function
# (20 respondents, 9 items with values 0 to 4, and about 20% missing)
dat <- makeFakeData()

# First "sum" score the items, then "mean" score them
scoreScale(dat, type = "sum")
scoreScale(dat, type = "mean")

# Must use "minmax" argument if the "type" argument is "100"
scoreScale(dat, type = "100", minmax = c(0, 4))
# If you omit "type", the default is "pomp" (which is identical to "100")
scoreScale(dat, minmax = c(0, 4))

# "minmax" is also required if any items need to be reverse coded for scoring
#  Below, the first two items are reverse coded before scoring
scoreScale(dat, type = "sum", revitems = c("q1", "q2"), minmax = c(0, 4))
