scoreScale is a flexible function that can be used to calculate a single scale
score from a set of items.
scoreScale(
df,
items = NULL,
revitems = FALSE,
minmax = NULL,
okmiss = 0.5,
type = c("pomp", "100", "sum", "mean"),
scalename = "scoredScale",
keepNvalid = FALSE
)
A data frame with a variable containing the scale score. Optionally, the data frame can additionally have a variable containing the number of valid item responses for each respondent.
A data frame containing the items you wish to score. It can contain only the
items, or the items plus other non-scored variables. If it contains non-scored
variables, then you must use the items argument to let the function know how
to find your items in df.
(optional) A character vector with the item names, or a numeric vector
indicating the column numbers of the items in df. If items is omitted, then
scoreScale will assume that df contains only the items to be scored and no
non-scored variables.
(optional) Either TRUE, FALSE, or a vector indicating which items in df should
be reverse coded before scoring. If omitted or FALSE (the default), no items
are reverse coded. If TRUE, all items are reverse coded before scoring. If only
some of the items should be reverse coded, provide either a character vector
with names of the items or a numeric vector with column numbers of the items
in df that should be reverse coded before scoring. If this argument is
anything but FALSE, then the minmax argument is required.
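As a rough illustration of why minmax is needed for reverse coding, the sketch below reverses items by subtracting each response from the sum of the minimum and maximum possible values. This is the usual reverse-coding formula and is an assumption about how the package behaves internally, not a copy of its code. The data frame and variable names are hypothetical.

```r
# Hypothetical item responses on a 0-4 scale (NA = missing response)
items <- data.frame(q1 = c(0, 4, 2, NA), q2 = c(1, 3, 4, 0))
item_min <- 0
item_max <- 4

# Reverse code: the lowest possible response becomes the highest, and vice versa.
# Missing responses stay missing.
reversed <- (item_min + item_max) - items
reversed
```

Without knowing the response range, the function would have no way to tell whether a response of 4 should become 0 (range 0 to 4) or 1 (range 1 to 4).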
(optional) A vector of 2 integers of the format c(itemMin, itemMax),
indicating the minimum and maximum possible item responses, e.g., c(0, 4).
This argument is required if type equals "pomp" (the default type) or "100".
It is also required if revitems is used and not set to FALSE. This function
assumes that all items have the same response range. If this is not the case,
then manually reverse code your items in df before using this function, and
omit the revitems and minmax arguments.
The maximum proportion of items that a respondent is allowed to have missing
and still have their non-missing items scored (and prorated). If the proportion
of missing items for a respondent is greater than okmiss, then the respondent
will be assigned a value of NA for their scale score. The default value is
0.50.
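The following base R sketch shows one common way to apply a missing-data rule like okmiss to a sum score: take the mean of the answered items, multiply by the total number of items, and assign NA when too many items are missing. It is offered as an assumption about what "prorated" means here, not as the package's actual implementation. The data are hypothetical.

```r
# Hypothetical 4-item scale; NA marks a missing response
items <- data.frame(
  q1 = c(2, 4, NA),
  q2 = c(3, NA, NA),
  q3 = c(1, 2, NA),
  q4 = c(2, 3, 1)
)
okmiss <- 0.5

n_items   <- ncol(items)
prop_miss <- rowMeans(is.na(items))         # proportion of items missing per respondent
item_mean <- rowMeans(items, na.rm = TRUE)  # mean of the answered items

# Prorated sum: mean of the answered items times the total number of items
prorated_sum <- item_mean * n_items

# Respondents missing more than okmiss of their items get NA
prorated_sum[prop_miss > okmiss] <- NA
prorated_sum
```

In this example the second respondent is missing 1 of 4 items (0.25), so their score is prorated. The third respondent is missing 3 of 4 items (0.75), which exceeds okmiss, so their score is NA.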
The type of score that scoreScale should produce. Must be one of "sum" (for
the sum of the item scores), "mean" (for the mean of the item scores), "100"
(for the score transformed to range from 0 to 100), or "pomp" (for a score
representing the "Percent Of the Maximum Possible", which is exactly the same
as "100" but with a better name). The default is "pomp".
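To make the relationship between the score types concrete, the sketch below computes sum, mean, and POMP scores by hand in base R for complete data. It uses the standard POMP formula, 100 * (observed - min) / (max - min), applied to the item mean. The data are hypothetical, and this is an illustration rather than the package's own code.

```r
# Hypothetical responses to three items, each scored 0 to 4
items <- data.frame(q1 = c(0, 4, 2), q2 = c(0, 4, 3), q3 = c(0, 4, 1))
item_min <- 0
item_max <- 4

sum_score  <- rowSums(items)
mean_score <- rowMeans(items)

# POMP: where the mean response falls between the minimum and maximum, as a percentage
pomp_score <- 100 * (mean_score - item_min) / (item_max - item_min)

data.frame(sum_score, mean_score, pomp_score)
```

A respondent who gives the lowest response to every item scores 0, and one who gives the highest response to every item scores 100, regardless of how many items the scale has.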
The quoted variable name you want the function to give your scored scale. If
this argument is omitted, the scale will be named "scoredScale" by default.
Logical value indicating whether a variable containing the number of valid,
non-missing items for each respondent should be returned in a data frame with
the scale score. The default is FALSE. Set to TRUE to return this variable,
which will be named "scalename_N" (with whatever name you gave to the
scalename argument). Most users should probably omit this argument entirely.
This argument might be removed from future versions of the package, so please
let me know if you think this argument is useful and would rather it remain a
part of the function.
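For reference, the count of valid item responses per respondent can also be computed directly in base R. The short sketch below uses hypothetical data and does not depend on the package.

```r
# Hypothetical item responses; NA marks a missing response
items <- data.frame(q1 = c(2, NA, 4), q2 = c(3, NA, 1), q3 = c(NA, NA, 0))

# Number of non-missing item responses for each respondent
n_valid <- rowSums(!is.na(items))
n_valid
```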
The scoreScale function technically has only 1 required argument, df. If none
of your items need to be reverse coded before scoring, your items are in a
data frame named myData, and myData contains ONLY the items to be scored and
no non-scored variables, then scoreScale(myData) is sufficient to score your
items.
In most real-world situations, however, you will likely have a data frame
containing a mix of items and other variables. In this case, you should
additionally use the items argument to indicate which variables in your data
frame are the items to be scored. For example, assume that myData contains an
ID variable named "ID", followed by three items named "Q1", "Q2", and "Q3",
none of which need to be reverse coded. You can score the scale by providing
the items argument with either (1) a numeric vector with the column indexes of
the items, like scoreScale(myData, items = 2:4) or
scoreScale(myData, items = c(2, 3, 4)), or (2) a character vector with the
item names, like scoreScale(myData, items = c("Q1", "Q2", "Q3")).
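The sketch below builds a small, hypothetical version of myData and confirms that the index form and the name form select the same columns. The call to scoreScale is guarded so that it runs only if PROscorerTools is installed. The minmax value of c(0, 4) is an assumption about the response range of these made-up items, supplied because the default type is "pomp".

```r
# Hypothetical data frame: an ID column followed by three items scored 0 to 4
myData <- data.frame(
  ID = 1:3,
  Q1 = c(0, 2, 4),
  Q2 = c(1, 2, 3),
  Q3 = c(0, 1, 4)
)

# Column indexes 2:4 and the names Q1, Q2, Q3 pick out the same items
by_index <- myData[, 2:4]
by_name  <- myData[, c("Q1", "Q2", "Q3")]
same_items <- identical(by_index, by_name)
same_items

# If the package is available, pass either form to the items argument
if (requireNamespace("PROscorerTools", quietly = TRUE)) {
  scored <- PROscorerTools::scoreScale(
    myData,
    items = c("Q1", "Q2", "Q3"),
    minmax = c(0, 4)
  )
  print(scored)
}
```

Using item names rather than column indexes is generally more robust, since the call keeps working if columns are later added to or reordered in the data frame.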
The scoreScale function is the workhorse of the PROscorerTools package, and it
is intended to be the building block of other, more complex scoring functions
tailored to specific PRO measures. It can handle items that need to be reverse
coded before scoring, and it has options for handling missing item responses.
It can use three different methods to score the items: (1) sum scoring (the
sum of the item scores), (2) mean scoring (the mean of the item scores), and
(3) 0-100 scoring (like sum or mean scoring, except that the scores are
rescaled to range from 0 to 100). This last method is also called "POMP"
scoring (Percent Of the Maximum Possible), and is the default scoring method
of scoreScale since it has numerous advantages over other scoring methods (see
References).
This function assumes that all items have the same numeric response range. It can still be used to score scales comprised of items with different response ranges with two caveats:
First, if your items have different ranges of possible response values AND
some need to be reverse coded before scoring, you should not use this
function's revitems plus minmax arguments to reverse your items. Instead, you
should manually reverse code your items (see revcode) before using scoreScale,
and omit the revitems and minmax arguments.
Second, depending on how the different item response options are numerically coded, some items might contribute more/less to the scale score than others. For example, consider a questionnaire where the first item has responses coded as "0 = No, 1 = Yes" and the rest of the items are coded as "0 = Never, 1 = Sometimes, 2 = Always". The first item will contribute relatively less weight to the scale score than the other items because its maximum value is only 1, compared to 2 for the other items. This state of affairs is not ideal, and you might want to reconsider including items with different response ranges in one scale score (if you have that option).
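The unequal weighting described above can be checked with a little arithmetic. The sketch below assumes a hypothetical five-item questionnaire (one yes/no item plus four items coded 0 to 2) and computes each item's share of the maximum possible sum score.

```r
# Maximum possible value of each item in a hypothetical 5-item questionnaire:
# one yes/no item (0-1) and four never/sometimes/always items (0-2)
item_max <- c(yesno = 1, q2 = 2, q3 = 2, q4 = 2, q5 = 2)

# Each item's share of the maximum possible sum score (1 + 2 + 2 + 2 + 2 = 9)
share <- item_max / sum(item_max)
round(share, 3)
```

Here the yes/no item can account for at most 1/9 of the total score, half as much as each of the other items.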
Cohen, P, Cohen, J, Aiken, LS, & West, SG (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34(3), 315-346.
# Make a data frame using default settings of makeFakeData() function
# (20 respondents, 9 items with values 0 to 4, and about 20% missing)
dat <- makeFakeData()
# First "sum" score the items, then "mean" score them
scoreScale(dat, type = "sum")
scoreScale(dat, type = "mean")
# Must use "minmax" argument if the "type" argument is "100"
scoreScale(dat, type = "100", minmax = c(0, 4))
# If you omit "type", the default is "pomp" (which is identical to "100")
scoreScale(dat, minmax = c(0, 4))
# "minmax" is also required if any items need to be reverse coded for scoring
# Below, the first two items are reverse coded before scoring
scoreScale(dat, type = "sum", revitems = c("q1", "q2"), minmax = c(0, 4))