ksIRT: ksIRT - kernel smoothing in Item Response Theory

Description

Fits nonparametric item and option characteristic curves using kernel smoothing techniques. Within the KernSmoothIRT package, it provides the relevant data for the graphical analysis of multiple choice test and questionnaire data.

Usage

ksIRT(responses, key, format, kernel = c("gaussian", "quadratic", "uniform"), itemlabels,
      weights, miss = c("option", "omit", "random.multinom", "random.unif"), NAweight = 0,
      evalpoints, nevalpoints, bandwidth = c("Silverman", "CV"), RankFun = "sum", SubRank,
      thetadist = list("norm", 0, 1), groups = FALSE, nsubj)

# S3 method for ksIRT
print(x, ...)

Arguments

responses

input data matrix with options selected from each individual for each item. Rows represent individuals, columns represent items. Alternatively, a data.frame or list can be specified. Missing values are inserted as NA.

key

a numeric vector or a scalar. If key is a vector, its length must match the number of items; if it is a scalar, its value is used for all items.

If the items are multiple choice, key should contain the option that corresponds to the correct response.

If the data are rating-scale, key should contain the largest option value for each item. In this case, the weight assigned to each option is equal to its option number.

More complicated weighting schemes, such as partial credit, can be specified in the weights argument. If weights is specified, key must be left blank.

format

a numeric scalar or vector specifying the type of items. If all of the items are multiple choice, then format = 1. If all of the items are rating-scale or partial credit, then format = 2. If all of the items are nominal items, then format = 3. If the test has a mixture of items of different formats, then format is a vector with length equal to the number of items with entries of 1 for each multiple choice item and 2 for each rating-scale item. For more complicated weighting schemes use the weights argument.

kernel

a character string specifying the kernel function. kernel must be either "gaussian", "quadratic" or "uniform". The default is "gaussian".
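The three kernel choices can be illustrated with a small base R sketch. It is not the package's internal code: the function names kernels and nw_smooth are hypothetical, the normalising constants are an assumption (they cancel in the weighted average), and the smoother shown is a plain Nadaraya-Watson estimator applied to a simulated 0/1 option indicator.

```r
# Illustrative kernel functions; the constants are assumptions and cancel
# in the ratio computed by nw_smooth() below.
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  uniform   = function(u) ifelse(abs(u) <= 1, 0.5, 0)
)

# Nadaraya-Watson estimate of the probability of choosing an option,
# evaluated at each point in evalpoints.
# y: 0/1 indicator per subject, theta: subject locations, h: bandwidth.
# With the compact kernels, a very small h can leave no subject in the
# window, in which case the ratio is NaN.
nw_smooth <- function(y, theta, evalpoints, h, kernel = "gaussian") {
  K <- kernels[[kernel]]
  sapply(evalpoints, function(q) {
    w <- K((theta - q) / h)
    sum(w * y) / sum(w)
  })
}

# Simulated data, for illustration only
set.seed(1)
theta <- qnorm(seq_len(200) / 201)
y <- rbinom(200, 1, plogis(1.5 * theta))
p_hat <- nw_smooth(y, theta, evalpoints = seq(-2, 2, length.out = 5), h = 0.4)
```

The compact kernels ("quadratic", "uniform") ignore subjects farther than one bandwidth from the evaluation point, whereas the Gaussian kernel gives every subject some weight.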

itemlabels

optional list of labels for each item. If omitted, each item will be labelled according to its numerical order. These labels will be used in plotting.

weights

optional list that may be used in lieu of key. Specifying weights allows for more complicated weighting schemes than the default. Its length must be equal to the number of items, and each entry must be a matrix with option numbers in the first row and option weights in the second row. If weights is omitted and format=1, then weights are assigned according to key. If weights is omitted and format=2, then each option receives a weight equal to its option number. If weights is omitted and format=3, then all weights are set to zero.
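As a sketch of the structure described above, the following builds a weights list for a hypothetical three-item test. The option numbers, the weights, and the helper score_option are made up for illustration; only the layout (one two-row matrix per item, options in row 1, weights in row 2) comes from the description.

```r
# One matrix per item: first row = option numbers, second row = option weights.
weights <- list(
  rbind(1:4, c(0, 1, 0, 0)),    # item 1: multiple choice, option 2 is correct
  rbind(1:4, c(0, 0.5, 0, 1)),  # item 2: partial credit for option 2
  rbind(1:5, 1:5)               # item 3: rating scale, weight = option number
)

# Hypothetical helper: look up the weight earned by choosing `option` on `item`.
score_option <- function(item, option) {
  w <- weights[[item]]
  w[2, match(option, w[1, ])]
}

score_option(2, 2)  # partial-credit option on item 2
score_option(2, 4)  # full-credit option on item 2
```

A list like this would be passed as the weights argument, with key omitted.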

miss

a character string specifying the method used to manage missing responses.

The default value, miss="option", treats missing responses as a further option, labeled NA, with zero weight. This NA option is added to the plot of the Option Characteristic Curves. A different weight for the NA option may be specified through the NAweight argument.

miss="random.unif" substitutes NAs with options randomly chosen from the possible ones for the corresponding item.

miss="random.multinom" does the same substitution as miss="random.unif" but each option has a probability of being selected proportional to its relative frequency.

miss="omit" excludes from the analysis all the subjects with at least one omitted response.
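The replacement strategies can be sketched in base R as follows. This is an illustration of the idea, not the package's implementation: impute_item is a hypothetical helper, the data are invented, and the sketch assumes each item has at least one observed response.

```r
# Replace NAs in one item's responses by randomly drawn options.
impute_item <- function(x, options, method = c("random.unif", "random.multinom")) {
  method <- match.arg(method)
  miss <- is.na(x)
  if (!any(miss)) return(x)
  prob <- if (method == "random.unif") {
    # every possible option is equally likely
    rep(1 / length(options), length(options))
  } else {
    # probability proportional to the observed relative frequency
    as.numeric(table(factor(x[!miss], levels = options))) / sum(!miss)
  }
  idx <- sample.int(length(options), sum(miss), replace = TRUE, prob = prob)
  x[miss] <- options[idx]
  x
}

set.seed(42)
x <- c(1, 2, 2, NA, 3, 2, NA, 1)
filled_unif <- impute_item(x, options = 1:4, method = "random.unif")
filled_mult <- impute_item(x, options = 1:4, method = "random.multinom")

# The "omit" strategy corresponds to dropping subjects with any NA.
resp <- cbind(item1 = x, item2 = c(1, 1, 2, 2, NA, 1, 2, 2))
resp_omit <- resp[complete.cases(resp), , drop = FALSE]
```

Under the frequency-based draw, an option that nobody selected (option 4 here) can never be imputed, while the uniform draw may produce it.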

NAweight

a scalar value that specifies the weight given to missing responses when miss="option". The default is zero.

evalpoints

an optional numeric vector that specifies the quantiles at which to estimate the Option Characteristic Curves. If unspecified, the default is nevalpoints evenly spaced values with end points determined according to the number of subjects and the distribution specified with the thetadist argument.

nevalpoints

an optional scalar value that specifies the number of evenly spaced points at which curves are estimated. This value is used as an alternative to a user defined vector in the evalpoints argument. The default value is 51. The end points are determined according to the number of subjects and to the distribution specified for the thetadist argument. If both nevalpoints and evalpoints are specified, then evalpoints takes precedence.

bandwidth

either "Silverman", "CV" or a numeric vector specifying, for each item, the bandwidth to use for kernel smoothing. The default value, bandwidth="Silverman", is a numeric vector computed following the well-known Silverman's rule of thumb. If bandwidth="CV", then the bandwidth is chosen for each item through cross-validation.

RankFun

a function that is used to rank subjects. The default value is "sum". Another common choice is "mean".

SubRank

a numeric vector specifying the rank of each of the subjects. If unspecified and format=1 or format=2, subjects will be ranked according to the function passed through the argument RankFun. When format=3, this argument must be provided.

thetadist

a list specifying the distribution to be used to rank the subjects (see Ramsay, 1991, p. 615). By default a standard normal distribution is used. A different distribution can be adopted by specifying the first element of the list as "norm", "beta", "unif", "gamma", etc., where the character string is the same as used in the quantile functions qnorm(), qbeta(), qunif(), qgamma(). The other elements of the list should be the distribution parameters required by the chosen quantile function.
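A minimal sketch of how a thetadist-style list can be turned into subject locations is given below. The helper rank_to_theta is hypothetical, and the mapping from ranks to probabilities (rank divided by n + 1) is an assumption chosen for illustration; the package may use a different convention.

```r
# Map observed scores to quantiles of the distribution named in thetadist.
rank_to_theta <- function(score, thetadist = list("norm", 0, 1)) {
  n <- length(score)
  p <- rank(score, ties.method = "average") / (n + 1)  # assumed plotting position
  qfun <- match.fun(paste0("q", thetadist[[1]]))       # e.g. "norm" -> qnorm
  do.call(qfun, c(list(p), thetadist[-1]))             # remaining elements are parameters
}

scores <- c(10, 25, 17, 30, 22)
theta_norm <- rank_to_theta(scores)                      # standard normal
theta_beta <- rank_to_theta(scores, list("beta", 2, 2))  # Beta(2, 2) on (0, 1)
```

Because quantile functions are monotone, the ordering of subjects is the same under every choice of distribution; only the spacing between them changes.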

groups

an optional vector of length equal to the number of subjects containing the group designation of each subject. Adding this option allows for comparisons between groups using the Differential Item Functioning tools (see details section).

nsubj

an optional numeric value with the number of subjects.

x

a ksIRT object to be printed.

...

further parameters

Value

Returned from this function is a ksIRT object which is a list with the following components:

nitem

an integer indicating the number of items.

nsubj

an integer indicating the number of subjects.

nevalpoints

an integer indicating the number of points for curve estimation.

binaryresp

a matrix of binary responses. Each row corresponds to a single option. The first three columns specify the item, the option, and the corresponding weight. Each additional column is a binary indicator of whether a subject selected that option.
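The layout of such a matrix can be sketched with toy data. The response matrix and key below are invented, and the construction is only an illustration of the described structure (item, option, weight, then one 0/1 column per subject), not the package's code.

```r
# Toy data: 3 subjects (rows) answering 2 multiple choice items (columns).
resp <- cbind(item1 = c(1, 2, 2), item2 = c(2, 2, 1))
key <- c(2, 1)  # correct option for each item

rows <- lapply(seq_len(ncol(resp)), function(i) {
  opts <- sort(unique(resp[, i]))
  t(sapply(opts, function(o) {
    c(item = i, option = o, weight = as.numeric(o == key[i]),
      as.numeric(resp[, i] == o))  # one indicator per subject
  }))
})
binaryresp <- do.call(rbind, rows)
binaryresp
```

Within each item, every subject column sums to one, because each subject selects exactly one option.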

OCC

a matrix with the first 3 columns the same as binaryresp and an additional column for each quantile at which the option characteristic curves have been estimated. The additional columns contain the kernel smoothed probabilities of selecting each option.

stderrs

a matrix with the same layout as OCC, containing the standard errors of the estimated curves.

subjscore

a vector containing the observed score of each subject.

itemlabels

a list containing the label for each item.

thetadist

a list indicating the distribution used to rank subjects (see thetadist in Arguments).

subjtheta

a vector of quantile ranks for each subject on the distribution specified in thetadist.

evalpoints

a vector with the quantiles used in curve estimation.

subjscoresummary

a vector of quantiles, at probabilities .05, .25, .50, .75, .95, of the observed overall scores.

subjscoresummaryevalpoints

a vector as subjscoresummary but computed on subjtheta.

SmthWgts

a matrix containing the kernel weights.

scale

a vector indicating whether each item is multiple-choice, rating-scale or nominal; 1 indicates multiple-choice, 0 indicates rating-scale, 3 indicates nominal.

format

returns the format argument passed at function call.

bandwidth

a vector containing the bandwidths for each item.

DIF

a list of ksIRT objects created for each of the subgroups specified by groups.

groups

returns the groups argument passed at function call.

itemcor

a vector containing the point polyserial correlation for each item.

RCC

a list of nsubj vectors containing the normalized likelihood for each value in evalpoints.

subjthetaML

the maximum likelihood estimate for the expected total score of each subject.

Details

When bandwidth="Silverman", the rule of thumb of Silverman (1986, p. 45) is implemented with the formula: 1.06*sigma.hat*nsubj^(-.2), where nsubj is the number of subjects and sigma.hat is the standard deviation of the quantiles associated with the subjects according to the distribution specified with thetadist. Note that when thetadist=list("norm",mean,sd), sigma.hat is the value specified for sd.

Printing the ksIRT object shows the point polyserial correlation between each item and the overall test score.
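The rule of thumb is simple enough to evaluate by hand. The sketch below wraps the formula in a hypothetical helper, silverman_bw, which is not a function exported by the package.

```r
# Silverman's rule of thumb as stated above: 1.06 * sigma.hat * nsubj^(-0.2)
silverman_bw <- function(nsubj, sigma.hat) {
  1.06 * sigma.hat * nsubj^(-0.2)
}

h_100  <- silverman_bw(nsubj = 100, sigma.hat = 1)   # about 0.42
h_1000 <- silverman_bw(nsubj = 1000, sigma.hat = 1)  # smaller: more subjects, less smoothing
```

The bandwidth shrinks slowly with sample size: a tenfold increase in subjects reduces it by a factor of 10^(-0.2), roughly 0.63.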

References

Mazza, A., Punzo, A., McGuire, B. (2014). KernSmoothIRT: An R Package for Kernel Smoothing in Item Response Theory. Journal of Statistical Software, 58(6), 1-34. URL: http://www.jstatsoft.org/v58/i06/.

Ramsay, J.O. (2000). TestGraf: A program for the graphical analysis of multiple choice test and questionnaire data.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.

Examples

# NOT RUN {
 ## Psych101 data
data(Psych101)
Psych1 <- ksIRT(responses = Psychresponses[1:100,], key = Psychkey, format = 1)
Psych1
    
plot(Psych1, plottype="OCC", items=c(24,25,92,96))
plot(Psych1, plottype="EIS", items=c(24,25,92,96))
plot(Psych1, plottype="tetrahedron", items=c(24,92))
plot(Psych1, plottype="triangle", items=c(24,92))
plot(Psych1, plottype="PCA")
plot(Psych1,plottype="RCC", subjects=c(33,92))
 
PCA(Psych1)
subjEIS(Psych1)
subjETS(Psych1)
subjOCC(Psych1, stype="ObsScore")
subjscore(Psych1)
subjthetaML(Psych1)
subjscoreML(Psych1)
 
plot(Psych1, plottype="expected")
plot(Psych1, plottype="sd")
plot(Psych1, plottype="density")

## HIV data
data(HIV)
HIVsubset <- HIV[c(1:50, 1508:1558, 2934:2984), ]
gr2 <- as.character(HIVsubset$SITE)
DIF2 <- ksIRT(responses=HIVsubset[,-(1:3)], key=HIVkey, format = 2, groups=gr2, miss="omit")

plot(DIF2, plottype="expectedDIF", lwd=2)
plot(DIF2, plottype="densityDIF", lwd=2)
plot(DIF2, plottype="EISDIF", items=c(6,11))

### Ordinal Survey Data
data(BDI)
BDI1 <- ksIRT(responses=BDIresponses, key=BDIkey, format = 2, miss="omit")

plot(BDI1, plottype="OCC", items=1:4)
plot(BDI1, plottype="sd")
plot(BDI1, plottype="density", ylim=c(0,0.1))

# }
