Fits nonparametric item and option characteristic curves using kernel smoothing techniques. Within the KernSmoothIRT package, it provides the relevant data for the graphical analysis of multiple choice test and questionnaire data.
ksIRT(responses, key, format, kernel = c("gaussian","quadratic","uniform"), itemlabels,
      weights, miss = c("option","omit","random.multinom","random.unif"), NAweight = 0,
      evalpoints, nevalpoints, bandwidth = c("Silverman","CV"), RankFun = "sum", SubRank,
      thetadist = list("norm",0,1), groups = FALSE, nsubj)

# S3 method for ksIRT
print(x, ...)
responses
input data matrix with the option selected by each individual for each item. Rows represent individuals, columns represent items. Alternatively, a data.frame or list can be specified. Missing values are coded as NA.
key
a numeric vector or a scalar. If key is a vector, its length must match the number of items; if it is a scalar, its value is used for all items.
If the items are multiple choice, key should contain the option that corresponds to the correct response.
If the data are rating-scale, key should contain the largest option value for each item. In this case, the weight assigned to each option is equal to its option number.
More complicated weighting schemes, such as partial credit, can be specified in the weights argument. If weights is specified, key must be left blank.
format
a numeric scalar or vector specifying the type of items.
If all of the items are multiple choice, then format = 1.
If all of the items are rating-scale or partial credit, then format = 2.
If all of the items are nominal items, then format = 3.
If the test has a mixture of items of different formats, then format is a vector with length equal to the number of items, with entries of 1 for each multiple choice item and 2 for each rating-scale item. For more complicated weighting schemes use the weights argument.
kernel
a character string specifying the kernel function. kernel must be either "gaussian", "quadratic" or "uniform". The default is "gaussian".
itemlabels
optional list of labels for each item. If omitted, each item will be labelled according to its numerical order. These labels will be used in plotting.
weights
optional list that may be used in lieu of including key. Specifying weights allows for more complicated weighting schemes than the default. Its length must be equal to the number of items and each entry must be a matrix with option numbers in the first row and option weights in the second row.
If weights is omitted and format=1, then weights are given according to key.
If weights is omitted and format=2, then each option is given a weight equal to its option number.
If weights is omitted and format=3, then weights are set to zero.
miss
a character string specifying the method used to manage missing responses.
The default value, miss="option", treats missing responses as a further option, labeled NA, with zero weight. This NA option is added to the plot of the Option Characteristic Curves. A different weight for the NA option may be specified through the NAweight argument.
miss="random.unif" replaces each NA with an option chosen at random from those available for the corresponding item.
miss="random.multinom" does the same substitution as miss="random.unif", but each option is selected with probability proportional to its relative frequency.
miss="omit" excludes from the analysis all subjects with at least one omitted response.
NAweight
a scalar value that specifies the weight given to missing responses when miss="option". The default is zero.
evalpoints
an optional numeric vector that specifies the quantiles at which to estimate the Option Characteristic Curves.
If unspecified, the default is nevalpoints evenly spaced values with end points determined according to the number of subjects and the distribution specified with the thetadist argument.
nevalpoints
an optional scalar value that specifies the number of evenly spaced points at which curves are estimated. This value is used as an alternative to a user-defined vector in the evalpoints argument. The default value is 51.
The end points are determined according to the number of subjects and to the distribution specified for the thetadist argument.
If both nevalpoints and evalpoints are specified, then evalpoints takes precedence.
bandwidth
either "Silverman", "CV" or a numeric vector specifying, for each item, the bandwidth to use for kernel smoothing. With the default, bandwidth="Silverman", the bandwidth is computed following Silverman's rule of thumb. If bandwidth="CV", then the bandwidth is chosen for each item through cross-validation.
RankFun
a function that is used to rank subjects. The default value is "sum". Another common choice is "mean".
SubRank
a numeric vector specifying the rank of each of the subjects. If unspecified and format=1 or format=2, subjects will be ranked according to the function passed through the argument RankFun. When format=3 this argument must be provided.
thetadist
a list specifying the distribution to be used to rank the subjects (see Ramsay, 1991, p. 615).
By default a standard normal distribution is used.
A different distribution can be adopted by specifying the first element of the list as "norm", "beta", "unif", "gamma", etc., where the character string is the same as used in the quantile functions qnorm(), qbeta(), qunif(), qgamma().
The other elements of the list should be the distribution parameters as required by the quantile function chosen.
groups
an optional vector of length equal to the number of subjects containing the group designation of each subject. Adding this option allows for comparisons between groups using the Differential Item Functioning tools (see details section).
nsubj
an optional numeric value with the number of subjects.
x
a ksIRT object to be printed.
...
further parameters.
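The structure described for the weights argument can be sketched in base R as follows. This is only an illustration: the helper name make_weights, the option numbers, and the weight values are hypothetical and chosen for the example, and the commented ksIRT call assumes a response matrix named resp that is not defined here.

```r
# Hypothetical helper (not part of KernSmoothIRT): build the 2-row matrix
# for one item, option numbers in row 1 and option weights in row 2.
make_weights <- function(options, weights) {
  stopifnot(length(options) == length(weights))
  rbind(options, weights, deparse.level = 0)
}

# Three illustrative items; all values are made up for this sketch.
w <- list(
  make_weights(1:4, c(0, 0.5, 1, 0)),  # partial credit: option 2 earns half
  make_weights(1:3, c(0, 1, 2)),       # custom scoring shifted to start at 0
  make_weights(1:5, 1:5)               # weight equal to the option number
)

length(w)  # one list entry per item
w[[1]]     # row 1: option numbers, row 2: option weights

# With the package installed, such a list would be passed instead of key,
# for example (resp is an assumed response matrix with three columns):
# fit <- ksIRT(responses = resp, format = 2, weights = w)
```

Keeping one matrix per item makes it possible to give each item its own number of options, which a single key vector cannot express.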
Returned from this function is a ksIRT object, which is a list with the following components:
an integer indicating the number of items.
an integer indicating the number of subjects.
an integer indicating the number of points for curve estimation.
a matrix of binary responses. Each row corresponds to a single option. The first three columns specify the item, the option, and the corresponding weight. Each additional column is a binary indicator of whether a subject selected that option.
a matrix with the first 3 columns the same as binaryresp and an additional column for each quantile at which the option characteristic curves have been estimated. The additional columns contain the kernel smoothed probabilities of selecting each option.
a matrix as OCC containing the standard errors of OCC.
a vector containing the observed score of each subject.
a list containing the label for each item.
a list indicating the distribution used to rank subjects (see thetadist in Arguments).
a vector of quantile ranks for each subject on the distribution specified in thetadist.
a vector with the quantiles used in curve estimation.
a vector of quantiles, at probabilities .05, .25, .50, .75, .95, of the observed overall scores.
a vector as subjscoresummary but computed on subjtheta.
a matrix containing the kernel weights.
a vector indicating whether each item is multiple-choice, rating-scale or nominal; 1 indicates multiple-choice, 0 indicates rating-scale, 3 indicates nominal.
returns the format argument passed at function call.
a vector containing the bandwidths for each item.
a list of ksIRT objects created for each of the subgroups specified by groups.
returns the groups argument passed at function call.
a vector containing the point polyserial correlation for each item.
a list of nsubj vectors containing the normalized likelihood for each value in evalpoints.
the maximum likelihood estimate for the expected total score of each subject.
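To make the relationship between the binary indicators, the kernel weights, and the smoothed probabilities listed above more concrete, here is a minimal base R sketch of Nadaraya-Watson smoothing with a Gaussian kernel for a single option. It is not the package's implementation: the data are simulated, and the names theta, chose, K, W and occ are assumptions introduced for the example.

```r
set.seed(1)
nsubj <- 200

# Assumed subject quantiles: ranks of a simulated score mapped to N(0, 1).
theta <- qnorm(rank(rnorm(nsubj)) / (nsubj + 1))

# Simulated binary indicator: 1 if the subject selected the option.
chose <- rbinom(nsubj, 1, plogis(1.5 * theta))

# Evaluation points and a rule-of-thumb bandwidth with sigma.hat = 1.
evalpoints <- seq(-2.5, 2.5, length.out = 51)
h <- 1.06 * 1 * nsubj^(-0.2)

# Kernel weights: one row per evaluation point, one column per subject,
# with every row normalized to sum to one.
K <- outer(evalpoints, theta, function(q, t) dnorm((q - t) / h))
W <- K / rowSums(K)

# Smoothed probability of choosing the option at each evaluation point.
occ <- as.vector(W %*% chose)
head(round(occ, 3))
```

Because each row of W sums to one and the indicator is 0 or 1, every smoothed value is a weighted average that stays between 0 and 1.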
When bandwidth="Silverman", the rule of thumb of Silverman (1986, p. 45) is implemented with the formula 1.06*sigma.hat*nsubj^(-.2), where nsubj is the number of subjects and sigma.hat is the standard deviation of the quantiles associated with the subjects according to the distribution specified with thetadist.
Note that when thetadist=list("norm",mean,sd), sigma.hat is the value specified for sd.
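The rule-of-thumb formula can be checked by hand with a few lines of base R. The function name silverman_bw is hypothetical, and using the sample standard deviation of evenly spaced quantiles for a non-normal thetadist is an assumption made for this sketch rather than a statement of how the package computes sigma.hat.

```r
# Rule-of-thumb bandwidth: 1.06 * sigma.hat * nsubj^(-.2)
silverman_bw <- function(sigma.hat, nsubj) {
  1.06 * sigma.hat * nsubj^(-0.2)
}

# thetadist = list("norm", 0, 1): sigma.hat is the sd given in the list.
thetadist <- list("norm", 0, 1)
h_norm <- silverman_bw(sigma.hat = thetadist[[3]], nsubj = 100)
h_norm

# Assumed approach for another distribution: take the standard deviation
# of the quantiles assigned to the subjects, here under a uniform(0, 1).
nsubj <- 100
theta_unif <- qunif((1:nsubj) / (nsubj + 1), 0, 1)
h_unif <- silverman_bw(sd(theta_unif), nsubj)
h_unif
```

The bandwidth shrinks slowly as the sample grows, since it scales with nsubj^(-.2), and it scales linearly with the spread of the chosen distribution.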
Printing the ksIRT object shows the point polyserial correlation between each item and the overall test score.
Mazza, A., Punzo, A., McGuire, B. (2014). KernSmoothIRT: An R Package for Kernel Smoothing in Item Response Theory. Journal of Statistical Software, 58(6), 1-34. URL: http://www.jstatsoft.org/v58/i06/.
Ramsay, J.O. (2000). TestGraf: A program for the graphical analysis of multiple choice test and questionnaire data.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
# NOT RUN {
## Psych101 data
data(Psych101)
Psych1 <- ksIRT(responses = Psychresponses[1:100,], key = Psychkey, format = 1)
Psych1
plot(Psych1, plottype="OCC", items=c(24,25,92,96))
plot(Psych1, plottype="EIS", items=c(24,25,92,96))
plot(Psych1, plottype="tetrahedron", items=c(24,92))
plot(Psych1, plottype="triangle", items=c(24,92))
plot(Psych1, plottype="PCA")
plot(Psych1,plottype="RCC", subjects=c(33,92))
PCA(Psych1)
subjEIS(Psych1)
subjETS(Psych1)
subjOCC(Psych1, stype="ObsScore")
subjscore(Psych1)
subjthetaML(Psych1)
subjscoreML(Psych1)
plot(Psych1, plottype="expected")
plot(Psych1, plottype="sd")
plot(Psych1, plottype="density")
## HIV data
data(HIV)
HIVsubset <- HIV[c(c(1:50),c(1508:1558),c(2934:2984)),]
gr2 <- as.character(HIVsubset$SITE)
DIF2 <- ksIRT(responses=HIVsubset[,-(1:3)], key=HIVkey, format = 2, groups=gr2, miss="omit")
plot(DIF2, plottype="expectedDIF", lwd=2)
plot(DIF2, plottype="densityDIF", lwd=2)
plot(DIF2, plottype="EISDIF", items=c(6,11))
### Ordinal Survey Data
data(BDI)
BDI1 <- ksIRT(responses=BDIresponses, key=BDIkey, format = 2, miss="omit")
plot(BDI1, plottype="OCC", items=1:4)
plot(BDI1, plottype="sd")
plot(BDI1, plottype="density", ylim=c(0,0.1))
# }