xxirt: User Defined Item Response Model

Description

Estimates a user defined item response model. Both, item response functions and latent trait distributions can be specified by the user (see Details). By default, the EM algorithm is used for estimation. The number of maximum EM iterations can be defined with the argument maxit. The xxirt function also allows Newton-Raphson optimization by specifying values of maximum number of iterations in maxit_nr larger than zero. Typically, a small initial number of EM iterations should be chosen to obtain reasonable starting values.

Usage

xxirt(dat, Theta=NULL, itemtype=NULL, customItems=NULL, partable=NULL,
       customTheta=NULL, group=NULL, weights=NULL, globconv=1e-06, conv=1e-04,
       maxit=1000, mstep_iter=4, mstep_reltol=1e-06, maxit_nr=0, optimizer_nr="nlminb",
       control_nr=list(trace=1), h=1E-4, use_grad=TRUE, verbose=TRUE,
       penalty_fun_item=NULL, np_fun_item=NULL, verbose_index=NULL,
       cv_kfold=0, cv_maxit=10)
# S3 method for xxirt
summary(object, digits=3, file=NULL, ...)
# S3 method for xxirt
print(x, ...)
# S3 method for xxirt
anova(object,...)
# S3 method for xxirt
coef(object,...)
# S3 method for xxirt
logLik(object,...)
# S3 method for xxirt
vcov(object,...)
# S3 method for xxirt
confint(object, parm, level=.95, ... )
# S3 method for xxirt
IRT.expectedCounts(object,...)
# S3 method for xxirt
IRT.factor.scores(object, type="EAP", ...)
# S3 method for xxirt
IRT.irfprob(object,...)
# S3 method for xxirt
IRT.likelihood(object,...)
# S3 method for xxirt
IRT.posterior(object,...)
# S3 method for xxirt
IRT.modelfit(object,...)
# S3 method for IRT.modelfit.xxirt
summary(object,...)
# S3 method for xxirt
IRT.se(object,...)
# computes Hessian matrix
xxirt_hessian(object, h=1e-4, use_shortcut=TRUE)

Value

List with following entries

partable: Item parameter table
par_items: Vector with estimated item parameters
par_items_summary: Data frame with item parameters
par_items_bounds: Data frame with summary on bounds of estimated item parameters
par_Theta: Vector with estimated parameters of theta distribution
Theta: Matrix with \(\bold{\theta}\) grid
probs_items: Item response functions
probs_Theta: Theta distribution
deviance: Deviance
loglik: Log likelihood value
ic: Information criteria
item_list: List with item functions
customItems: Used customized item response functions
customTheta: Used customized theta distribution
cv_loglike: Cross-validated log-likelihood value (if cv_kfold>0)
p.xi.aj: Individual likelihood
p.aj.xi: Individual posterior
ll_case: Case-wise log-likelihood values
n.ik: Array of expected counts
EAP: EAP person parameter estimates
dat: Used dataset with item responses
dat_resp: Dataset with response indicators
weights: Vector of person weights
G: Number of groups
group: Integer vector of group indicators
group_orig: Vector of original group_identifiers
ncat: Number of categories per item
converged: Logical whether model has converged
iter: Number of iterations needed

Arguments

dat: Data frame with item responses
Theta: Matrix with \(\bold{\theta}\) grid vector of latent trait
itemtype: Vector of item types
customItems: List containing types of item response functions created by xxirt_createDiscItem.
partable: Item parameter table which is initially created by xxirt_createParTable and which can be modified by xxirt_modifyParTable.
customTheta: User defined \(\bold{\theta}\) distribution created by xxirt_createThetaDistribution.
group: Optional vector of group indicators
weights: Optional vector of person weights
globconv: Convergence criterion for relative change in deviance
conv: Convergence criterion for absolute change in parameters
maxit: Maximum number of iterations in the EM algorithm
mstep_iter: Maximum number of iterations in M-step
mstep_reltol: Convergence criterion in M-step
maxit_nr: Number of Newton-Raphson iterations after EM algorithm
optimizer_nr: Type of optimizer for Newton-Raphson optimization. Alternatives are "optim" or "nlminb" or other options of sirt_optimizer.
control_nr: Argument control for optimizer.
h: Numerical differentiation parameter
use_grad: Logical indicating whether the gradient should be supplied to stats::optim
verbose: Logical indicating whether iteration progress should be displayed
penalty_fun_item: Optional penalty function used in regularized estimation. Used as a function of x (vector of item parameters)
np_fun_item: Function that counts the number of item parameters in regularized estimation. Used as a function of x (vector of item parameters)
object: Object of class xxirt
digits: Number of digits to be rounded
file: Optional file name to which summary output is written
parm: Optional vector of parameters
level: Confidence level
verbose_index: Logical indicating whether item index should be printed in estimation output
cv_kfold: Number of k folds in cross validation. The default is 0 (no cross-validation)
cv_maxit: Maximum number of iterations for each cross-validation sample
x: Object of class xxirt
type: Type of person parameter estimate. Currently, only EAP is implemented.
use_shortcut: Logical indicating whether a shortcut in the computation should be utilized
...: Further arguments to be passed

Details

Item response functions can be specified as functions of unknown parameters \(\bold{\delta}_i\) such that \(P(X_{i}=x | \bold{\theta})=f_i( x | \bold{\theta} ; \bold{\delta}_i )\) The item response model is estimated under the assumption of local stochastic independence of items. Equality constraints of item parameters \(\bold{\delta}_i\) among items are allowed.

The probability distribution \(P(\bold{\theta})\) are specified as functions of an unknown parameter vector \(\bold{\gamma}\).

A penalty function for item parameters can be specified in penalty_fun_item. The penalty function should be differentiable and a non-differentiable function (e.g., the absolute value function) should be approximated by a differentiable function.

Examples

Run this code

if (FALSE) {
#############################################################################
## EXAMPLE 1: Unidimensional item response functions
#############################################################################

data(data.read)
dat <- data.read

#------ Definition of item response functions

#*** IRF 2PL
P_2PL <- function( par, Theta, ncat){
    a <- par[1]
    b <- par[2]
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,1] <- 1
    for (cc in 2:ncat){
        P[,cc] <- exp( (cc-1) * a * Theta[,1] - b )
    }
    P <- P / rowSums(P)
    return(P)
}

#*** IRF 1PL
P_1PL <- function( par, Theta, ncat){
    b <- par[1]
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,1] <- 1
    for (cc in 2:ncat){
        P[,cc] <- exp( (cc-1) * Theta[,1] - b )
    }
    P <- P / rowSums(P)
    return(P)
}

#** created item classes of 1PL and 2PL models
par <- c( "a"=1, "b"=0 )
# define some slightly informative prior of 2PL
item_2PL <- sirt::xxirt_createDiscItem( name="2PL", par=par, est=c(TRUE,TRUE),
               P=P_2PL, prior=c(a="dlnorm"), prior_par1=c( a=0 ),
               prior_par2=c(a=5) )
item_1PL <- sirt::xxirt_createDiscItem( name="1PL", par=par[2], est=c(TRUE),
               P=P_1PL )
customItems <- list( item_1PL,  item_2PL )

#---- definition theta distribution

#** theta grid
Theta <- matrix( seq(-6,6,length=21), ncol=1 )

#** theta distribution
P_Theta1 <- function( par, Theta, G){
    mu <- par[1]
    sigma <- max( par[2], .01 )
    TP <- nrow(Theta)
    pi_Theta <- matrix( 0, nrow=TP, ncol=G)
    pi1 <- dnorm( Theta[,1], mean=mu, sd=sigma )
    pi1 <- pi1 / sum(pi1)
    pi_Theta[,1] <- pi1
    return(pi_Theta)
}
#** create distribution class
par_Theta <- c( "mu"=0, "sigma"=1 )
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta, est=c(FALSE,TRUE),
                       P=P_Theta1 )

#****************************************************************************
#******* Model 1: Rasch model

#-- create parameter table
itemtype <- rep( "1PL", 12 )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype,
                        customItems=customItems )

# estimate model
mod1 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable,
                   customItems=customItems, customTheta=customTheta)
summary(mod1)

# estimate Rasch model by providing starting values
partable1 <- sirt::xxirt_modifyParTable( partable, parname="b",
                   value=- stats::qlogis( colMeans(dat) ) )
# estimate model again
mod1b <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable1,
                   customItems=customItems, customTheta=customTheta )
summary(mod1b)

# extract coefficients, covariance matrix and standard errors
coef(mod1b)
vcov(mod1b)
IRT.se(mod1b)

#** start with EM and finalize with Newton-Raphson algorithm
mod1c <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable,
                   customItems=customItems, customTheta=customTheta,
                   maxit=20, maxit_nr=300)
summary(mod1c)

#****************************************************************************
#******* Model 2: 2PL Model with three groups of item discriminations

#-- create parameter table
itemtype <- rep( "2PL", 12 )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype, customItems=customItems)
# modify parameter table: set constraints for item groups A, B and C
partable1 <- sirt::xxirt_modifyParTable(partable, item=paste0("A",1:4),
                         parname="a", parindex=111)
partable1 <- sirt::xxirt_modifyParTable(partable1, item=paste0("B",1:4),
                         parname="a", parindex=112)
partable1 <- sirt::xxirt_modifyParTable(partable1, item=paste0("C",1:4),
                         parname="a", parindex=113)
# delete prior distributions
partable1 <- sirt::xxirt_modifyParTable(partable1, parname="a", prior=NA)

#-- fix sigma to 1
customTheta1 <- customTheta
customTheta1$est <- c("mu"=FALSE,"sigma"=FALSE )

# estimate model
mod2 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable1,
                  customItems=customItems, customTheta=customTheta1 )
summary(mod2)

#****************************************************************************
#******* Model 3: Cloglog link function

#*** IRF cloglog
P_1N <- function( par, Theta, ncat){
    b <- par
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,2] <- 1 - exp( - exp( Theta - b ) )
    P[,1] <- 1 - P[,2]
    return(P)
}
par <- c("b"=0)
item_1N <- sirt::xxirt_createDiscItem( name="1N", par=par, est=c(TRUE),
                    P=P_1N )
customItems <- list( item_1N )
itemtype <- rep( "1N", I )
partable <- sirt::xxirt_createParTable( dat[,items], itemtype=itemtype,
                      customItems=customItems )
partable <- sirt::xxirt_modifyParTable( partable=partable, parname="b",
                 value=- stats::qnorm( colMeans(dat[,items] )) )

#*** estimate model
mod3 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable, customItems=customItems,
                customTheta=customTheta )
summary(mod3)
IRT.compareModels(mod1,mod3)

#****************************************************************************
#******* Model 4: Latent class model

K <- 3 # number of classes
Theta <- diag(K)

#*** Theta distribution
P_Theta1 <- function( par, Theta, G  ){
    logitprobs <- par[1:(K-1)]
    l1 <- exp( c( logitprobs, 0 ) )
    probs <- matrix( l1/sum(l1), ncol=1)
    return(probs)
}

par_Theta <- stats::qlogis( rep( 1/K, K-1 ) )
names(par_Theta) <- paste0("pi",1:(K-1) )
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta,
                     est=rep(TRUE,K-1), P=P_Theta1)

#*** IRF latent class
P_lc <- function( par, Theta, ncat){
    b <- par
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,1] <- 1
    for (cc in 2:ncat){
        P[,cc] <- exp( Theta %*% b )
    }
    P <- P / rowSums(P)
    return(P)
}
par <- seq( -1.5, 1.5, length=K )
names(par) <- paste0("b",1:K)
item_lc <- sirt::xxirt_createDiscItem( name="LC", par=par,
                 est=rep(TRUE,K), P=P_lc )
customItems <- list( item_lc )

# create parameter table
itemtype <- rep( "LC", 12 )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype, customItems=customItems)
partable

#*** estimate model
mod4 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable, customItems=customItems,
                customTheta=customTheta)
summary(mod4)
# class probabilities
mod4$probs_Theta
# item response functions
imod4 <- IRT.irfprob( mod5 )
round( imod4[,2,], 3 )

#****************************************************************************
#******* Model 5: Ordered latent class model

K <- 3 # number of classes
Theta <- diag(K)
Theta <- apply( Theta, 1, cumsum )

#*** Theta distribution
P_Theta1 <- function( par, Theta, G  ){
    logitprobs <- par[1:(K-1)]
    l1 <- exp( c( logitprobs, 0 ) )
    probs <- matrix( l1/sum(l1), ncol=1)
    return(probs)
}
par_Theta <- stats::qlogis( rep( 1/K, K-1 ) )
names(par_Theta) <- paste0("pi",1:(K-1) )
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta,
                est=rep(TRUE,K-1), P=P_Theta1  )

#*** IRF ordered latent class
P_olc <- function( par, Theta, ncat){
    b <- par
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,1] <- 1
    for (cc in 2:ncat){
        P[,cc] <- exp( Theta %*% b )
    }
    P <- P / rowSums(P)
    return(P)
}

par <- c( -1, rep( .5,, length=K-1 ) )
names(par) <- paste0("b",1:K)
item_olc <- sirt::xxirt_createDiscItem( name="OLC", par=par, est=rep(TRUE,K),
                    P=P_olc, lower=c( -Inf, 0, 0 ) )
customItems <- list( item_olc )
itemtype <- rep( "OLC", 12 )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype, customItems=customItems)
partable

#*** estimate model
mod5 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable, customItems=customItems,
                customTheta=customTheta )
summary(mod5)
# estimated item response functions
imod5 <- IRT.irfprob( mod5 )
round( imod5[,2,], 3 )

#############################################################################
## EXAMPLE 2: Multiple group models with xxirt
#############################################################################

data(data.math)
dat <- data.math$data
items <- grep( "M[A-Z]", colnames(dat), value=TRUE )
I <- length(items)

Theta <- matrix( seq(-8,8,len=31), ncol=1 )

#****************************************************************************
#******* Model 1: Rasch model, single group

#*** Theta distribution
P_Theta1 <- function( par, Theta, G  ){
    mu <- par[1]
    sigma <- max( par[2], .01 )
    p1 <- stats::dnorm( Theta[,1], mean=mu, sd=sigma)
    p1 <- p1 / sum(p1)
    probs <- matrix( p1, ncol=1)
    return(probs)
}

par_Theta <- c(0,1)
names(par_Theta) <- c("mu","sigma")
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta,
                   est=c(FALSE,TRUE), P=P_Theta1  )
customTheta

#*** IRF 1PL logit
P_1PL <- function( par, Theta, ncat){
    b <- par
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,2] <- plogis( Theta - b )
    P[,1] <- 1 - P[,2]
    return(P)
}
par <- c("b"=0)
item_1PL <- sirt::xxirt_createDiscItem( name="1PL", par=par, est=c(TRUE), P=P_1PL)
customItems <- list( item_1PL )

itemtype <- rep( "1PL", I )
partable <- sirt::xxirt_createParTable( dat[,items], itemtype=itemtype,
                       customItems=customItems )
partable <- sirt::xxirt_modifyParTable( partable=partable, parname="b",
                  value=- stats::qlogis( colMeans(dat[,items] )) )

#*** estimate model
mod1 <- sirt::xxirt( dat=dat[,items], Theta=Theta, partable=partable,
                customItems=customItems, customTheta=customTheta )
summary(mod1)

#****************************************************************************
#******* Model 2: Rasch model, multiple groups

#*** Theta distribution
P_Theta2 <- function( par, Theta, G  ){
    mu1 <- par[1]
    mu2 <- par[2]
    sigma1 <- max( par[3], .01 )
    sigma2 <- max( par[4], .01 )
    TP <- nrow(Theta)
    probs <- matrix( NA, nrow=TP, ncol=G)
    p1 <- stats::dnorm( Theta[,1], mean=mu1, sd=sigma1)
    probs[,1] <- p1 / sum(p1)
    p1 <- stats::dnorm( Theta[,1], mean=mu2, sd=sigma2)
    probs[,2] <- p1 / sum(p1)
    return(probs)
}
par_Theta <- c(0,0,1,1)
names(par_Theta) <- c("mu1","mu2","sigma1","sigma2")
customTheta2  <- sirt::xxirt_createThetaDistribution( par=par_Theta,
                    est=c(FALSE,TRUE,TRUE,TRUE), P=P_Theta2  )
print(customTheta2)

#*** estimate model
mod2 <- sirt::xxirt( dat=dat[,items], group=dat$female, Theta=Theta, partable=partable,
           customItems=customItems, customTheta=customTheta2, maxit=40)
summary(mod2)
IRT.compareModels(mod1, mod2)

#*** compare results with TAM package
library(TAM)
mod2b <- TAM::tam.mml( resp=dat[,items], group=dat$female )
summary(mod2b)
IRT.compareModels(mod1, mod2, mod2b)

#############################################################################
## EXAMPLE 3: Regularized 2PL model
#############################################################################

data(data.read, package="sirt")
dat <- data.read

#------ Definition of item response functions

#*** IRF 2PL
P_2PL <- function( par, Theta, ncat){
    a <- par[1]
    b <- par[2]
    TP <- nrow(Theta)
    P <- matrix( NA, nrow=TP, ncol=ncat)
    P[,1] <- 1
    for (cc in 2:ncat){
        P[,cc] <- exp( (cc-1) * a * Theta[,1] - b )
    }
    P <- P / rowSums(P)
    return(P)
}

#** created item classes of 1PL and 2PL models
par <- c( "a"=1, "b"=0 )
# define some slightly informative prior of 2PL
item_2PL <- sirt::xxirt_createDiscItem( name="2PL", par=par, est=c(TRUE,TRUE),
               P=P_2PL, prior=c(a="dlnorm"), prior_par1=c( a=0 ),
               prior_par2=c(a=5) )
customItems <- list( item_2PL )

#---- definition theta distribution

#** theta grid
Theta <- matrix( seq(-6,6,length=21), ncol=1 )

#** theta distribution
P_Theta1 <- function( par, Theta, G){
    mu <- par[1]
    sigma <- max( par[2], .01 )
    TP <- nrow(Theta)
    pi_Theta <- matrix( 0, nrow=TP, ncol=G)
    pi1 <- dnorm( Theta[,1], mean=mu, sd=sigma )
    pi1 <- pi1 / sum(pi1)
    pi_Theta[,1] <- pi1
    return(pi_Theta)
}
#** create distribution class
par_Theta <- c( "mu"=0, "sigma"=1 )
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta, est=c(FALSE,FALSE),
                       P=P_Theta1 )

#****************************************************************************
#******* Model 1: 2PL model

itemtype <- rep( "2PL", 12 )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype,
                        customItems=customItems )

mod1 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable,
                   customItems=customItems, customTheta=customTheta)
summary(mod1)

#****************************************************************************
#******* Model 2: Regularized 2PL model with regularization on item loadings

# define regularized estimation of item loadings
parindex <- partable[ partable$parname=="a","parindex"]

#** penalty is defined by -N*lambda*sum_i (a_i-1)^2
N <- nrow(dat)
lambda <- .02
penalty_fun_item <- function(x)
{
    val <- N*lambda*sum( ( x[parindex]-1)^2)
    return(val)
}
# estimate standard deviation
customTheta1  <- sirt::xxirt_createThetaDistribution( par=par_Theta, est=c(FALSE,TRUE),
                       P=P_Theta1 )
mod2 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable,
                   customItems=customItems, customTheta=customTheta1,
                   penalty_fun_item=penalty_fun_item)
summary(mod2)

#############################################################################
## EXAMPLE 4: 2PL mixture model
#############################################################################

#*** simulate data
set.seed(123)
N <- 4000   # number of persons
I <- 15     # number of items
prop <- .25 # mixture proportion for second class

# discriminations and difficulties in first class
a1 <- rep(1,I)
b1 <- seq(-2,2,len=I)
# distribution in second class
mu2 <- 1
sigma2 <- 1.2
# compute parameters with constraint N(0,1) in second class
# a*(sigma*theta+mu-b)=a*sigma*(theta-(b-mu)/sigma)
#=> a2=a*sigma and b2=(b-mu)/sigma
a2 <- a1
a2[c(2,4,6,8)] <- 0.2  # some items with different discriminations
a2 <- a2*sigma2
b2 <- b1
b2[1:5] <- 1   # first 5 item with different difficulties
b2 <- (b2-mu2)/sigma2
dat1 <- sirt::sim.raschtype(theta=stats::rnorm(N*(1-prop)), b=b1, fixed.a=a1)
dat2 <- sirt::sim.raschtype(theta=stats::rnorm(N*prop), b=b2, fixed.a=a2)
dat <- rbind(dat1, dat2)

#**** model specification

#*** define theta distribution
TP <- 21
theta <- seq(-6,6,length=TP)
# stack theta vectors below each others=> 2 latent classes
Theta <- matrix( c(theta, theta ), ncol=1 )
# distribution of theta (i.e., N(0,1))
w_theta <- dnorm(theta)
w_theta <- w_theta / sum(w_theta)

P_Theta1 <- function( par, Theta, G){
    p2_logis <- par[1]
    p2 <- stats::plogis( p2_logis )
    p1 <- 1-p2
    pi_Theta <- c( p1*w_theta, p2*w_theta)
    pi_Theta <- matrix(pi_Theta, ncol=1)
    return(pi_Theta)
}

par_Theta <- c( p2_logis=qlogis(.25))
customTheta  <- sirt::xxirt_createThetaDistribution( par=par_Theta, est=c(TRUE),
                       P=P_Theta1)

# IRF for 2-class mixture 2PL model
par <- c(a1=1, a2=1, b1=0, b2=.5)

P_2PLmix <- function( par, Theta, ncat)
{
    a1 <- par[1]
    a2 <- par[2]
    b1 <- par[3]
    b2 <- par[4]
    P <- matrix( NA, nrow=2*TP, ncol=ncat)
    TP <- nrow(Theta)/2
    P1 <- stats::plogis( a1*(Theta[1:TP,1]-b1) )
    P2 <- stats::plogis( a2*(Theta[TP+1:(2*TP),1]-b2) )
    P[,2] <- c(P1, P2)
    P[,1] <- 1-P[,2]
    return(P)
}

# define some slightly informative prior of 2PL
item_2PLmix <- sirt::xxirt_createDiscItem( name="2PLmix", par=par,
               est=c(TRUE,TRUE,TRUE,TRUE), P=P_2PLmix )
customItems <- list( item_2PLmix )

#****************************************************************************
#******* Model 1: 2PL mixture model

itemtype <- rep( "2PLmix", I )
partable <- sirt::xxirt_createParTable( dat, itemtype=itemtype,
                        customItems=customItems )
mod1 <- sirt::xxirt( dat=dat, Theta=Theta, partable=partable,
                   customItems=customItems, customTheta=customTheta)
summary(mod1)
}