It implements single-stage Q-learning. Q-learning estimates the optimal treatment option by fitting a regression model on the treatment, the feature variables, and their interactions. The optimal treatment option is the sign of the interaction term that maximizes the predicted value from the regression model.
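A minimal sketch of this idea in plain R, assuming an ordinary least squares fit (the helper qfit and its internals are illustrative, not the package's actual code):

qfit <- function(H, A, R) {
  X <- cbind(1, H, A, A * H)        # design matrix (Intercept, H, A, diag(A)*H)
  co <- lm.fit(X, R)$coefficients   # least-squares coefficients
  p <- ncol(H)
  # the treatment-dependent part of the prediction is
  # A * (co[p + 2] + H %*% co[(p + 3):(2 * p + 2)]),
  # so the predicted value is maximized by taking A equal to its sign
  A_opt <- sign(co[p + 2] + H %*% co[(p + 3):(2 * p + 2)])
  list(co = co, A_opt = A_opt)
}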
Usage
Qlearning_Single(H, A, R, pentype = "lasso", m = 4)
Arguments
H
an n by p matrix, where n is the sample size and p is the number of feature variables.
A
a vector of treatment assignments coded as 1 and -1.
R
a vector of outcomes; larger values are more desirable.
pentype
the type of regression used in Q-learning: 'lasso' (the default) fits a lasso regression; 'LSE' fits ordinary least squares.
m
needed when pentype = 'lasso': the number of cross-validation folds passed to cv.glmnet for picking the lasso tuning parameter (see the sketch after this list).
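For pentype = 'lasso', the fit goes through cv.glmnet as noted above; a hedged sketch of what that step plausibly looks like (the exact internal call is an assumption):

library(glmnet)
Xd <- cbind(H, A, A * H)                         # penalized design; glmnet adds its own intercept
cvfit <- cv.glmnet(Xd, R, nfolds = m)            # m-fold cross-validation selects lambda
co <- as.numeric(coef(cvfit, s = "lambda.min"))  # 2p + 2 coefficients, intercept first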
Value
It returns an object of class 'qlearn' that consists of two components:
co
the coefficients of the regression model, a vector of length 2p + 2. The design matrix is X = (Intercept, H, A, diag(A)*H).
Q
the predicted optimal outcome from the regression model.
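Given this layout of co, the estimated decision rule can be recovered for new feature data; a sketch (Hnew is a hypothetical feature matrix, and modelQ a fitted object as in the example below):

co <- modelQ$co
p <- ncol(Hnew)
# the sign of the treatment-related part maximizes the predicted outcome
d_opt <- sign(co[p + 2] + Hnew %*% co[(p + 3):(2 * p + 2)])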
References
Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.
Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294-3315.
Examples
# NOT RUN {
library(MASS)  # for mvrnorm
n = 200
A = 2 * rbinom(n, 1, 0.5) - 1
p = 20
mu = numeric(p)
Sigma = diag(p)
X = mvrnorm(n, mu, Sigma)
R = X[, 1:3] %*% c(1, 1, -2) + X[, 3:5] %*% c(1, 1, -2) * A + rnorm(n)
modelQ = Qlearning_Single(X, A, R)
# }
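The returned components can then be inspected directly (names as listed under Value):

modelQ$co  # regression coefficients, length 2p + 2
modelQ$Q   # predicted optimal outcome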