It implements single-stage Q-learning. Q-learning estimates the optimal treatment option by fitting a regression model on the treatment, the feature variables, and their interactions. The optimal treatment option is the sign of the interaction term that maximizes the predicted value from the regression model.
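A minimal sketch of this idea in plain R, assuming an ordinary least squares fit (the helper qfit and its internals are illustrative, not the package's actual code):

qfit <- function(H, A, R) {
  X <- cbind(1, H, A, A * H)        # design matrix (Intercept, H, A, diag(A)*H)
  co <- lm.fit(X, R)$coefficients   # least-squares coefficients
  p <- ncol(H)
  # the treatment-dependent part of the prediction is
  # A * (co[p + 2] + H %*% co[(p + 3):(2 * p + 2)]),
  # so the predicted value is maximized by taking A equal to its sign
  A_opt <- sign(co[p + 2] + H %*% co[(p + 3):(2 * p + 2)])
  list(co = co, A_opt = A_opt)
}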
Usage
Qlearning_Single(H, A, R, pentype = "lasso", m = 4)
Arguments
H
an n by p matrix, where n is the sample size and p is the number of feature variables.
A
a vector of treatment assignments coded as 1 and -1.
R
a vector of outcomes; larger values are more desirable.
pentype
the type of regression used in Q-learning: 'lasso' (the default) fits a lasso regression; 'LSE' fits ordinary least squares.
m
needed when pentype = 'lasso': the number of cross-validation folds passed to cv.glmnet for picking the lasso tuning parameter (see the sketch after this list).
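For pentype = 'lasso', the fit goes through cv.glmnet as noted above; a hedged sketch of what that step plausibly looks like (the exact internal call is an assumption):

library(glmnet)
Xd <- cbind(H, A, A * H)                         # penalized design; glmnet adds its own intercept
cvfit <- cv.glmnet(Xd, R, nfolds = m)            # m-fold cross-validation selects lambda
co <- as.numeric(coef(cvfit, s = "lambda.min"))  # 2p + 2 coefficients, intercept first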
Value
It returns an object of class 'qlearn' that consists of two components:
co
the coefficients of the regression model, a vector of length 2p + 2. The design matrix is X = (Intercept, H, A, diag(A)*H).
Q
the predicted optimal outcome from the regression model.
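Given this layout of co, the estimated decision rule can be recovered for new feature data; a sketch (Hnew is a hypothetical feature matrix, and modelQ a fitted object as in the example below):

co <- modelQ$co
p <- ncol(Hnew)
# the sign of the treatment-related part maximizes the predicted outcome
d_opt <- sign(co[p + 2] + Hnew %*% co[(p + 3):(2 * p + 2)])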
References
Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2), 257-262.
Zhao, Y., Kosorok, M. R., & Zeng, D. (2009). Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26), 3294-3315.
Examples
# NOT RUN {
library(MASS)  # for mvrnorm
n = 200
A = 2 * rbinom(n, 1, 0.5) - 1
p = 20
mu = numeric(p)
Sigma = diag(p)
X = mvrnorm(n, mu, Sigma)
R = X[, 1:3] %*% c(1, 1, -2) + X[, 3:5] %*% c(1, 1, -2) * A + rnorm(n)
modelQ = Qlearning_Single(X, A, R)
# }
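The returned components can then be inspected directly (names as listed under Value):

modelQ$co  # regression coefficients, length 2p + 2
modelQ$Q   # predicted optimal outcome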