
DTRlearn (version 1.3)

make_classification: Data Simulation for a Single Stage

Description

This function generates simulated datasets for testing single-stage DTR learning algorithms. The outcomes are generated from a pattern mixture model based on a latent variable with 2 categories: category 1 has optimal treatment y=1, and category 2 has y=-1. The feature variables X follow a multivariate normal distribution. Specifically, the class centroids are generated from a multivariate normal distribution with mean 0 and standard deviation 5, and are added to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions. The observed treatment assignments A are completely randomized to 1 and -1 with probability 0.5 each, and the outcomes are generated as \(R_1 = 0\), \(R_2 = 1.5 A y + N(0,1)\).
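For intuition, the following is a minimal sketch of the data-generating scheme just described. It is not the package's internal code, and the way latent clusters are split between the two treatment categories (alternating here) is an assumption made purely for illustration.

# sketch of the generative scheme described above (illustration only)
set.seed(1)
n_cluster <- 4; pinfo <- 5; pnoise <- 10; n_sample <- 200

# class centroids: multivariate normal with mean 0 and standard deviation 5
centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5), n_cluster, pinfo)

# each sample belongs to one latent cluster; clusters are split between y = 1 and y = -1
cluster <- sample(n_cluster, n_sample, replace = TRUE)
y <- ifelse(cluster %% 2 == 1, 1, -1)

# features: standard normal, with centroids added to the first pinfo dimensions
X <- matrix(rnorm(n_sample * (pinfo + pnoise)), n_sample, pinfo + pnoise)
X[, 1:pinfo] <- X[, 1:pinfo] + centroids[cluster, ]

# completely randomized treatment and outcome R2 = 1.5*A*y + N(0,1)
A <- sample(c(-1, 1), n_sample, replace = TRUE)
R2 <- 1.5 * A * y + rnorm(n_sample)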

Usage

make_classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

Arguments

n_cluster

number of clusters.

pinfo

number of informative variables, i.e., the dimension of the centroids associated with the latent class of each sample.

pnoise

number of noise variables.

n_sample

sample size.

centroids

For a training set, do not supply centroids; the centroids are generated randomly by the function. For a testing set, supply the same set of centroids that was used for the training set. It is a matrix of dimension n_cluster by pinfo.

Value

X

The feature variable matrix, an n_sample by pinfo+pnoise matrix generated from a multivariate normal distribution. The noise variables have mean 0 and standard deviation 1; the informative variables are shifted to be centered at the randomly generated centroids.

A

The treatment assignment vector.

y

The true optimal treatment vector.

R

The outcome vector.

centroids

The class centroids, drawn from a pinfo-dimensional multivariate normal distribution.
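As a quick illustration (assuming the function returns these components as a named list, as the example below also suggests), one can inspect the dimensions of the returned objects:

library(DTRlearn)
dat <- make_classification(n_cluster = 5, pinfo = 10, pnoise = 20, n_sample = 100)
dim(dat$X)          # 100 x 30: pinfo + pnoise feature columns
table(dat$A)        # treatments roughly balanced between -1 and 1
table(dat$y)        # true optimal treatments
head(dat$R)         # simulated outcomes
dim(dat$centroids)  # 5 x 10: one pinfo-dimensional centroid per cluster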

References

This function borrows its idea from the comparable Python function make_classification in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification

See Also

make_2classification for generating simulated data for 2 stages.

Examples

# NOT RUN {
library(DTRlearn)

# simulate a training set and a testing set that share the same centroids
n_cluster = 10
pinfo = 10
pnoise = 20
example1 = make_classification(n_cluster, pinfo, pnoise, 100)
test = make_classification(n_cluster, pinfo, pnoise, 100, example1$centroids)

# O-learning with a linear kernel
model1 = Olearning_Single(example1$X, example1$A, example1$R)
Atp = predict(model1, test$X)
V1 = mean(test$R[Atp == test$A])

# weighted SVM with an rbf kernel
model2 = wsvm(example1$X, example1$A, example1$R, 'rbf', 0.05)
Atp = predict(model2, test$X)
V2 = mean(test$R[Atp == test$A])

# In this highly non-linear case, comparing V1 with V2 (the empirical values on
# the testing set) shows that model2, using the rbf kernel, does better than
# model1, which uses a linear kernel. The true optimal value here is 1.5.
# }
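As an optional follow-up to the example above, one can also estimate the empirical value of the true optimal rule on the testing set and compare it with V1 and V2; given the outcome model \(1.5 A y + N(0,1)\), this should be close to 1.5.

# empirical value of the true optimal rule on the testing set
V_opt <- mean(test$R[test$y == test$A])
c(V1 = V1, V2 = V2, V_opt = V_opt)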
