
SSL (version 0.1)

sslCoTrain: Co-Training

Description

Co-Training

Usage

sslCoTrain(xl, yl, xu, method1 = "nb", method2 = "nb", nrounds1, nrounds2, portion = 0.5, n = 10, seed = 0, ...)

Arguments

xl
an n * p matrix or data.frame of labeled data.
yl
an n * 1 integer vector of labels.
xu
an m * p matrix or data.frame of unlabeled data.
method1, method2
strings that specify the first and second classification models to use. "xgb" means extreme gradient boosting; please refer to xgb.train. For other options, see train.
nrounds1, nrounds2
number of boosting rounds, needed when method1 or method2 is "xgb". See xgb.train for details.
portion
the proportion used to split the labeled data into two parts.
n
the number of unlabeled examples to add to the labeled data in each iteration.
seed
an integer specifying the random number generator state for the data split.
...
other parameters

Value

an m * 1 integer vector representing the predictions for the unlabeled data.

Details

sslCoTrain divides the labeled data into two parts and trains a classifier on each part. Each classifier then predicts labels for some of the unlabeled examples, and those examples are added to the labeled data. The newly labeled data help the other classifier improve its performance.
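The loop described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names (fit_centroids, predict_centroids, co_train_sketch) are hypothetical, a nearest-centroid classifier stands in for the models that sslCoTrain actually fits, and "confidence" is assumed to mean closeness to the predicted class centroid.

```r
# Nearest-centroid classifier: a simple stand-in, not a model that SSL uses.
fit_centroids <- function(x, y) {
  classes <- sort(unique(y))
  centers <- do.call(rbind, lapply(classes, function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(classes = classes, centers = centers)
}

predict_centroids <- function(model, x) {
  # Squared Euclidean distance from every row of x to every class centroid.
  d <- sapply(seq_len(nrow(model$centers)), function(k) {
    rowSums(sweep(x, 2, model$centers[k, ], "-")^2)
  })
  d <- matrix(d, nrow = nrow(x))  # keep matrix shape even for a single row
  best <- max.col(-d, ties.method = "first")
  # Assumed confidence score: the closer to the chosen centroid, the higher.
  list(label = model$classes[best],
       conf = -d[cbind(seq_len(nrow(d)), best)])
}

co_train_sketch <- function(xl, yl, xu, portion = 0.5, n = 10, seed = 0) {
  set.seed(seed)
  xl <- as.matrix(xl)
  xu <- as.matrix(xu)
  # Split the labeled rows into two parts, one per classifier.
  idx <- sample(nrow(xl), size = floor(portion * nrow(xl)))
  parts <- list(list(x = xl[idx, , drop = FALSE], y = yl[idx]),
                list(x = xl[-idx, , drop = FALSE], y = yl[-idx]))
  pool <- seq_len(nrow(xu))           # unlabeled rows not yet assigned
  pred <- rep(NA_integer_, nrow(xu))
  teacher <- 1
  while (length(pool) > 0) {
    learner <- 3 - teacher            # the other classifier (1 <-> 2)
    model <- fit_centroids(parts[[teacher]]$x, parts[[teacher]]$y)
    p <- predict_centroids(model, xu[pool, , drop = FALSE])
    # The teacher hands its n most confident predictions to the learner.
    take <- head(order(p$conf, decreasing = TRUE), n)
    parts[[learner]]$x <- rbind(parts[[learner]]$x,
                                xu[pool[take], , drop = FALSE])
    parts[[learner]]$y <- c(parts[[learner]]$y, p$label[take])
    pred[pool[take]] <- p$label[take]
    pool <- pool[-take]
    teacher <- learner                # swap roles for the next round
  }
  pred
}

# Usage on the same iris setup as in the Examples section.
data(iris)
known.label <- c(1:20, 51:70, 101:120)
xl <- iris[known.label, 1:4]
yl <- rep(1:3, each = 20)
xu <- iris[-known.label, 1:4]
yu <- co_train_sketch(xl, yl, xu, n = 10)
```

Alternating the teacher role each round is one possible design choice; the sketch stops once every unlabeled row has received a label.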

References

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory.

See Also

train, xgb.train

Examples

library(SSL)
data(iris)
xl <- iris[, 1:4]
# Suppose we know the first twenty observations of each class
# and want to predict the remaining ones with co-training.
# Labels: 1 setosa, 2 versicolor, 3 virginica
yl <- rep(1:3, each = 20)
known.label <- c(1:20, 51:70, 101:120)
xu <- xl[-known.label, ]
xl <- xl[known.label, ]
yu <- sslCoTrain(xl, yl, xu, method1 = "xgb", nrounds1 = 100,
                 method2 = "xgb", nrounds2 = 100, n = 60)
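
Because the true species of the held-out iris rows are known, the returned predictions can be checked against them. The snippet below is a minimal sketch of such a check. So that it runs without the package, yu here is a hypothetical stand-in vector with two deliberate errors, not real sslCoTrain output.

```r
data(iris)
known.label <- c(1:20, 51:70, 101:120)
# True labels of the 90 held-out rows: 1 setosa, 2 versicolor, 3 virginica.
truth <- as.integer(iris$Species)[-known.label]

# Hypothetical predictions standing in for the value returned by sslCoTrain:
# a copy of the truth with two versicolor rows relabeled as virginica.
yu <- truth
yu[c(35, 40)] <- 3L

# Confusion table (rows: true class, columns: predicted class) and accuracy.
confusion <- table(truth = truth, predicted = yu)
accuracy <- mean(yu == truth)
print(confusion)
print(accuracy)
```

With real output, replace the stand-in assignment with the yu returned by the call in the example.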
