sslCoTrain: Co-Training

Description

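Co-training for semi-supervised classification: two classifiers trained on labeled data are used to predict the labels of unlabeled data.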

Usage

sslCoTrain(xl, yl, xu, method1 = "nb", method2 = "nb", nrounds1, nrounds2, portion = 0.5, n = 10, seed = 0, ...)

Arguments

xl

an n * p matrix or data.frame of labeled data.

yl

an n * 1 integer vector of labels.

xu

an m * p matrix or data.frame of unlabeled data.

method1, method2

strings specifying the first and second classification models to use. "xgb" means extreme gradient boosting; please refer to xgb.train. For other options, see train.

nrounds1, nrounds2

parameters needed when method1 or method2 is "xgb". See xgb.train for details.

portion

the proportion of the labeled data used to split it into two parts.

n

the number of unlabeled examples to add to the labeled data in each iteration.

seed

an integer specifying the random number generation state for the data split.

...

other parameters

Value

an m * 1 integer vector of predicted labels for the unlabeled data.

Details

sslCoTrain divides the labeled data into two parts and trains a classifier on each part. Each classifier then predicts labels for some of the unlabeled examples, and those examples are added to the labeled data. The newly labeled data help the other classifier improve its performance.
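
The loop described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: a simple nearest-centroid classifier stands in for the real models, confidence is taken to be the negative squared distance to the nearest centroid, and the helper names fit_centroids, predict_centroids and co_train_sketch are hypothetical.

```r
# Hypothetical stand-in classifier: one centroid per class.
fit_centroids <- function(x, y) {
  classes <- sort(unique(y))
  centers <- do.call(rbind, lapply(classes, function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(classes = classes, centers = centers)
}

# Predict the nearest centroid; confidence = negative squared distance.
predict_centroids <- function(model, x) {
  d <- sapply(seq_len(nrow(model$centers)), function(k) {
    rowSums(sweep(x, 2, model$centers[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(x))
  best <- max.col(-d, ties.method = "first")
  list(label = model$classes[best],
       conf = -d[cbind(seq_len(nrow(x)), best)])
}

# Sketch of the co-training loop (assumes 0 < portion < 1 and n >= 1).
co_train_sketch <- function(xl, yl, xu, portion = 0.5, n = 10, seed = 0) {
  stopifnot(portion > 0, portion < 1, n >= 1)
  set.seed(seed)
  xl <- as.matrix(xl)
  xu <- as.matrix(xu)
  # Split the labeled rows into two parts, one per classifier.
  idx1 <- sample(nrow(xl), size = round(portion * nrow(xl)))
  idx2 <- setdiff(seq_len(nrow(xl)), idx1)
  parts <- list(list(x = xl[idx1, , drop = FALSE], y = yl[idx1]),
                list(x = xl[idx2, , drop = FALSE], y = yl[idx2]))
  yu <- rep(NA_integer_, nrow(xu))
  pool <- seq_len(nrow(xu))  # rows of xu that are still unlabeled
  while (length(pool) > 0) {
    for (i in 1:2) {
      if (length(pool) == 0) break
      model <- fit_centroids(parts[[i]]$x, parts[[i]]$y)
      pred <- predict_centroids(model, xu[pool, , drop = FALSE])
      # Take the n most confident predictions of classifier i ...
      take <- head(order(pred$conf, decreasing = TRUE), n)
      chosen <- pool[take]
      yu[chosen] <- pred$label[take]
      # ... and hand them to the other classifier as new labeled data.
      other <- 3 - i
      parts[[other]]$x <- rbind(parts[[other]]$x, xu[chosen, , drop = FALSE])
      parts[[other]]$y <- c(parts[[other]]$y, pred$label[take])
      pool <- pool[-take]
    }
  }
  yu
}

# Usage on iris, with twenty labeled observations per class.
data(iris)
known.label <- c(1:20, 51:70, 101:120)
yu_sketch <- co_train_sketch(iris[known.label, 1:4], rep(1:3, each = 20),
                             iris[-known.label, 1:4], n = 10)
```

In this sketch every unlabeled example eventually receives the label given by whichever classifier picked it first; the actual stopping rule and confidence measure used by sslCoTrain may differ.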

References

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory.

Examples


data(iris)
xl <- iris[, 1:4]
# Suppose we know the first twenty observations of each class
# and we want to predict the remaining ones with co-training
# 1 setosa, 2 versicolor, 3 virginica
yl <- rep(1:3, each = 20)
known.label <- c(1:20, 51:70, 101:120)
xu <- xl[-known.label, ]
xl <- xl[known.label, ]
yu <- sslCoTrain(xl, yl, xu, method1 = "xgb", nrounds1 = 100,
                 method2 = "xgb", nrounds2 = 100, n = 60)
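
Because the true species of the held-out iris rows are known, the predictions can be checked against them. The snippet below is a minimal sketch: accuracy is a hypothetical helper, and it assumes the returned labels use the same 1, 2, 3 coding as yl.

```r
data(iris)
known.label <- c(1:20, 51:70, 101:120)
# True classes of the 90 held-out rows: 1 setosa, 2 versicolor, 3 virginica
truth <- as.integer(iris$Species)[-known.label]

# Hypothetical helper: share of predictions that match the true class
accuracy <- function(pred, truth) mean(pred == truth)

# Once yu has been computed by the example above, one could run:
# accuracy(yu, truth)
# table(predicted = yu, actual = truth)
```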
