Note: a newer version (0.9.7) of this package is available.

R Semi-Supervised Learning package

This R package provides implementations of several semi-supervised learning methods, including our own work on constraint-based semi-supervised learning.

To cite the package, use either of these two references:

  • Krijthe, J. H. (2016). RSSL: R package for Semi-supervised Learning. In B. Kerautret, M. Colom, & P. Monasse (Eds.), Reproducible Research in Pattern Recognition. RRPR 2016. Lecture Notes in Computer Science, vol 10214 (pp. 104–115). Springer International Publishing. https://doi.org/10.1007/978-3-319-56414-2_8. arXiv: https://arxiv.org/abs/1612.07993
  • Krijthe, J. H. & Loog, M. (2015). Implicitly Constrained Semi-Supervised Least Squares Classification. In E. Fromont, T. de Bie, & M. van Leeuwen (Eds.), 14th International Symposium on Advances in Intelligent Data Analysis XIV (Lecture Notes in Computer Science Volume 9385). Saint Etienne, France, pp. 158–169.

Installation Instructions

This package is available on CRAN. The easiest way to install it is:

install.packages("RSSL")

To install the latest version of the package from GitHub, use the devtools package:

library(devtools)
install_github("jkrijthe/RSSL")

Usage

After installation, load the package as usual:

library(RSSL)

The following code generates a simple dataset, trains a supervised classifier and a semi-supervised classifier, and evaluates their performance:

library(dplyr,warn.conflicts = FALSE)
library(ggplot2,warn.conflicts = FALSE)

set.seed(2)
df <- generate2ClassGaussian(200, d=2, var = 0.2, expected=TRUE)

# Randomly remove labels
df <- df %>% add_missinglabels_mar(Class~.,prob=0.98) 

# Train a supervised and a semi-supervised (self-learning) classifier
g_nm <- NearestMeanClassifier(Class~.,df,prior=matrix(0.5,2))
g_self <- SelfLearning(Class~.,df,
                       method=NearestMeanClassifier,
                       prior=matrix(0.5,2))

# Plot dataset
df %>% 
  ggplot(aes(x=X1,y=X2,color=Class,size=Class)) +
  geom_point() +
  coord_equal() +
  scale_size_manual(values=c("-1"=3,"1"=3), na.value=1) +
  geom_linearclassifier("Supervised"=g_nm,
                        "Semi-supervised"=g_self)


# Evaluate performance: squared loss
mean(loss(g_nm,df))
mean(loss(g_self,df))

# Evaluate performance: error rate
mean(predict(g_nm,df)!=df$Class)
mean(predict(g_self,df)!=df$Class)
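
To make the idea behind the example more concrete, the following base-R sketch shows one common way self-learning can be wrapped around a nearest mean classifier: fit on the labeled objects, impute labels for the unlabeled objects, refit on everything, and repeat until the imputed labels stop changing. This is an illustration under simplifying assumptions, not the code RSSL uses; the function names (fit_nearest_mean, predict_nearest_mean, self_learning_sketch) are hypothetical and not part of the package.

```r
# Base-R sketch of self-learning around a nearest mean classifier.
# All function names below are hypothetical and NOT part of RSSL.

# Fit: one mean vector per class, computed from the rows passed in.
fit_nearest_mean <- function(X, y) {
  classes <- sort(unique(y))
  means <- do.call(rbind, lapply(classes, function(k) {
    colMeans(X[y == k, , drop = FALSE])
  }))
  list(classes = classes, means = means)
}

# Predict: assign each row to the class whose mean is closest.
predict_nearest_mean <- function(model, X) {
  d2 <- matrix(unlist(lapply(seq_along(model$classes), function(j) {
    rowSums(sweep(X, 2, model$means[j, ])^2)  # squared distance to mean j
  })), nrow = nrow(X))
  model$classes[apply(d2, 1, which.min)]
}

# Self-learning: missing labels are NA; impute, refit, repeat until stable.
self_learning_sketch <- function(X, y, max_iter = 100) {
  labeled <- !is.na(y)
  model <- fit_nearest_mean(X[labeled, , drop = FALSE], y[labeled])
  if (all(labeled)) return(model)
  y_work <- y
  for (i in seq_len(max_iter)) {
    y_new <- y_work
    y_new[!labeled] <- predict_nearest_mean(model, X[!labeled, , drop = FALSE])
    if (identical(y_new, y_work)) break  # imputed labels no longer change
    y_work <- y_new
    model <- fit_nearest_mean(X, y_work)
  }
  model
}

# Toy data: two Gaussian classes, three labels kept per class.
set.seed(1)
n <- 100
y_true <- rep(c(-1L, 1L), each = n)
X <- cbind(rnorm(2 * n, mean = y_true), rnorm(2 * n))
keep <- c(1:3, n + 1:3)
y <- y_true
y[-keep] <- NA

m_sup <- fit_nearest_mean(X[keep, ], y[keep])
m_self <- self_learning_sketch(X, y)
err_sup <- mean(predict_nearest_mean(m_sup, X) != y_true)
err_self <- mean(predict_nearest_mean(m_self, X) != y_true)
```

Comparing err_sup and err_self on the fully labeled toy data gives the same kind of comparison as the error rates above. Self-learning is not guaranteed to improve on the supervised classifier, so either value may come out lower.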

Acknowledgement

Work on this package was supported by Project 23 of the Dutch national program COMMIT.

Package Details

Version: 0.9.3
License: GPL (>= 2)
Maintainer: Jesse Krijthe
Last Published: November 13th, 2020
Monthly Downloads: 259

Functions in RSSL (0.9.3)

  • GRFClassifier: Label propagation using Gaussian Random Fields and Harmonic functions
  • KernelICLeastSquaresClassifier: Kernelized Implicitly Constrained Least Squares Classification
  • ICLinearDiscriminantClassifier: Implicitly Constrained Semi-supervised Linear Discriminant Classifier
  • CrossValidationSSL: Cross-validation in semi-supervised setting
  • EMLeastSquaresClassifier: An Expectation Maximization like approach to Semi-Supervised Least Squares Classification
  • EMLinearDiscriminantClassifier: Semi-Supervised Linear Discriminant Analysis using Expectation Maximization
  • EntropyRegularizedLogisticRegression: Entropy Regularized Logistic Regression
  • ICLeastSquaresClassifier: Implicitly Constrained Least Squares Classifier
  • EMNearestMeanClassifier: Semi-Supervised Nearest Mean Classifier using Expectation Maximization
  • BaseClassifier: Classifier used for enabling shared documenting of parameters
  • LaplacianSVM: Laplacian SVM classifier
  • LinearDiscriminantClassifier: Linear Discriminant Classifier
  • S4VM: Safe Semi-supervised Support Vector Machine (S4VM)
  • MCPLDA: Maximum Contrastive Pessimistic Likelihood Estimation for Linear Discriminant Analysis
  • c.CrossValidation: Merge result of cross-validation runs on single datasets into the same object
  • LogisticRegressionFast: Logistic Regression implementation that uses R's glm
  • S4VM-class: LinearSVM Class
  • LeastSquaresClassifier: Least Squares Classifier
  • MCNearestMeanClassifier: Moment Constrained Semi-supervised Nearest Mean Classifier
  • LinearSVM-class: LinearSVM Class
  • adjacency_knn: Calculate knn adjacency matrix
  • LinearTSVM: Linear CCCP Transductive SVM classifier
  • PreProcessing: Preprocess the input to a classification function
  • LinearSVM: Linear SVM Classifier
  • PreProcessingPredict: Preprocess the input for a new set of test objects for classifier
  • KernelLeastSquaresClassifier: Kernelized Least Squares Classifier
  • decisionvalues: Decision values returned by a classifier for a set of objects
  • df_to_matrices: Convert data.frame with missing labels to matrices
  • SSLDataFrameToMatrices: Convert data.frame to matrices for semi-supervised learners
  • SVM: SVM Classifier
  • MCLinearDiscriminantClassifier: Moment Constrained Semi-supervised Linear Discriminant Analysis
  • WellSVM_supervised: A degenerated version of WellSVM where the labels are complete, that is, supervised learning
  • LearningCurveSSL: Compute Semi-Supervised Learning Curve
  • generateABA: Generate data from 2 alternating classes
  • generate2ClassGaussian: Generate data from 2 Gaussian distributed classes
  • generateParallelPlanes: Generate Parallel planes
  • generateSlicedCookie: Generate Sliced Cookie dataset
  • minimaxlda: Implements weighted likelihood estimation for LDA
  • line_coefficients: Loss of a classifier or regression function
  • losspart: Loss of a classifier or regression function evaluated on partial labels
  • harmonic_function: Direct R Translation of Xiaojin Zhu's Matlab code to determine harmonic solution
  • generateTwoCircles: Generate data from 2 circles
  • generateSpirals: Generate Intersecting Spirals
  • USMLeastSquaresClassifier-class: USMLeastSquaresClassifier
  • LogisticLossClassifier-class: LogisticLossClassifier
  • stat_classifier: Plot RSSL classifier boundaries
  • LogisticLossClassifier: Logistic Loss Classifier
  • LogisticRegression: (Regularized) Logistic Regression implementation
  • stderror: Calculate the standard error of the mean from a vector of numbers
  • SelfLearning: Self-Learning approach to Semi-supervised Learning
  • TSVM: Transductive SVM classifier using the convex concave procedure
  • generateFourClusters: Generate Four Clusters dataset
  • generateCrescentMoon: Generate Crescent Moon dataset
  • print.LearningCurve: Print LearningCurve object
  • MajorityClassClassifier: Majority Class Classifier
  • LaplacianKernelLeastSquaresClassifier: Laplacian Regularized Least Squares Classifier
  • NearestMeanClassifier: Nearest Mean Classifier
  • USMLeastSquaresClassifier: Updated Second Moment Least Squares Classifier
  • projection_simplex: Project an n-dim vector y to the simplex Dn
  • split_dataset_ssl: Create Train, Test and Unlabeled Set
  • geom_classifier: Plot RSSL classifier boundary (deprecated)
  • svdsqrtm: Taking the square root of a matrix using the singular value decomposition
  • svdinvsqrtm: Taking the inverse of the square root of the matrix using the singular value decomposition
  • wlda_loglik: Measures the expected log-likelihood of the LDA model defined by m, p, and iW on the data set a, where weights w are potentially taken into account
  • add_missinglabels_mar: Throw out labels at random
  • diabetes: diabetes data for unit testing
  • cov_ml: Biased (maximum likelihood) estimate of the covariance matrix
  • clapply: Use mclapply conditional on not being in RStudio
  • split_random: Randomly split dataset in multiple parts
  • find_a_violated_label: Find a violated label
  • missing_labels: Access the true labels for the objects with missing labels when they are stored as an attribute in a data frame
  • responsibilities: Responsibilities assigned to the unlabeled objects
  • geom_linearclassifier: Plot linear RSSL classifier boundary
  • plot.CrossValidation: Plot CrossValidation object
  • svmlin: svmlin implementation by Sindhwani & Keerthi (2006)
  • wlda: Implements weighted likelihood estimation for LDA
  • rssl-formatting: Show RSSL classifier
  • svmlin_example: Test data from the svmlin implementation
  • QuadraticDiscriminantClassifier: Quadratic Discriminant Classifier
  • gaussian_kernel: Calculate the Gaussian kernel matrix
  • RSSL: R Semi-Supervised Learning Package
  • WellSVM: WellSVM for Semi-supervised Learning
  • measure_accuracy: Performance measures used in classifier evaluation
  • losslogsum: LogsumLoss of a classifier or regression function
  • loss: Loss of a classifier or regression function
  • WellSVM_SSL: Convex relaxation of S3VM by label generation
  • posterior: Class Posteriors of a classifier
  • plot.LearningCurve: Plot LearningCurve object
  • svmproblem: Train SVM
  • testdata: Example semi-supervised problem
  • localDescent: Local descent
  • logsumexp: Numerically more stable way to calculate log sum exp
  • predict,scaleMatrix-method: Predict for matrix scaling inspired by stdize from the PLS package
  • print.CrossValidation: Print CrossValidation object
  • sample_k_per_level: Sample k indices per level from a factor
  • svdinv: Inverse of a matrix using the singular value decomposition
  • solve_svm: SVM solve.QP implementation
  • summary.CrossValidation: Summary of Crossvalidation results
  • scaleMatrix: Matrix centering and scaling
  • rssl-predict: Predict using RSSL classifier
  • threshold: Refine the prediction to satisfy the balance constraint
  • wlda_error: Measures the expected error of the LDA model defined by m, p, and iW on the data set a, where weights w are potentially taken into account
  • true_labels: Access the true labels when they are stored as an attribute in a data frame
  • wellsvm_direct: wellsvm implements the wellsvm algorithm as shown in [1].
  • wdbc: wdbc data for unit testing