Learn R Programming

Rdimtools (version 1.1.2)

do.sda: Semi-Supervised Discriminant Analysis

Description

Semi-Supervised Discriminant Analysis (SDA) is a linear dimension reduction method when label is partially missing, i.e., semi-supervised. The labeled data points are used to maximize the separability between classes while the unlabeled ones to estimate the intrinsic structure of the data. Regularization in case of rank-deficient case is also supported via an \(\ell_2\) scheme via beta.

Usage

do.sda(X, label, ndim = 2, type = c("proportion", 0.1), alpha = 1, beta = 1)

Value

a named list containing

Y

an \((n\times ndim)\) matrix whose rows are embedded observations.

trfinfo

a list containing information for out-of-sample prediction.

projection

a \((p\times ndim)\) whose columns are basis for projection.

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

label

a length-\(n\) vector of data class labels.

ndim

an integer-valued target dimension.

type

a vector of neighborhood graph construction. Following types are supported; c("knn",k), c("enn",radius), and c("proportion",ratio). Default is c("proportion",0.1), connecting about 1/10 of nearest data points among all data points. See also aux.graphnbd for more details.

alpha

balancing parameter between model complexity and empirical loss.

beta

Tikhonov regularization parameter.

Author

Kisung You

References

cai_semisupervised_2007Rdimtools

Examples

Run this code
## use iris data
data(iris)
X     = as.matrix(iris[,1:4])
label = as.integer(iris$Species)

## copy a label and let 20% of elements be missing
nlabel = length(label)
nmissing = round(nlabel*0.20)
label_missing = label
label_missing[sample(1:nlabel, nmissing)]=NA

## compare true case with missing-label case
out1 = do.sda(X, label)
out2 = do.sda(X, label_missing)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot(out1$Y, col=label, main="true projection")
plot(out2$Y, col=label, main="20% missing labels")
par(opar)

Run the code above in your browser using DataLab