predict.orclass: Subspace clustering based local classification using ORCLUS.

Description

Assigns clusters and distances and classes for new data according to the intrinsic subspace clusters of an orclass classification model.

Usage

# S3 method for orclass
predict(object, newdata, type = "nearest", ...)

Arguments

object

Model resulting from a call of orclass.

newdata

A matrix or data frame to be clustered by the given model.

type

Default "nearest" computes relative class frequencies of nearest cluster as class posterior probabilities.

…

Currently not used.

Value

class

Vector of predicted class levels.

posterior

Matrix where coloumns contain class posterior probabilities.

distances

A matrix where coloumns are the distances to all cluster centers in the corresponding subspaces for the new data.

cluster

The resulting cluster labels for the new data.

Details

For prediction the class distribution of the "nearest"" cluster is used. If type = "fuzzywts" cluster memberships (see e.g. Bezdek, 1981) are computed based on the cluster distances of cluster assignment by predict.orclus. For orclass prediction the class distributions of the clusters are weigthed using the cluster memberships of an observation.

References

Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.

Bezdek, J. (1981): Pattern recognition with fuzzy objective function algorithms, Kluwer Academic, Norwell, MA.

Examples

Run this code

# NOT RUN {
# definition of a function for parameterized data simulation
sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, 
                       sd.cl = 0.05, sd.rest = 1, locshift = 1){
  ### input parameters for data generation
  # k           number of clusters
  # nk          observations per cluster
  # d           original dimension of the data
  # l           subspace dimension where the clusters are concentrated
  # sd.cl       (within cluster subspace) standard deviations for data generation 
  # sd.rest     standard deviations in the remaining space 
  # locshift    parameter of a uniform distribution to sample different cluster means  

  x <- NULL
  for(i in 1:k){
  # cluster centers
  apts <- locshift*matrix(runif(l*k), ncol = l)  
  # sample points in original space
  xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol=l) + matrix(rep(apts[i,], nk), 
                              ncol = l, byrow = TRUE),
                       matrix(rnorm(nk * (d-l), sd = sd.rest), ncol = (d-l)))  
  # subspace generation
  sym.mat <- matrix(nrow=d, ncol=d)
  for(m in 1:d){
    for(n in 1:m){
      sym.mat[m,n] <- sym.mat[n,m] <- runif(1)  
      }
    } 
  subspace <- eigen(sym.mat)$vectors    
  # transformation
  xi.transformed <- xi.original %*% subspace
  x <- rbind(x, xi.transformed)
  }  
  clids <- rep(1:k, each = nk)
  result <- list(x = x, cluster = clids)
  return(result)
  }

# simulate data of 2 classes where class 1 consists of 2 subclasses
simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, 
                      sd.cl = 0.05, sd.rest = 1, locshift = 1)

x <- simdata$x
y <- c(rep(1,400), rep(2,200))

res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75)
prediction <- predict(res, x)

# compare results
table(prediction$class, y)

# }

Run the code above in your browser using DataLab