predict: Model Predictions

Description

Provides the method predict() for itemMatrix objects (e.g., transactions). It predicts the cluster membership of new data by nearest-neighbor assignment to clusters represented either by medoids or by labeled examples.
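The nearest-neighbor rule can be illustrated in base R without arules. The sketch below is not the arules implementation. It assumes transactions are encoded as rows of a 0/1 matrix and uses the Jaccard dissimilarity (the default method of dissimilarity()). The helper names jaccard_cross() and nn_predict() are hypothetical.

```r
## Hypothetical base-R sketch of nearest-neighbor label prediction.
## Rows of the 0/1 matrices are transactions, columns are items.
jaccard_cross <- function(x, y) {
  inter <- x %*% t(y)                                # |A intersect B| for every pair
  uni <- outer(rowSums(x), rowSums(y), "+") - inter  # |A union B| for every pair
  1 - inter / uni                                    # assumes no empty rows (0/0 gives NaN)
}

nn_predict <- function(examples, newdata, labels = NULL) {
  ## with medoids (labels = NULL), the label is the row index of the medoid
  if (is.null(labels)) labels <- seq_len(nrow(examples))
  d <- jaccard_cross(newdata, examples)  # rows: newdata, columns: examples
  labels[apply(d, 1, which.min)]         # label of the nearest example (first one on ties)
}

examples <- rbind(c(1, 1, 0, 0),
                  c(1, 1, 1, 0),
                  c(0, 0, 1, 1))
newdata <- rbind(c(1, 0, 0, 0),
                 c(0, 0, 0, 1))

nn_predict(examples, newdata, labels = c(1L, 1L, 2L))  # labeled examples
nn_predict(examples, newdata)                          # examples treated as medoids
```

The first call returns c(1L, 2L), because the first new transaction is closest to example 1 and the second to example 3; the second call returns the medoid indices c(1L, 3L).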

Usage

predict(object, ...)
# S4 method for itemMatrix
predict(object, newdata, labels = NULL, blocksize = 200, ...)

Value

An integer vector of length length(newdata) containing the predicted label for each element (e.g., transaction) of newdata.

Arguments

object: either the clustered examples as an itemMatrix, with their cluster labels given in labels, or the cluster medoids as an itemMatrix (in this case, use labels = NULL).
...: further arguments passed on to dissimilarity() (e.g., method).
newdata: an itemMatrix containing the objects to predict labels for.
labels: an integer vector containing the cluster labels for the examples in object. The labels must be contiguous integers starting at 1.
blocksize: a numeric scalar giving approximately how much memory (in MB) predict can use when object and/or newdata are large. The default of 200 is only a crude approximation that assumes a 32-bit machine (64-bit architectures need about twice as much memory for the same blocksize) and the default Jaccard method for the dissimilarity calculation. In general, reducing blocksize decreases memory usage but increases run time.
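The memory/run-time trade-off behind blocksize comes from processing newdata in blocks, so that only one block-by-examples dissimilarity matrix exists at a time. The base-R sketch below illustrates that idea under simplifying assumptions: the block size is given in rows (the hypothetical parameter rows_per_block), not in MB as in the real method, and the helpers jaccard_cross() and nn_predict_blocked() are hypothetical, not part of arules.

```r
## Hypothetical sketch of blockwise nearest-neighbor prediction.
jaccard_cross <- function(x, y) {
  inter <- x %*% t(y)
  uni <- outer(rowSums(x), rowSums(y), "+") - inter
  1 - inter / uni  # assumes no empty rows (0/0 gives NaN)
}

nn_predict_blocked <- function(examples, newdata, labels, rows_per_block = 2L) {
  pred <- integer(nrow(newdata))
  if (nrow(newdata) == 0L) return(pred)
  for (s in seq(1L, nrow(newdata), by = rows_per_block)) {
    rows <- s:min(s + rows_per_block - 1L, nrow(newdata))
    ## only a length(rows) x nrow(examples) matrix of doubles (8 bytes each)
    ## is held in memory: smaller blocks use less memory but need more iterations
    d <- jaccard_cross(newdata[rows, , drop = FALSE], examples)
    pred[rows] <- as.integer(labels[apply(d, 1, which.min)])
  }
  pred
}

examples <- rbind(c(1, 1, 0, 0),
                  c(0, 0, 1, 1))
newdata <- rbind(c(1, 0, 0, 0),
                 c(0, 0, 1, 0),
                 c(1, 1, 1, 0))

nn_predict_blocked(examples, newdata, labels = c(1L, 2L), rows_per_block = 2L)
```

The result, c(1L, 2L, 1L), does not depend on the block size; only peak memory use and the number of loop iterations change.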

Author

Michael Hahsler

Examples


library("arules")
data("Adult")

## draw a small and a large random sample (seed set for reproducibility)
set.seed(1234)
small <- sample(Adult, 500)
large <- sample(Adult, 5000)

## cluster the small sample and extract the cluster label vector
d_jaccard <- dissimilarity(small)
hc <- hclust(d_jaccard)
l <- cutree(hc, k = 4)

## predict labels for the larger sample
pred <- predict(small, large, labels = l)

## plot the item frequency profile of cluster 1 (items with support > 0.1)
itemFrequencyPlot(large[pred == 1, itemFrequency(large) > 0.1])
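The example above uses labeled examples. The sketch below shows the other mode described under object, where the clusters are represented by medoids and labels = NULL. It assumes the cluster package (a recommended package shipped with R) is available; pam() is used here only as one way to obtain medoids, and the seed and k = 4 are arbitrary choices.

```r
## Sketch: predict with medoids instead of labeled examples (labels = NULL).
library("arules")
library("cluster")

data("Adult")
set.seed(1234)
small <- sample(Adult, 500)
large <- sample(Adult, 5000)

## find 4 medoids on the small sample from its Jaccard dissimilarities
pam_fit <- pam(dissimilarity(small), k = 4)
medoids <- small[pam_fit$id.med]

## each transaction in large is assigned to its nearest medoid;
## the predicted label should be the medoid's position (1 to 4) in medoids
pred_medoid <- predict(medoids, large)
table(pred_medoid)
```

Predicting against 4 medoids requires far fewer dissimilarity computations than predicting against all 500 labeled examples.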