maxDissim(a, b, n = 2, obj = minDiss, randomFrac = 1, verbose = FALSE, ...)
minDiss(u)
sumDiss(u)
A vector of integers corresponding to the rows of b that comprise the sub-sample.

The argument obj measures the overall dissimilarity between the initial set and a candidate point. For example, maximizing the minimum or the sum of the m dissimilarities (one per sample in the initial set) are two common approaches; minDiss and sumDiss implement these two objectives.

This algorithm tends to select points on the edge of the data mainstream and will reliably select outliers. To select more samples towards the interior of the data set, set randomFrac to be small (see the examples below).
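The selection loop described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the function name greedy_max_dissim and its random_frac argument are hypothetical, Euclidean distance is assumed, and the objective is any function that reduces a vector of dissimilarities to one score (min and sum stand in for minDiss and sumDiss).

```r
# Hypothetical helper (not part of any package): greedy max-dissimilarity
# selection with Euclidean distance, using base R only.
greedy_max_dissim <- function(a, b, n = 2, obj = min, random_frac = 1) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  picked <- integer(0)
  remaining <- seq_len(nrow(b))

  for (k in seq_len(n)) {
    # Optionally score only a random fraction of the remaining candidates
    n_cand <- max(1, floor(random_frac * length(remaining)))
    cand <- if (n_cand < length(remaining)) {
      remaining[sample.int(length(remaining), n_cand)]
    } else {
      remaining
    }

    # Score each candidate: distances to every current sample in a,
    # reduced to one number by the objective function
    scores <- vapply(cand, function(j) {
      d <- sqrt(colSums((t(a) - b[j, ])^2))
      obj(d)
    }, numeric(1))

    # Add the highest-scoring candidate to the current set
    best <- cand[which.max(scores)]
    picked <- c(picked, best)
    a <- rbind(a, b[best, , drop = FALSE])
    remaining <- setdiff(remaining, best)
  }
  picked
}

# Toy data: one starting point at the origin and four candidates on a line
a <- matrix(c(0, 0), nrow = 1)
b <- rbind(c(1, 0), c(10, 0), c(-10, 0), c(0.5, 0))
picked_min <- greedy_max_dissim(a, b, n = 2, obj = min)
picked_sum <- greedy_max_dissim(a, b, n = 2, obj = sum)
```

With random_frac = 1 every remaining candidate is scored at each step, which is what pulls the selection towards extreme points. A small random_frac scores only a random subset at each step, so extreme points are less likely to be considered and interior points are picked more often.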
See also: dist
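library(caret)  # provides maxDissim, minDiss and sumDiss

# Function header inferred from the calls below
example <- function(pct = 1, obj = minDiss, ...) {
  # Assumption: the definition of tmp is missing from this text;
  # simulated two-column data is used here
  tmp <- matrix(rnorm(200 * 2), nrow = 200)

  # start with 15 data points
  start <- sample(1:dim(tmp)[1], 15)
  base <- tmp[start, ]
  pool <- tmp[-start, ]

  # select 9 for addition
  newSamp <- maxDissim(base, pool, n = 9, randomFrac = pct, obj = obj, ...)
  allSamp <- c(start, newSamp)

  plot(tmp[-newSamp, ],
       xlim = extendrange(tmp[, 1]), ylim = extendrange(tmp[, 2]),
       col = "darkgrey",
       xlab = "variable 1", ylab = "variable 2")
  points(base, pch = 16, cex = .7)

  # label each added point with the order in which it was selected
  for (i in seq(along = newSamp))
    points(pool[newSamp[i], 1], pool[newSamp[i], 2],
           pch = paste(i), col = "darkred")
}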
par(mfrow=c(2,2))
set.seed(414)
example(1, minDiss)
title("No Random Sampling, Min Score")

set.seed(414)
example(.1, minDiss)
title("10 Pct Random Sampling, Min Score")

set.seed(414)
example(1, sumDiss)
title("No Random Sampling, Sum Score")

set.seed(414)
example(.1, sumDiss)
title("10 Pct Random Sampling, Sum Score")