spe (version 1.1.2)

spe: Implements the stochastic proximity embedding algorithm

Description

Embeds an N-dimensional dataset in M dimensions, such that distances (or similarities) in the original N dimensions are preserved as closely as possible in the final M dimensions.
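To make the idea concrete, the following is a minimal base-R sketch of the pairwise update rule described in the SPE papers listed under References. It is an illustration only, not the implementation used by spe(): the function name spe_sketch, the eps guard, the uniform random initialization, and the linear learning-rate schedule are assumptions made for this example, and the small nstep and ncycle values are chosen so the toy run finishes quickly.

```r
# Illustrative sketch of stochastic proximity embedding in base R.
# spe_sketch is a hypothetical helper, not part of the spe package.
spe_sketch <- function(coord, edim, rcut = Inf, lambda0 = 2.0, lambda1 = 0.01,
                       nstep = 1000, ncycle = 50, eps = 1e-10) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  # Start from random coordinates in the embedding space
  x <- matrix(runif(n * edim), nrow = n, ncol = edim)
  lambda <- lambda0
  for (cycle in seq_len(ncycle)) {
    for (step in seq_len(nstep)) {
      ij <- sample.int(n, 2)  # pick a random pair of distinct points
      i <- ij[1]
      j <- ij[2]
      r <- sqrt(sum((coord[i, ] - coord[j, ])^2))  # distance in input space
      d <- sqrt(sum((x[i, ] - x[j, ])^2))          # distance in embedding
      # Adjust neighbors, or non-neighbors that sit too close together
      if (r <= rcut || d < r) {
        delta <- lambda * 0.5 * (r - d) / (d + eps) * (x[i, ] - x[j, ])
        x[i, ] <- x[i, ] + delta
        x[j, ] <- x[j, ] - delta
      }
    }
    # Decrease the learning parameter linearly from lambda0 toward lambda1
    lambda <- lambda - (lambda0 - lambda1) / ncycle
  }
  x
}

set.seed(1)
pts <- matrix(rnorm(60), ncol = 3)  # 20 toy points in 3 dimensions
emb <- spe_sketch(pts, edim = 2)    # 20 x 2 matrix of embedded coordinates
```

Each step touches only one pair of points, which is why the method scales to datasets where computing the full distance matrix would be impractical.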

Usage

spe(coord, rcutpercent = 1, maxdist = 0, nobs = 0, ndim = 0, edim, lambda0 = 2.0, lambda1 = 0.01, nstep = 1e6, ncycle = 100, evalstress = FALSE, sampledist = TRUE, samplesize = 1e6)

Arguments

coord
A matrix with one row per observation and one column per input dimension. A data.frame may also be supplied; it will be converted to a matrix, so all names will be lost.
rcutpercent
The percentage of the maximum distance (as determined by probability sampling) that will be used as the neighborhood radius. The value appears to be expressed as a fraction, since the default is 1 and setting rcutpercent to a value greater than 1 effectively sets the radius to infinity.
maxdist
If you have already calculated a maximum distance, you can supply it here and probability sampling will not be carried out to obtain one. The default is to carry out sampling. Setting maxdist to a nonzero value disables sampling, even if sampledist=TRUE.
nobs
The number of observations. If it is not specified nobs will be taken as nrow(coord)
ndim
The number of input dimensions. If not specified it will be taken as ncol(coord)
edim
The number of dimensions to embed in
lambda0
The starting value of the learning parameter
lambda1
The ending value of the learning parameter
nstep
The number of refinement steps
ncycle
The number of cycles to carry out refinement for
evalstress
If TRUE, the function will evaluate the Sammon stress on the final embedding.
sampledist
If TRUE, an approximation to the maximum distance in the input dimensions will be obtained via probability sampling.
samplesize
The number of iterations for probability sampling. A dataset of 6070 observations has 6070 x 6069 / 2 = 18,419,415 pairwise distances, so sampling is far cheaper than an exhaustive search. The default value gives a close approximation and runs fast. For a better approximation, 1e7 is a good value, though results will vary with the dataset.
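The sampling arguments above can be illustrated with a short base-R sketch. It estimates the maximum pairwise distance from randomly drawn pairs and compares it with the exact value on a toy dataset. The helper sample_max_dist is hypothetical and is not the package's sample.max.distance; the final line also assumes that rcutpercent acts as a fraction of the maximum distance.

```r
# Illustrative sketch: estimate the maximum pairwise distance by sampling
# random pairs (hypothetical helper, not the package's sample.max.distance).
sample_max_dist <- function(coord, samplesize = 1e4) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  i <- sample.int(n, samplesize, replace = TRUE)
  j <- sample.int(n, samplesize, replace = TRUE)
  # Row-wise Euclidean distances for the sampled pairs
  d <- sqrt(rowSums((coord[i, , drop = FALSE] - coord[j, , drop = FALSE])^2))
  max(d)
}

set.seed(42)
pts <- matrix(rnorm(300), ncol = 3)   # 100 toy points in 3 dimensions
approx_max <- sample_max_dist(pts, 5000)
exact_max <- max(dist(pts))           # exhaustive check, feasible only for small data

# Number of distinct pairs for the 6070-observation example
npairs <- choose(6070, 2)

# Neighborhood radius, ASSUMING rcutpercent is a fraction of the maximum distance
rcut <- 0.5 * approx_max
```

A sampled maximum can never exceed the true maximum, so a larger samplesize can only move the estimate upward toward the exact value.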

Value

If evalstress is TRUE, the return value is a list with two components named x and stress: x is the matrix of the final embedding and stress is the final stress.
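For readers unfamiliar with the stress value, here is a small base-R sketch of the standard Sammon stress computed over all pairs. The helper sammon_stress is hypothetical; whether eval.stress uses exactly this normalization, or samples pairs instead of enumerating them, is not stated here and should be treated as an assumption.

```r
# Illustrative sketch of the standard Sammon stress over all pairs
# (hypothetical helper, not the package's eval.stress).
sammon_stress <- function(x, coord) {
  r <- as.vector(dist(coord))  # pairwise distances in the input space
  d <- as.vector(dist(x))      # pairwise distances in the embedding
  keep <- r > 0                # skip coincident points to avoid dividing by zero
  sum((r[keep] - d[keep])^2 / r[keep]) / sum(r[keep])
}

set.seed(7)
pts <- matrix(rnorm(40), ncol = 2)          # 20 toy points in 2 dimensions
perfect <- sammon_stress(pts, pts)          # embedding identical to the input
squashed <- sammon_stress(pts * 0.5, pts)   # embedding shrunk by half
```

A stress of zero means every pairwise distance is reproduced exactly, and larger values indicate more distortion.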

Details

Efficient determination of rcut (using the connected component method) is yet to be implemented. As a result, you will have to determine a value of rcutpercent by trial and error. The pivot SPE method (J. Mol. Graph. Model., 2003, 22, 133-140) is not yet implemented.

References

A Self Organizing Principle for Learning Nonlinear Manifolds, Proc. Nat. Acad. Sci., 2002, 99, 15869-15872
Stochastic Proximity Embedding, J. Comput. Chem., 2003, 24, 1215-1221
A Modified Rule for Stochastic Proximity Embedding, J. Mol. Graph. Model., 2003, 22, 133-140
A Geodesic Framework for Analyzing Molecular Similarities, J. Chem. Inf. Comput. Sci., 2003, 43, 475-484

See Also

eval.stress, sample.max.distance

Examples

## load the package and the phone dataset
library(spe)
data(phone)

## run SPE, embed$stress should be 0 or very close to it
## You can plot the embedding using the scatterplot3d package
## (This will take a few minutes to run)
embed <- spe(phone, edim=3, evalstress=TRUE)

## evaluate the Sammon stress
stress <- eval.stress(embed$x, phone)

## embed the Swiss Roll dataset in 2D
data(swissroll)
embed <- spe(swissroll, edim=2, evalstress=TRUE)