rgr (version 1.1.15)

gx.2dproj: Function to Compute and Display 2-d Projections for Data Matrices

Description

Function computes and displays 2-d projections of data matrices using Sammon Non-linear Mapping (default), Classic (metric) Multidimensional Scaling, or Kruskal's non-metric Multidimensional Scaling (see Venables and Ripley (2001) and Cox and Cox (2001)). The original S-Plus implementation also computed the Minimum Spanning Tree plane projection (Friedman and Rafsky, 1981), as it was available in the Venables and Ripley MASS library for S-Plus. However, the R implementation of the MASS library does not include Minimum Spanning Trees. In the R implementation, Projection Pursuit has been added using the fastICA procedure of Hyvarinen and Oja (2000). Provision is made to optionally trim individuals (rows) from the input data matrix.
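
As a rough illustration of the projection procedures named above, the following sketch calls the functions listed under See Also (sammon, cmdscale, isoMDS, fastICA) directly. It is not the internal code of gx.2dproj: the use of the built-in USArrests data, the standardization with scale(), Euclidean distances from dist(), and the seed value are all assumptions made only to obtain a runnable example.

```r
library(MASS)  # provides sammon() and isoMDS()

## Illustration data only: a built-in data set, centred and scaled (assumption)
x <- scale(as.matrix(USArrests))
d <- dist(x)  # Euclidean inter-individual distances (assumption)

## Sammon non-linear mapping, comparable to proc = "sam"
proj.sam <- sammon(d, k = 2, trace = FALSE)

## Classic (metric) multidimensional scaling, comparable to proc = "mds"
proj.mds <- cmdscale(d, k = 2)

## Kruskal's non-metric multidimensional scaling, comparable to proc = "iso"
proj.iso <- isoMDS(d, k = 2, trace = FALSE)

## Projection pursuit via fastICA, comparable to proc = "ica";
## only run if the fastICA package is installed
if (requireNamespace("fastICA", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed so repeated runs give the same projection
  proj.ica <- fastICA::fastICA(x, n.comp = 2)$S
}

## Each result holds one (x, y) pair per individual
head(proj.sam$points)
```

sammon and isoMDS return a list with the coordinates in points and a stress value in stress, whereas cmdscale returns the coordinate matrix directly.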

Usage

gx.2dproj(xx, proc = "sam", ifilr = FALSE, log = FALSE, rsnd = FALSE, snd = FALSE,
	range = FALSE, main = "", setseed = FALSE, row.omits = NULL, ...)

Arguments

xx

the n by p matrix for which the 2-d projection is required.

proc

the 2-d projection procedure required, the default is proc = "sam" for Sammon Non-Linear Mapping. For Classic (metric) Multidimensional Scaling use proc = "mds", for Kruskal's non-metric Multidimensional Scaling use "iso", and for Projection Pursuit use "ica".

ifilr

optional isometric log-ratio (ilr) transformation, the default is no transformation. Recommended for closed, compositional, geochemical data. When ifilr = TRUE all other transformations are ignored.

log

optional (natural) log transformation of the data, the default is no log transformation. For a log transformation set log = TRUE.

rsnd

optional robust normalization of the data with matrix column medians and MADs, the default is no transformation. For a robust normalization set rsnd = TRUE.

snd

optional normalization of the data with matrix column means and standard deviations, the default is no transformation. For a normalization set snd = TRUE. If rsnd = TRUE, then snd will be set to FALSE.

range

optional range transformation for the matrix columns, the data values being scaled to between zero and one for, respectively, the minimum and maximum column values. If the data are range transformed, other normalization transformation requests will be ignored.

main

an alternative plot title, see Details below.

row.omits

permits rows, individuals, to be trimmed from the input matrix, the default row.omits = NULL is for no trimming. To trim individuals enter their row numbers as a concatenated vector, e.g. row.omits = c(13,15,16). The list may be extended by adding further row numbers so as to display the 2-d structure of the remaining core data and reveal whether further multivariate outliers are present.

setseed

sets the random number seed for fastICA so that all runs result in the same projection, and that projection is generally similar to the Sammon projection on the ilr transformed Howarth - Sinding-Larsen data set.

...

further arguments to be passed to methods concerning the generated plots. For example, if smaller plotting characters are required, specify cex = 0.8; or if some colour other than black is required for the plotting characters, specify col = 2 to obtain red (see display.lty for the default colour palette). If it is required to make the plot title smaller, add cex.main = 0.9 to reduce the font size by 10%.

Value

The following are returned as an object to be saved for further use:

main

the plot title.

input

a text string containing the name of the n by p matrix containing the data, and a list of the row numbers of any individuals trimmed, if none are trimmed the entry is NULL.

usage

The projection option selected, and the values, TRUE or FALSE, for the ilr, log, robust normalization, normalization, and range transformation options.

xlab

the 2-d projection x-axis label.

ylab

the 2-d projection y-axis label.

matnames

the individual, sample, row identifiers and the names of the input variables. If there are no individual, sample, row identifiers then row numbers are used. If an ilr transform has been used the variable names will be the (p-1) synthetic ilr variable names. If a trim has been executed only the row identifiers for the remaining data are stored.

row.numbers

the row numbers of the individuals, samples, remaining after a trim. If a trim has been executed only the row numbers for the remaining data are stored.

x

the n x-axis values for the 2-d projection.

y

the n y-axis values for the 2-d projection.

stress

the estimated stress of fitting the 2-d projection to the p-space data.

Details

If main is undefined a default plot title is generated by appending the input matrix name to the text string "2-d Projection for: ". If no plot title is required set main = " ", or if a user defined plot title is required it should be defined in main, e.g., main = "Plot Title Text".

It is strongly recommended that if the input data matrix contains closed, compositional, geochemical data, an ilr transform be applied to the data, ifilr = TRUE. This has the effect of reducing the dimension of the data matrix from p to (p-1). Otherwise, it is desirable to normalize (centre and scale) the data, or undertake a range transformation, to ensure the variables have equal ‘weight’ in the projections. If no transformation is requested a warning message is displayed.
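
The transformations described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the code used inside gx.2dproj: the small composition is made-up illustration data, the ilr basis is a generic orthonormal (Helmert-type) basis that may differ from the one rgr uses, and mad() is called with its default consistency constant, which may or may not match the rsnd option.

```r
## Made-up 4 x 3 composition, for illustration only; rows are closed to sum to 1
raw <- matrix(c(10, 20, 70,
                30, 30, 40,
                 5, 60, 35,
                25, 25, 50), ncol = 3, byrow = TRUE)
comp <- raw / rowSums(raw)
p <- ncol(comp)

## ilr via a centred log-ratio followed by an orthonormal basis (assumed basis)
clr <- log(comp) - rowMeans(log(comp))
H <- contr.helmert(p)                      # p x (p-1), columns sum to zero
V <- sweep(H, 2, sqrt(colSums(H^2)), "/")  # scale columns to unit length
z <- clr %*% V                             # n x (p-1) ilr coordinates

## Non-compositional illustration data: a built-in data set
x <- as.matrix(USArrests)

## snd-style: centre on column means, scale by column standard deviations
x.snd <- scale(x, center = TRUE, scale = TRUE)

## rsnd-style: centre on column medians, scale by column MADs
med <- apply(x, 2, median)
madv <- apply(x, 2, mad)
x.rsnd <- sweep(sweep(x, 2, med, "-"), 2, madv, "/")

## range-style: rescale each column so its minimum is 0 and its maximum is 1
mn <- apply(x, 2, min)
rng <- apply(x, 2, max) - mn
x.range <- sweep(sweep(x, 2, mn, "-"), 2, rng, "/")
```

Here z has p-1 columns, which is the reduction in dimension noted above, and each of the three scalings puts the columns of x on a comparable footing before distances are computed.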

The x- and y-axis labels are set appropriately to indicate the type of 2-d projection in the display.

A measure of the ‘stress’ in generating the 2-d projection is estimated and displayed. Low stress indicates that the projection faithfully represents the relative ‘positions’ of the data in the original p-space.
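
One common way to quantify such a stress is to compare the inter-individual distances in the original p-space with those in the 2-d projection. The sketch below computes a Kruskal-type stress for a classic multidimensional scaling projection; the formula, the USArrests illustration data and the Euclidean distances are assumptions, and this is not necessarily how gx.2dproj estimates the stress it reports.

```r
## Illustration data only: a built-in data set, centred and scaled (assumption)
x <- scale(as.matrix(USArrests))

## Distances in the original p-space
d.p <- as.vector(dist(x))

## Classic (metric) MDS projection to 2-d and the distances within it
xy <- cmdscale(dist(x), k = 2)
d.2 <- as.vector(dist(xy))

## Kruskal-type stress: root of the squared distance discrepancies
## relative to the squared original distances (assumed definition)
stress <- sqrt(sum((d.p - d.2)^2) / sum(d.p^2))
stress
```

A value near zero means the 2-d distances closely reproduce the p-space distances, while larger values indicate that the projection distorts the relative positions of the individuals.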

References

Cox, T.F. and Cox, M.A.A., 2001. Multidimensional Scaling. Chapman and Hall, 308 p.

Friedman, J.H. and Rafsky, L.C., 1981. Graphics for the multivariate two-sample problem. Journal of the American Statistical Association, 76(374):277-291.

Hyvarinen, A. and Oja, E., 2000. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5):411-430.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

Venables, W.N. and Ripley, B.D., 2001. Modern Applied Statistics with S-Plus, 3rd Edition. Springer, 501 p.

See Also

ltdl.fix.df, remove.na, gx.2dproj.plot, sammon, cmdscale, isoMDS, fastICA, set.seed

Examples

# NOT RUN {
## Make test data available
data(sind.mat2open)

## Display default, Sammon non-linear map, 2-d projection
sind.2dproj <- gx.2dproj(sind.mat2open, ifilr = TRUE)

## Display saved object identifying input matrix row numbers (cex = 0.7),
## and with an alternate main title (cex.main = 0.8) 
gx.2dproj.plot(sind.2dproj, rowids = TRUE, cex = 0.7, cex.main = 0.8,
	main = "Howarth & Sinding-Larsen\nStream Sediment ilr Transformed Data")

## Display Kruskal's non-metric multidimensional scaling 2-d projection
sind.2dproj <- gx.2dproj(sind.mat2open, proc = "iso", ifilr = TRUE)

## Display saved object identifying input matrix row numbers (cex = 0.7),
## and with an alternate main title (cex.main = 0.8) 
gx.2dproj.plot(sind.2dproj, rowids = FALSE, cex = 0.7, cex.main = 0.8, 
	main = "Howarth & Sinding-Larsen\nStream Sediment ilr Transformed Data")

## Display default, Sammon non-linear map, 2-d projection, removing the three
## most extreme individuals
sind.2dproj.trim3 <- gx.2dproj(sind.mat2open, ifilr = TRUE, row.omits = c(13,15,16))

## Clean-up
rm(sind.2dproj)
rm(sind.2dproj.trim3)
# }