classiFunc (version 0.1.1)

computeDistMat: Compute a distance matrix for functional observations

Description

This mainly internal function offers a unified framework to access the dist function from the proxy package and additional (semi-)metrics.

Usage

computeDistMat(x, y = NULL, method = "Euclidean", dmin = 0, dmax = 1,
  dmin1 = 0, dmax1 = 1, dmin2 = 0, dmax2 = 1, t1 = 0, t2 = 1,
  .poi = seq(0, 1, length.out = ncol(x)),
  custom.metric = function(x, y, lp = 2, ...) {
    return(sum(abs(x - y)^lp)^(1/lp))
  },
  a = NULL, b = NULL, c = NULL, lambda = 0, ...)
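
The default custom.metric in the signature above is the Lp distance between two numeric vectors. The following is a minimal base R sketch of that formula, not a call into classiFunc. The name lp_metric and the example curves are illustrative assumptions. With lp = 1 and lp = 2 it gives what the method descriptions call the Manhattan and Euclidean distances; the supremum case is computed directly as the largest pointwise gap, because the formula itself cannot be evaluated at lp = Inf.

```r
# Same body as the default custom.metric shown in Usage.
# The name lp_metric is hypothetical, chosen for this sketch only.
lp_metric <- function(x, y, lp = 2, ...) {
  sum(abs(x - y)^lp)^(1 / lp)
}

# Two toy curves observed on the same grid of four points (made-up values).
x1 <- c(0, 1, 2, 3)
x2 <- c(0, 0, 0, 0)

d_euclidean <- lp_metric(x1, x2)          # lp = 2: sqrt(0 + 1 + 4 + 9)
d_manhattan <- lp_metric(x1, x2, lp = 1)  # lp = 1: 0 + 1 + 2 + 3
d_supremum  <- max(abs(x1 - x2))          # limiting case: largest pointwise gap
```

A function with this shape, taking two vectors plus optional extra arguments, is what the custom.metric argument expects according to its description.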

Arguments

x

[matrix] matrix containing the functional observations as rows.

y

[matrix] see x. The default NULL uses y = x.

method

[character(1)] character string describing the distance function to be used. For a full list execute metricChoices().

Euclidean

equals Lp with p = 2. This is the default.

Lp, Minkowski

the distance for an Lp-space, takes p as an additional argument in ....

Manhattan

equals Lp with p = 1.

supremum, max, maximum

equals Lp with p = Inf. The supremal pointwise difference between the curves.

and ...

all other available measures for dist.

shortEuclidean

Euclidean distance on a limited part of the domain. Additional arguments dmin and dmax can be specified in ..., giving the position of the first and the last point to use of an evenly spaced sequence from 0 to 1 of length length(grid). The default values are dmin = 0 and dmax = 1, which results in the Euclidean distance on the entire domain.

mean

the absolute difference between the overall mean values of the observations.

relAreas

the difference of the relation of two areas on parts of the domain given by dmin1 to dmax1 and dmin2 to dmax2. They are defined analogously to dmin and dmax and take the same default values.

jump

the similarity of jump heights at points t1 and t2, i.e. x[t1 * length(x)] - x[t2 * length(x)] for every functional observation x. The points t1 and t2 are the positions in an evenly spaced sequence from 0 to 1 of length length(grid) for which to compare the jump height. The default values are t1 = 0 and t2 = 1.

globMax

the difference of the curves' global maxima.

globMin

the difference of the curves' global minima.

points

the mean absolute differences at certain observation points .poi, also called "points of impact". These are specified as a vector .poi of arbitrary length with values between 0 and 1, encoding the index of the points of observation. The default value is .poi = seq(0, 1, length.out = length(grid)), which results in the Manhattan distance.

custom.metric

your own semimetric will be used. Specify your own distance function in the argument custom.metric.

amplitudeDistance, phaseDistance

the amplitude distance or phase distance as described in Srivastava, A. and E. P. Klassen (2016). Functional and Shape Data Analysis. Springer.

FisherRao, elasticMetric

the elastic distance of the square root velocity of the curves as described in Srivastava and Klassen (2016). This equates to the Fisher Rao metric.

elasticDistance

weighted mean of the amplitude and the phase distance using the implementation in elastic.distance. Additional arguments are the numeric penalization parameters a, b and c, where a^2 weights the amplitude distance and b^2 weights the phase distance. The default values are a = 1/2 and b = 1. Alternatively, c denotes the ratio of 2*a and b. lambda is the additional penalization parameter for the warping allowed before calculating the elastic distance. The default is 0.

rucrdtw, rucred

Dynamic Time Warping Distance and Euclidean Distance from package rucrdtw. Implemented in Boersch-Supan (2016) and originally described in Rakthanmanon et al. (2012).

dmin, dmax, dmin1, dmax1, dmin2, dmax2

[numeric(1)] encode the indices used to define subspaces for method %in% c("shortEuclidean", "relAreas") as numeric values between 0 and 1, where 0 encodes grid[1] and 1 encodes grid[length(grid)].

t1, t2

[numeric(1)] encode the position of the points for which to compare the jump heights in method = "jump" as numeric values between 0 and 1, see dmin.

.poi

[numeric(1 to ncol(x))] numeric vector of arbitrary length taking values between 0 and 1, denoting the position of the points of interest for method = "points". The default value is .poi = seq(0, 1, length.out = length(grid)), which results in the Manhattan distance.

custom.metric

[function(x, y, ...)] a function specifying how to compute the distance between two functional observations (= numeric vectors of the same length) x and y. It can handle additional arguments in .... The default is the Euclidean distance (equals the Minkowski distance with lp = 2). Used for method = "custom.metric".

a, b, c

[numeric(1)] weights of the amplitude distance (a) and the phase distance (b) in a semimetric that combines them by addition. Used for method == 'elasticDistance'.

lambda

[numeric(1)] penalization parameter for the warping allowed before calculating the elastic distance. Default value is 0. Large values imply less (no) warping, small values imply more warping. Used for method %in% c('elastic', 'SRV').

...

additional parameters to the (semi-)metrics.
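
Several of the methods described above reduce each curve to a single feature, or to a part of its domain, before comparing. The following base R sketch illustrates that idea. It is not the classiFunc implementation. The helper names (pos_to_index, short_euclidean, jump_dist, glob_max_dist, mean_dist) are hypothetical, and the rounding rule that maps a position in [0, 1] to a grid index is an assumption that only reproduces the documented endpoints (0 encodes the first grid point, 1 the last).

```r
# Assumed mapping from a position in [0, 1] to an index in 1..n
# (0 -> 1 and 1 -> n as documented; the rounding in between is a guess).
pos_to_index <- function(pos, n) {
  round(pos * (n - 1)) + 1
}

# Euclidean distance restricted to the part of the domain from dmin to dmax.
short_euclidean <- function(x, y, dmin = 0, dmax = 1) {
  idx <- pos_to_index(dmin, length(x)):pos_to_index(dmax, length(x))
  sqrt(sum((x[idx] - y[idx])^2))
}

# Absolute difference of the jump heights between positions t1 and t2.
jump_dist <- function(x, y, t1 = 0, t2 = 1) {
  n <- length(x)
  jump_x <- x[pos_to_index(t1, n)] - x[pos_to_index(t2, n)]
  jump_y <- y[pos_to_index(t1, n)] - y[pos_to_index(t2, n)]
  abs(jump_x - jump_y)
}

# Absolute difference of the global maxima and of the overall means.
glob_max_dist <- function(x, y) abs(max(x) - max(y))
mean_dist <- function(x, y) abs(mean(x) - mean(y))

# Two toy curves on a grid of five points (made-up values).
x1 <- c(0, 1, 4, 1, 0)
x2 <- c(0, 0, 0, 0, 2)

d_full <- short_euclidean(x1, x2)                        # whole domain
d_half <- short_euclidean(x1, x2, dmin = 0, dmax = 0.5)  # first three points
d_jump <- jump_dist(x1, x2)                              # first vs. last point
d_max  <- glob_max_dist(x1, x2)
d_mean <- mean_dist(x1, x2)
```

Each helper takes two numeric vectors of the same length and returns one number, which is the form the custom.metric argument describes, so a semimetric of this kind could in principle be supplied there.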

Value

a matrix of dimensions nrow(x) by nrow(y) containing the distances of the functional observations contained in x and y, if y is specified. Otherwise a matrix containing the distances of all functional observations within x to each other.
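
To make the shape of the return value concrete, here is a small base R sketch. The helper dist_matrix is hypothetical and only mimics the documented behaviour for the Euclidean case: with y given, the result has nrow(x) rows and nrow(y) columns; with y = NULL, x is compared with itself.

```r
# Hypothetical stand-in for the documented return shape (Euclidean case only).
dist_matrix <- function(x, y = NULL) {
  if (is.null(y)) y <- x  # the documented default: y = NULL uses y = x
  d <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      # distance between the i-th curve of x and the j-th curve of y
      d[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  d
}

# Curves are stored as rows; all values are made up for illustration.
x <- rbind(c(0, 0, 0), c(1, 1, 1), c(0, 1, 2))  # 3 curves on 3 grid points
y <- rbind(c(0, 0, 0), c(2, 2, 2))              # 2 curves on the same grid

d_xy <- dist_matrix(x, y)  # 3 x 2 matrix
d_xx <- dist_matrix(x)     # 3 x 3 matrix, symmetric with a zero diagonal
```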

References

Boersch-Supan (2016). rucrdtw: Fast time series subsequence search in R. The Journal of Open Source Software. URL http://doi.org/10.21105/joss.00100

Fuchs, K., J. Gertheiss, and G. Tutz (2015). Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186-197.

Rakthanmanon, Thanawin, et al. "Searching and mining trillions of time series subsequences under dynamic time warping." Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012.

Srivastava, A. and E. P. Klassen (2016). Functional and Shape Data Analysis. Springer.