meanFrechet: Frechet mean of a set of trajectories

Description

Compute the Frechet mean

Usage

meanFrechet(trajLong, timeScale = 0.1, FrechetSumOrMax = "sum", aggregationMethod = "all", shuffle = TRUE, sampleSize = NA, methodHclust = "average")

Arguments

trajLong

[data.frame]: trajectories in long format. The data.frame must have exactly the following layout: the first column holds the individual identifier, the second holds the times at which the measurements are made, and the third holds the measurements.
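
As an illustration of the expected layout, the following sketch converts a small wide table (one row per individual) into long format. The object `wide` and its column names are hypothetical; only the column order (identifier, times, measurements) comes from the description above.

```r
# Hypothetical wide data: one row per individual, one column per time point
wide <- data.frame(id = 1:3,
                   t0 = c(1.0, 2.0, 3.0),
                   t1 = c(1.5, 2.5, 3.5),
                   t2 = c(2.0, 3.0, 4.0))
times <- c(0, 1, 2)

# Long format: column 1 = identifier, column 2 = time, column 3 = measurement
trajLong <- data.frame(
  id    = rep(wide$id, each = length(times)),
  times = rep(times, times = nrow(wide)),
  # transpose so that the values are read individual by individual
  traj  = as.numeric(t(as.matrix(wide[, -1])))
)
```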

timeScale

[numeric]: rescales the time axis, which increases or decreases the cost of a horizontal shift. If timeScale is very large, the Frechet mean tends to the mean obtained with the Euclidean distance. If timeScale is very small, it tends to the one obtained with Dynamic Time Warping.

FrechetSumOrMax

[character]: like the Frechet distance, the Frechet mean can be defined using either the 'sum' function or the 'max' function. This option lets the user choose one or the other.
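
To make the roles of timeScale and FrechetSumOrMax more concrete, here is a minimal sketch of a discrete Frechet-type distance between two trajectories. It is not the package's code: the function `frechetDiscrete` is hypothetical, and it assumes that timeScale simply multiplies the time axis before point-to-point Euclidean distances are taken. The 'max' variant is the classical discrete Frechet distance; the 'sum' variant accumulates the costs along the best coupling.

```r
# Hypothetical helper, written for illustration only
frechetDiscrete <- function(tx, x, ty, y, timeScale = 0.1,
                            sumOrMax = c("sum", "max")) {
  sumOrMax <- match.arg(sumOrMax)
  # Cost between point i of the first curve and point j of the second one,
  # after rescaling the time axis (assumption: plain multiplication)
  d <- sqrt(outer(timeScale * tx, timeScale * ty, "-")^2 + outer(x, y, "-")^2)
  combine <- if (sumOrMax == "sum") `+` else max
  n <- length(x)
  m <- length(y)
  D <- matrix(Inf, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (i == 1 && j == 1) {
        D[i, j] <- d[1, 1]
        next
      }
      # Best way to reach (i, j): advance on one curve, the other, or both
      best <- min(if (i > 1) D[i - 1, j] else Inf,
                  if (j > 1) D[i, j - 1] else Inf,
                  if (i > 1 && j > 1) D[i - 1, j - 1] else Inf)
      D[i, j] <- combine(d[i, j], best)
    }
  }
  D[n, m]
}

tt <- 0:2
p <- c(0, 1, 0)
q <- c(0, 0, 1)
frechetDiscrete(tt, p, tt, q, timeScale = 1e6, sumOrMax = "sum")  # shifts are very costly
frechetDiscrete(tt, p, tt, q, timeScale = 0, sumOrMax = "sum")    # shifts are free
```

With a very large timeScale the best coupling pairs points taken at the same time, so the result reduces to a pointwise comparison. With timeScale equal to zero only the measurements matter, which is the Dynamic Time Warping situation.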

aggregationMethod

[character]: defines the agglomeration method used to compute the mean. Three methods are currently available: "all", "sample" and "hierarchical". See Details.

shuffle

[logical]: should the order of the agglomeration be chosen randomly? (only for methods "all" and "sample")

sampleSize

[integer]: defines the size of the sample (for method "sample" only).

methodHclust

[character]: defines the distance between two clusters used by the hierarchical clustering. The available methods are those accepted by the function hclust.
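
For readers unfamiliar with hclust, the sketch below shows how a linkage method is passed and how the resulting merge order can be inspected. The toy matrix `trajMat` is hypothetical, and the Euclidean distance from dist is only a stand-in for pairwise Frechet distances; how meanFrechet consumes the result internally is not shown here.

```r
# Four toy trajectories (one per row) measured at three common times
trajMat <- rbind(a = c(0, 1.0, 0),
                 b = c(0, 1.1, 0),
                 c = c(3, 3.0, 3),
                 d = c(3, 3.2, 3))

# Euclidean stand-in for the matrix of pairwise distances between trajectories
distMat <- dist(trajMat)

# "average" is one of the linkage methods accepted by hclust
hc <- hclust(distMat, method = "average")

# One row per merge: negative entries are original observations,
# positive entries refer to clusters formed at an earlier row
hc$merge
```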

Value

data.frame holding a trajectory.

Details

Compute the Frechet mean, as defined in [1]. The main idea of the algorithm is the following:

The Frechet mean of two trajectories can easily be defined as the middle of the leash that joins the two trajectories (see meanFrechet2). The mean of n individuals can then be obtained by merging the individual trajectories two by two, then merging the resulting trajectories, and so on until only one trajectory is left. This last trajectory is the Frechet mean. Theoretically, the final result depends on the order of agglomeration. In practice, on large samples, this order has little impact on the final result (see [1] for details).

So far, three agglomeration methods are available:

all: the n individuals are scattered (randomly if shuffle=TRUE) on the leaves of a complete binary tree (every node has zero or two children) having depth h with $2^h \le n < 2^{h+1}$. The value of each internal node is the Frechet mean of the two trajectories held by its children. Thus the Frechet mean is the value of the tree root. (Informally, this structure is close to that of a tennis tournament.) The complexity of this method is $O(nt^2)$.

sample: this is the method "all" applied only to a random sample of sampleSize trajectories. The complexity of the method is $O(n^0t^2)$, $n^0$ being the size of the random sample.

hierarchical: the combination order between individuals is fixed in a deterministic way through an agglomerative hierarchical clustering, the closest individuals being combined first. The complexity of this method is $O(n^2t^2)$.
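
The tournament-style agglomeration behind the methods "all" and "sample" can be sketched as follows. This is an illustration, not the package's implementation: `tournamentMean` is a hypothetical name, and `meanOfTwo` is a deliberately simple, unweighted pointwise average standing in for the real two-trajectory Frechet mean (meanFrechet2), so that the merging order is the only thing on display.

```r
# Hypothetical stand-in for the mean of two trajectories (pointwise average
# of two numeric vectors measured on the same time grid)
meanOfTwo <- function(p, q) (p + q) / 2

# Merge a list of trajectories two by two until a single one is left
tournamentMean <- function(trajList, shuffle = TRUE, sampleSize = NA) {
  if (!is.na(sampleSize)) {
    # method "sample": keep only a random subset of the trajectories
    trajList <- trajList[sample(length(trajList), sampleSize)]
  } else if (shuffle) {
    # method "all" with a random order on the leaves
    trajList <- trajList[sample(length(trajList))]
  }
  while (length(trajList) > 1) {
    n <- length(trajList)
    firstOfPair <- seq(1, n - 1, by = 2)
    merged <- lapply(firstOfPair, function(i) {
      meanOfTwo(trajList[[i]], trajList[[i + 1]])
    })
    # With an odd count, the last trajectory moves to the next round unmerged
    if (n %% 2 == 1) merged <- c(merged, trajList[n])
    trajList <- merged
  }
  trajList[[1]]
}

trajs <- list(c(0, 2), c(2, 4), c(4, 6), c(6, 8))
tournamentMean(trajs, shuffle = FALSE)
tournamentMean(list(0, 4, 8), shuffle = FALSE)
```

Each round roughly halves the number of trajectories, and n - 1 pairwise means are computed in total, which is consistent with the $O(nt^2)$ complexity quoted above if a single pairwise mean costs $O(t^2)$. The second call shows the order dependence with this unweighted stand-in: the trajectory carried over to the last round counts for half of the result.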

Examples

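library(lattice)
library(kmlShape)  # assumption: meanFrechet is provided by the kmlShape package

### Define artificial data: 20 bell-shaped trajectories measured at times 0..20
# Each call returns one trajectory: a Gaussian bump with random center and height
g <- function(x) dnorm(0:20, runif(1, 5, 15), 2) * rnorm(1, 5, 1)
dn <- data.frame(id = rep(1:20, each = 21),
   times = rep(0:20, times = 20),
   traj = as.numeric(sapply(1:20, g)),
   weight = 1
)

### Plot the individual trajectories
xyplot(traj ~ times, data = dn, groups = id, type = "l", ylim = c(0, 1.4))

### Frechet mean with the default timeScale, then a very small and a large one
plot(meanFrechet(dn), ylim = c(0, 1.4))
plot(meanFrechet(dn, 0.001), ylim = c(0, 1.4))
plot(meanFrechet(dn, 10), ylim = c(0, 1.4))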