traj (version 1.2)

step3clusters: Cluster Trajectories According to the Subset of Measures Selected Previously

Description

Classify trajectories based on the factors identified in step2factors.

Usage

step3clusters(trajFactors, nclusters = NULL, nstart = 50, 
              criteria = "ccc", forced.factors = NULL)

# S3 method for traj
print(x, round.pos = 2, ...)

# S3 method for traj
summary(object, round.pos = 2, ...)

# S3 method for traj
plot(x, num.samples = 10, clust.num = NULL, color.vect = NULL, ...)

Value

The function returns a traj object that contains the objects carried through steps 1 and 2, including the original data, the measures and the factors.

Furthermore, it includes a data.frame containing the ID corresponding to each trajectory, and the cluster number in which the trajectory was classified. This is stored in the clusters field of the traj object. It also contains the cluster distribution of the observations.

Methods to plot the output of step3clusters include:

plot

plots a 10 person sample from every cluster

plotMedTraj

plots the median trajectory of the clusters

plotMeanTraj

plots the mean trajectory of the clusters

plotBoxplotTraj

produces a boxplot of the trajectories of every cluster

Arguments

trajFactors

Object generated by step2factors. Contains data factors, eigenvalues, principal factors as well as the original data.

nstart

Integer number designating the number of seedings that kmeans should do in order to cluster the trajectories. Defaults to 50.

nclusters

Integer number indicating the number of clusters to use in order to classify the trajectories. If NULL, the function selects the number of clusters automatically, using the criterion given by criteria. Defaults to NULL.

criteria

String indicating the criterion used to select the number of clusters. Defaults to "ccc" (cubic clustering criterion).

forced.factors

(Optional) Vector containing the names of the measures calculated in step1measures to force as factors for the clustering. This vector will override the factors selected by step2factors. Available options: "m1", "m2", "m3", ... ,"m23" and "m24". Defaults to NULL. See details.

x

traj object created by step3clusters

object

traj object created by step3clusters

round.pos

Value indicating the number of decimal places to display in the print and in the summary functions. Defaults to 2.

num.samples

Integer indicating the number of individuals to plot in each cluster. Defaults to 10.

clust.num

Integer indicating the cluster to plot. It will be the only plot generated. Set to NULL to plot all clusters. Defaults to NULL.

color.vect

Vector of colors that will represent each individual trajectory in the cluster plot. Possible implementation: c(1,2,3,4,5,6,7,8,9,10) for 10 randomly sampled individuals. Defaults to NULL.

...

Arguments for generic S3 functions

Author

Dan Vatnik, Marie-Pierre Sylvestre
dan.vatnik@gmail.com

Details

If nclusters is set to NULL, the function will use the NbClust function to select the optimal number of clusters. The NbClust function uses kmeans as the cluster analysis method. The criterion to compute is chosen with the criteria argument. The list of available methods and criteria can be found in the NbClust help page. Criteria compatible with step3clusters are: "ch", "kl", "ccc", "hartigan", "scott", "trcovw", "tracew" and "friedman". It is important to note that some of these criteria will not always yield the same number of clusters when run multiple times. Increasing nstart will generally stabilize the results.
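The effect of nstart on stability can be illustrated with base R's kmeans alone. The following is a minimal sketch on simulated data, not on traj output: the matrix sim and the helper best_wss are hypothetical names introduced for illustration. It compares how much the total within-cluster sum of squares varies across repeated fits with one random start versus fifty.

```r
# Simulated stand-in for a matrix of factors (NOT traj output):
# four overlapping groups in two dimensions.
set.seed(42)
centers <- rbind(c(0, 0), c(3, 0), c(0, 3), c(3, 3))
sim <- do.call(rbind, lapply(1:4, function(i) {
  cbind(rnorm(50, centers[i, 1]), rnorm(50, centers[i, 2]))
}))

# Hypothetical helper: total within-cluster sum of squares of one kmeans fit.
best_wss <- function(nstart) {
  kmeans(sim, centers = 4, nstart = nstart)$tot.withinss
}

# Repeat the fit 20 times with 1 random start and with 50 random starts.
wss_1  <- replicate(20, best_wss(1))
wss_50 <- replicate(20, best_wss(50))

# Range of the objective across runs; a narrower range means more stable fits.
range(wss_1)
range(wss_50)
```

With more random starts, repeated fits typically land on the same or nearly the same solution, which is why a larger nstart tends to make the selected number of clusters more reproducible.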

The function then uses kmeans to cluster the trajectories into the required number of clusters. If nclusters is NULL, the number of clusters is computed by NbClust; if it is set to a positive integer, the data will be classified into that number of clusters. kmeans uses the nstart argument to determine how many random sets of initial centres are tried during its execution. If the function does not converge, increasing nstart can improve the result. Please consult the kmeans help page for more information.
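As a rough illustration of this step, the sketch below clusters a simulated factor matrix with kmeans and builds the two outputs described under Value: a data.frame of IDs with their cluster numbers, and the cluster distribution. The data and the object names (factors, clusters, clust.distr) are assumptions made for the example; the internal code of step3clusters may differ.

```r
# Hypothetical factor matrix: 60 trajectories described by 3 selected measures.
set.seed(1)
ids <- 1:60
factors <- matrix(rnorm(60 * 3), ncol = 3,
                  dimnames = list(NULL, c("m1", "m5", "m12")))

# Cluster into a fixed number of groups; nstart controls how many random
# sets of initial centres kmeans tries before keeping the best fit.
nclusters <- 4
fit <- kmeans(factors, centers = nclusters, nstart = 50)

# One row per trajectory: its ID and the cluster it was assigned to.
clusters <- data.frame(ID = ids, cluster = fit$cluster)

# Cluster distribution: number of trajectories in each cluster.
clust.distr <- table(clusters$cluster)

head(clusters)
clust.distr
```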

When forced.factors is set to NULL, the function will select the factors identified by step2factors in order to cluster the trajectories. When the parameter is set to a vector, it must contain at least one measure name such as: "m1", "m2", "m3", ... ,"m23" and "m24". The function will then cluster the trajectories using the stated measures. These measures are generated by step1measures. They range from "m1" to "m24". All of these measures are found in the trajMeasures object.
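The override described above amounts to choosing columns of the measures table by name before clustering. The sketch below shows that idea in base R on a simulated table; the measures data.frame, the name check and the choice of three clusters are all assumptions for illustration, and the source does not state how step3clusters validates the names internally.

```r
# Hypothetical stand-in for the measures table: an ID column plus m1..m24.
set.seed(2)
measure.names <- paste0("m", 1:24)
measures <- data.frame(ID = 1:40,
                       matrix(rnorm(40 * 24), ncol = 24,
                              dimnames = list(NULL, measure.names)))

# Measures requested by the user, overriding any automatic selection.
forced.factors <- c("m1", "m7", "m24")

# Reject names outside "m1".."m24" before clustering.
unknown <- setdiff(forced.factors, measure.names)
if (length(unknown) > 0) {
  stop("Unknown measure(s): ", paste(unknown, collapse = ", "))
}

# Cluster on the forced measures only.
forced.data <- measures[, forced.factors, drop = FALSE]
fit <- kmeans(forced.data, centers = 3, nstart = 50)
table(fit$cluster)
```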

When the plot function is run without changing the default values, only a traj object is required. The function will generate a multiplot of all the clusters. In each plot, 10 randomly selected trajectories will be traced. The same number of trajectories will be plotted for each cluster. If the function is rerun, the plots will not look the same because the trajectories are randomly sampled. Seeding is required in order to replicate a plot.
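The role of seeding can be shown with sample alone. In this minimal sketch the IDs are made up, and it is assumed that the plot method draws its trajectories with R's standard random number generator; setting the same seed before each draw reproduces the same selection.

```r
# Hypothetical IDs of the trajectories that fell into one cluster.
cluster.ids <- 101:150
num.samples <- 10

# Without a fixed seed, two draws generally differ.
draw.a <- sample(cluster.ids, num.samples)

# With the same seed set before each draw, the sample is reproduced exactly.
set.seed(2024)
draw.b <- sample(cluster.ids, num.samples)
set.seed(2024)
draw.c <- sample(cluster.ids, num.samples)

identical(draw.b, draw.c)  # TRUE
```

Under that assumption, calling set.seed with a fixed value immediately before plot on the traj object should yield the same plot on every run.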

If color.vect is NULL, the function will randomly assign a color to each trajectory. The same colors will be used for all the trajectories in each plot. If specific colors are chosen, there must be as many colors in the vector as there are trajectories to be plotted, or an error will be thrown.

If clust.num is set to an integer, the cluster associated with that integer will be plotted. Only that one will be displayed among the available clusters.

The print function displays the number of observations used in the computation of traj, the number of clusters and the number of observations in each one, and the measures set as factors. These factors are used to cluster the data. The number of decimal places defaults to 2; it can be changed with the round.pos argument.

The summary function displays the number of observations analysed as well as the total number of clusters into which the data were classified. It also prints the eigenvalues used to determine the number of factors selected in step2factors, and summary statistics of each factor by cluster. The number of decimal places defaults to 2; it can be changed with the round.pos argument.

See Also

NbClust kmeans step1measures step2factors plot

Examples

# Setup data and time
data = example.data$data
time = example.data$time

# Run step1measures, step2factors and step3clusters
s1 = step1measures(data,time, ID=TRUE)
s2 = step2factors(s1)
s3 = step3clusters(s2)

# Print and plot "traj object"
s3
plot(s3)


# Run step3clusters with a predetermined number of clusters
s3.4clusters = step3clusters(s2, nclusters=4)

# Display "traj" object
s3.4clusters
summary(s3.4clusters)
plot(s3.4clusters)

# Show the cluster assignment of the first 10 trajectories
s3$clusters[1:10,]