step3clusters: Cluster Trajectories According to the Subset of Measures Selected Previously

Description

Classify trajectories based on the factors identified in step2factors.

Usage

step3clusters(
  trajFactors,
  nclusters = NULL,
  nstart = 50,
  criteria = "ccc",
  forced.factors = NULL
)

Value

The function returns a traj object that contains the objects carried over from steps 1 and 2, including the original data, the measures and the factors.

It also includes a data.frame containing the ID of each trajectory and the number of the cluster into which that trajectory was classified. This data.frame is stored in the clusters field of the traj object. The traj object also contains the distribution of the observations across clusters.
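The shape of the clusters field can be pictured with a small base-R sketch. It assumes the assignment comes from stats::kmeans and that the field is a two-column data.frame (ID, cluster); the IDs, the two made-up "factors" and the name distr are illustrative only, not the package's internals.

```r
# Hypothetical stand-in for the clusters field: trajectory IDs plus k-means labels
set.seed(123)
ids <- sprintf("id%02d", 1:12)
x <- matrix(rnorm(12 * 2), ncol = 2)      # two made-up "factors"
fit <- kmeans(x, centers = 3, nstart = 10)

# One row per trajectory: its ID and the cluster it was assigned to
clusters <- data.frame(ID = ids, cluster = fit$cluster)

# Cluster distribution: number of trajectories in each cluster
distr <- table(clusters$cluster)
print(head(clusters))
print(distr)
```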

Methods to plot the output of step3clusters include:

plot: plots a sample of 10 trajectories from every cluster
plotMedTraj: plots the median trajectory of each cluster
plotMeanTraj: plots the mean trajectory of each cluster
plotBoxplotTraj: produces a boxplot of the trajectories of every cluster
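plotMedTraj and plotMeanTraj summarize each cluster by a single curve. The per-time-point computation behind such a curve can be sketched in base R as follows, assuming the trajectories are stored in wide format (one row per trajectory, one column per time point). The data and cluster labels are made up, and this is not the package's plotting code.

```r
# Hypothetical wide-format data: one row per trajectory, one column per time point
traj_data <- matrix(c( 1,  2,  3,
                       3,  4,  5,
                       2,  3, 10,
                      10,  9,  8,
                      12, 11,  6), ncol = 3, byrow = TRUE)
cluster <- c(1, 1, 1, 2, 2)   # made-up cluster labels

# Row indices of the trajectories belonging to each cluster
rows_by_cluster <- split(seq_len(nrow(traj_data)), cluster)

# Median and mean at each time point, computed separately within each cluster
med_traj <- t(sapply(rows_by_cluster, function(rows)
  apply(traj_data[rows, , drop = FALSE], 2, median)))
mean_traj <- t(sapply(rows_by_cluster, function(rows)
  colMeans(traj_data[rows, , drop = FALSE])))

print(med_traj)   # one row per cluster, one column per time point
print(mean_traj)
```

The median curve is less sensitive than the mean to a single extreme trajectory: in cluster 1 the value 10 at the last time point pulls the mean to 6 but leaves the median at 5. Either matrix can be drawn with graphics::matplot(t(med_traj), type = "l").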

Arguments

trajFactors: Object generated by step2factors. Contains data factors, eigenvalues, principal factors as well as the original data.
nclusters: Integer indicating the number of clusters into which the trajectories are classified. If NULL, the function selects the number of clusters automatically, using the criterion specified by criteria. Defaults to NULL.
nstart: Integer designating the number of random starts (sets of initial cluster centers) that kmeans should use in order to cluster the trajectories. Defaults to 50.
criteria: String indicating the criterion used to select the number of clusters when nclusters is NULL. Defaults to "ccc" (cubic clustering criterion).
forced.factors: (Optional) Vector containing the names of the measures calculated in step1measures to force as factors for the clustering. This vector will override the factors selected by step2factors. Available options: "m1", "m2", "m3", ... ,"m23" and "m24". Defaults to NULL. See details.

Author

Marie-Pierre Sylvestre, Dan Vatnik

marie-pierre.sylvestre@umontreal.ca

Details

If nclusters is set to NULL, the function uses the NbClust function to select the optimal number of clusters, with kmeans as the cluster analysis method. The measures are standardized within step3clusters prior to clustering. The criterion to be computed is chosen with the criteria argument; the full list of available methods and criteria can be found on the NbClust help page. Criteria compatible with step3clusters are: "ch", "kl", "ccc", "hartigan", "scott", "trcovw", "tracew" and "friedman". Note that some of these criteria do not always yield the same number of clusters when run multiple times. Increasing nstart generally stabilizes the results.
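Standardizing puts measures with very different ranges on a common scale, so that no single measure dominates the Euclidean distances used by kmeans. A minimal sketch, assuming z-score standardization as performed by base R's scale() (the exact transformation used inside step3clusters is an assumption here); the measure values are made up. The commented NbClust line shows how a criterion such as "ccc" is requested from that package; it is not run here and assumes NbClust is installed.

```r
# Made-up measures for 6 trajectories (column names styled like step1measures output)
measures <- data.frame(m1 = c(2, 4, 6, 8, 10, 12),
                       m5 = c(0.1, 0.3, 0.2, 0.9, 0.8, 0.7))

# z-score standardization: subtract each column's mean, divide by its SD
z <- scale(measures)

print(round(colMeans(z), 10))   # each column now has mean 0
print(apply(z, 2, sd))          # and standard deviation 1

# Not run; assumes the NbClust package is installed:
# NbClust::NbClust(z, min.nc = 2, max.nc = 4, method = "kmeans", index = "ccc")
```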

The function then uses kmeans to cluster the trajectories into the required number of clusters. If nclusters is set to NULL, the number of clusters is first computed with NbClust, as described above, and the data are then classified into that number of clusters. The nstart argument is passed to kmeans and sets how many random sets of initial centers are tried; kmeans keeps the solution with the smallest total within-cluster sum of squares. If the algorithm does not converge, increasing nstart can improve the result. Please consult the kmeans help page for more information.
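The role of nstart can be seen directly with stats::kmeans. This sketch uses two made-up, well-separated groups of points standing in for standardized measures; it shows the kmeans call pattern only and does not reproduce step3clusters itself.

```r
set.seed(42)
# Two well-separated made-up groups of 10 points each (2 standardized measures)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

# nstart = 50: kmeans tries 50 random sets of initial centers and
# keeps the solution with the smallest total within-cluster sum of squares
fit <- kmeans(x, centers = 2, nstart = 50)

print(fit$size)          # number of points in each cluster
print(fit$tot.withinss)  # objective value of the retained solution
```

With a single start, a poor initial choice of centers can leave kmeans in a local optimum; multiple starts make that outcome less likely at the cost of extra computation.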

When forced.factors is NULL, the function clusters the trajectories using the factors selected by step2factors. When it is set to a vector, the vector must contain at least one measure name ("m1", "m2", "m3", ..., "m23", "m24"), and the trajectories are clustered using those measures instead. The measures, ranging from "m1" to "m24", are computed by step1measures and stored in the trajMeasures object.
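Forcing factors amounts to picking columns by name from the table of measures. The sketch below assumes a hypothetical measures data.frame with an ID column followed by columns m1 to m24; the real layout of the trajMeasures object may differ. The stopifnot guard mirrors the documented rule that at least one valid measure name must be given.

```r
# Hypothetical measures table: an ID column plus measures m1..m24
set.seed(1)
measures <- data.frame(ID = 1:8,
                       matrix(rnorm(8 * 24), ncol = 24,
                              dimnames = list(NULL, paste0("m", 1:24))))
forced.factors <- c("m1", "m5", "m17")

# Documented rule: at least one name, each one of "m1".."m24"
stopifnot(length(forced.factors) >= 1,
          all(forced.factors %in% paste0("m", 1:24)))

# Keep only the forced measures as clustering variables
clust_vars <- measures[, forced.factors, drop = FALSE]
print(head(clust_vars))
```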

When the plot function is called with its default arguments, only a traj object is required. The function generates a multiplot with one panel per cluster; in each panel, 10 randomly selected trajectories are traced, so the same number of trajectories is plotted for every cluster. Because the trajectories are sampled at random, the plots will differ from one run to the next. Set a seed (for example with set.seed) before plotting in order to replicate a plot.
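Why seeding makes the plot reproducible can be shown with a hypothetical helper, sample_per_cluster, that draws up to 10 IDs from each cluster. It is an illustration of per-cluster random sampling, not the package's plot method; the assumption is that plot likewise draws its sample with R's random number generator.

```r
# Made-up cluster assignment: 60 trajectories in 3 clusters of 20
clusters <- data.frame(ID = 1:60, cluster = rep(1:3, each = 20))

# Hypothetical helper: draw n IDs at random from each cluster
# (or all of them if the cluster has fewer than n trajectories)
sample_per_cluster <- function(clusters, n = 10) {
  lapply(split(clusters$ID, clusters$cluster), function(ids) {
    ids[sample.int(length(ids), min(n, length(ids)))]
  })
}

set.seed(2024)
s_a <- sample_per_cluster(clusters)
set.seed(2024)
s_b <- sample_per_cluster(clusters)
identical(s_a, s_b)  # TRUE: same seed, same sample
```

Indexing with sample.int rather than calling sample(ids, ...) directly avoids the base-R pitfall where sample() treats a single number as a range.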

If color.vect is NULL, the function randomly assigns a color to each trajectory, and the same set of colors is reused in each plot. If specific colors are supplied, color.vect must contain exactly as many colors as there are trajectories to be plotted; otherwise an error is thrown.

If clust.num is set to an integer, only the cluster with that number is plotted.
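The two plotting rules above (random colors when color.vect is NULL, a length check otherwise, and restriction to one cluster through clust.num) can be sketched with a hypothetical helper, pick_colors. The helper and the example data are made up for illustration and only mirror the documented behavior; they are not the package's code.

```r
# Hypothetical helper: choose colors for n_traj trajectories
pick_colors <- function(n_traj, color.vect = NULL) {
  if (is.null(color.vect)) {
    # No colors supplied: draw them at random from R's named colors
    return(sample(colors(), n_traj))
  }
  if (length(color.vect) != n_traj) {
    stop("color.vect must contain exactly ", n_traj, " colors")
  }
  color.vect
}

# clust.num: keep only the trajectories of one cluster
clusters <- data.frame(ID = 1:30, cluster = rep(1:3, each = 10))
clust.num <- 2
one_cluster <- clusters[clusters$cluster == clust.num, ]

cols <- pick_colors(nrow(one_cluster))   # 10 random colors
print(cols)
```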

The print function displays the number of observations used to compute the traj object, the number of clusters, the number of observations in each cluster, and the measures selected as factors, which are the variables used to cluster the data. The number of decimal places defaults to 2; it can be changed in the arguments of step3clusters.

The summary function displays the number of observations analysed and the total number of clusters into which the data were classified. It also prints the eigenvalues used in step2factors to determine the number of factors to select, as well as summary statistics of each factor by cluster. The number of decimal places defaults to 2; it can be changed in the parameters of step3clusters.
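A by-cluster summary of the kind described above can be sketched with stats::aggregate. The factor scores and cluster labels below are made up, and only one statistic (the mean) is shown, rounded to the default 2 decimal places; the exact statistics printed by summary are not reproduced here.

```r
# Made-up factor values for 6 trajectories and their cluster labels
scores <- data.frame(m1 = c(1, 2, 3, 10, 11, 12),
                     m5 = c(0.5, 0.25, 0.125, 4, 5, 6),
                     cluster = c(1, 1, 1, 2, 2, 2))

# Mean of each factor within each cluster
by_cluster <- aggregate(cbind(m1, m5) ~ cluster, data = scores, FUN = mean)

# Round the statistics (not the cluster column) to 2 decimal places
by_cluster[, -1] <- round(by_cluster[, -1], 2)
print(by_cluster)
```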

Examples


if (FALSE) {
# Setup data 
data = example.data$data

# Run step1measures, step2factors and step3clusters
s1 = step1measures(data, ID=TRUE)
s2 = step2factors(s1)
s3 = step3clusters(s2)

# Print and plot 'traj' object
s3
plot(s3)

# Run step3clusters with predetermined number of clusters
s3.4clusters = step3clusters(s2, nclusters=4)

# Display 'traj' object s3.4clusters
summary(s3.4clusters)
plot(s3.4clusters)

# Show the cluster assignment of the first 10 trajectories
s3$clusters[1:10, ]

}
