Classify trajectories based on the factors identified in step2factors.
step3clusters(
trajFactors,
nclusters = NULL,
nstart = 50,
criteria = "ccc",
forced.factors = NULL
)
The function returns a traj object that contains the objects carried through steps 1 and 2, including the original data, measures and factors.
It also includes a data.frame, stored in the clusters field of the traj object, that holds the ID of each trajectory and the number of the cluster into which that trajectory was classified. Finally, it contains the distribution of observations across clusters.
Methods to plot the output of step3clusters include functions that:
plot a random sample of 10 trajectories from every cluster;
plot the median trajectory of each cluster;
plot the mean trajectory of each cluster;
produce a boxplot of the trajectories of every cluster.
trajFactors: Object generated by step2factors. Contains the factors, eigenvalues and principal factors, as well as the original data.
nclusters: Integer indicating the number of clusters into which the trajectories are classified. If NULL, the function selects the number of clusters automatically, using the criterion specified by the criteria argument. Defaults to NULL.
nstart: Integer designating the number of random initial seedings that kmeans should try in order to cluster the trajectories. Defaults to 50.
criteria: String indicating the criterion used to select the number of clusters when nclusters is NULL. Defaults to "ccc" (cubic clustering criterion).
forced.factors: (Optional) Vector containing the names of measures calculated in step1measures to force as factors for the clustering. This vector overrides the factors selected by step2factors. Available options: "m1", "m2", "m3", ..., "m23" and "m24". Defaults to NULL. See details.
Marie-Pierre Sylvestre, Dan Vatnik
marie-pierre.sylvestre@umontreal.ca
If nclusters is set to NULL, the function uses the NbClust function to select the optimal number of clusters. NbClust uses kmeans as the cluster analysis method. The measures are standardized within step3clusters prior to clustering. The criterion to be computed can be chosen with the criteria argument.
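To make the standardization step concrete, the sketch below assumes the usual z-score standardization (each measure centered to mean 0 and scaled to standard deviation 1) with base R's scale(). The exact transformation applied inside step3clusters is not documented here, and the measures data frame is hypothetical.

```r
# Hypothetical measures: one row per trajectory, one column per measure
set.seed(1)
measures <- data.frame(
  m1 = rnorm(30, mean = 10, sd = 5),
  m2 = runif(30, min = 0, max = 100)
)
# scale() centers each column on its mean and divides it by its standard deviation
z <- scale(measures)
colMeans(z)      # approximately 0 for every measure
apply(z, 2, sd)  # 1 for every measure
```

Standardizing first keeps a measure with a large numeric range (m2 here) from dominating the Euclidean distances that kmeans minimizes.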
The list of available methods and criteria can be found on the NbClust help page. Criteria compatible with step3clusters are:
"ch", "kl", "ccc", "hartigan", "scott", "trcovw", "tracew" and "friedman". Note that some of these criteria do not always yield the same number of clusters when run multiple times, because kmeans starts from randomly chosen centers. Increasing nstart generally stabilizes the results.
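For illustration, the "ch" (Calinski-Harabasz) criterion can be computed directly from kmeans output as CH(k) = [B / (k - 1)] / [W / (n - k)], where B and W are the between-cluster and total within-cluster sums of squares, and the k that maximizes it is selected. This is a minimal base-R sketch on simulated data, not the NbClust implementation; the helper ch_index is hypothetical.

```r
# Hypothetical helper: Calinski-Harabasz index for a k-cluster kmeans solution
ch_index <- function(x, k, nstart = 50) {
  fit <- kmeans(x, centers = k, nstart = nstart)
  n <- nrow(x)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

set.seed(42)
# Simulated data: three well-separated groups of 50 points in two dimensions
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)
ks <- 2:6
ch <- sapply(ks, function(k) ch_index(x, k))
# Select the number of clusters with the largest index value
best_k <- ks[which.max(ch)]
best_k
```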
The function then uses kmeans to classify the trajectories into the required number of clusters. If nclusters is set to NULL, the number of clusters is first computed by NbClust; otherwise, the value of nclusters is used. The data are then classified into that number of clusters.
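A minimal sketch of this final step, assuming already standardized measures and a fixed nclusters: kmeans assigns each trajectory to a cluster, and the result is arranged like the clusters field described above (one row per ID with its cluster number), followed by the cluster distribution. The data and IDs are simulated for illustration.

```r
set.seed(7)
# Hypothetical IDs and standardized measures for 40 trajectories
ids <- sprintf("id%02d", 1:40)
z <- scale(rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
))
nclusters <- 2
fit <- kmeans(z, centers = nclusters, nstart = 50)
# One row per trajectory: its ID and the number of its assigned cluster
clusters <- data.frame(ID = ids, cluster = fit$cluster)
head(clusters)
# Cluster distribution: number of trajectories in each cluster
table(clusters$cluster)
```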
kmeans uses the nstart argument to decide how many random sets of initial centers are tried during its execution. If the function does not converge, increasing nstart can improve the result. Please consult the kmeans help page for more information.
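The sketch below illustrates what nstart does, on simulated data: each single-start run begins from its own random set of initial centers, and keeping the run with the smallest total within-cluster sum of squares is how kmeans chooses among its starts when nstart > 1.

```r
set.seed(3)
# Simulated data: three groups of 30 points in two dimensions
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 4), ncol = 2),
  matrix(rnorm(60, mean = 8), ncol = 2)
)
# Ten single-start runs, each from a different random set of initial centers
runs <- lapply(1:10, function(i) kmeans(x, centers = 3, nstart = 1))
wss <- sapply(runs, function(fit) fit$tot.withinss)
round(wss, 1)
# Keep the run with the smallest total within-cluster sum of squares
best <- runs[[which.min(wss)]]
# Built-in equivalent: the best of 10 random starts in a single call
multi <- kmeans(x, centers = 3, nstart = 10)
```

More starts cost more computation but make it less likely that the returned partition is a poor local optimum.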
When forced.factors is set to NULL, the function uses the factors identified by step2factors to cluster the trajectories. When it is set to a vector, the vector must contain at least one measure name ("m1" to "m24"), and the function then clusters the trajectories using the stated measures. These measures are generated by step1measures and are all found in the trajMeasures object.
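As a sketch of the documented constraint on forced.factors, the code below validates a vector of measure names against "m1" to "m24" and selects those columns from a hypothetical measures table. The helper check_forced is hypothetical and does not reproduce the internals of step3clusters.

```r
set.seed(10)
# Hypothetical measures table: an ID column plus measures m1 to m24
measures <- data.frame(
  ID = 1:5,
  matrix(rnorm(5 * 24), ncol = 24, dimnames = list(NULL, paste0("m", 1:24)))
)
valid.measures <- paste0("m", 1:24)
# Hypothetical check mirroring the documented rule
check_forced <- function(forced.factors) {
  if (length(forced.factors) < 1 || !all(forced.factors %in% valid.measures)) {
    stop("forced.factors must name at least one of m1 to m24")
  }
  invisible(forced.factors)
}
forced.factors <- c("m1", "m5", "m17")
check_forced(forced.factors)
# Only the forced measures would be used as clustering input
clust.input <- measures[, forced.factors, drop = FALSE]
names(clust.input)
```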
When the plot function is run with its default values, only a traj object is required. The function generates a multiplot of all the clusters. In each plot, 10 randomly selected trajectories are traced, and the same number of trajectories is plotted for each cluster. Because the trajectories are randomly sampled, rerunning the function produces different plots; setting the random seed (for example with set.seed) beforehand is required to replicate a plot.
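Reproducibility comes from resetting the random number generator before sampling. In this sketch, the hypothetical helper sample_ids draws up to 10 trajectory IDs per cluster from simulated assignments; calling set.seed with the same value before each call yields identical samples.

```r
# Hypothetical cluster assignments for 60 trajectories in 3 clusters
clusters <- data.frame(ID = 1:60, cluster = rep(1:3, each = 20))
# Draw up to n random trajectory IDs from each cluster
sample_ids <- function(clusters, n = 10) {
  lapply(split(clusters$ID, clusters$cluster), function(ids) {
    # Index with sample.int() so a cluster holding a single ID is handled safely
    ids[sample.int(length(ids), min(n, length(ids)))]
  })
}
set.seed(2024)
first <- sample_ids(clusters)
set.seed(2024)
second <- sample_ids(clusters)
identical(first, second)  # TRUE
```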
If color.vect is NULL, the function randomly assigns a color to each trajectory, and the same colors are used for the trajectories in every plot. If specific colors are chosen, the vector must contain as many colors as there are trajectories to be plotted; otherwise an error is thrown.
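The color rule can be sketched with a hypothetical helper, pick_colors, that draws random colors when color.vect is NULL and otherwise checks that one color is supplied per plotted trajectory. Drawing from grDevices::colors() is an assumption made for illustration, not necessarily how the plot method assigns colors.

```r
# Hypothetical helper mirroring the documented color.vect rule
pick_colors <- function(color.vect, n.traj) {
  if (is.null(color.vect)) {
    # No colors supplied: draw n.traj distinct named colors at random
    return(sample(grDevices::colors(), n.traj))
  }
  if (length(color.vect) != n.traj) {
    stop("color.vect must contain exactly one color per plotted trajectory")
  }
  color.vect
}
set.seed(5)
pick_colors(NULL, 10)
pick_colors(rep("black", 10), 10)
```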
If clust.num is set to an integer, only the cluster with that number is plotted.
The print function displays the number of observations used in the computation of traj, the number of clusters and the number of observations in each, and the measures set as factors, which are the ones used to cluster the data.
The number of decimal places defaults to 2 and can be changed in the arguments of step3clusters.
The summary function displays the number of observations analysed and the total number of clusters into which the data were classified.
It also prints the eigenvalues used to determine the number of factors selected in step2factors, and summary statistics of each factor by cluster.
The number of decimal places defaults to 2 and can be changed in the parameters of step3clusters.
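Per-cluster summaries like those printed by summary can be approximated with base R's aggregate(), rounded to 2 decimal places. The factor scores below are simulated, and the choice of mean and median as the statistics is an assumption for illustration.

```r
set.seed(11)
# Hypothetical factor scores (m1, m7) with each trajectory's cluster number
scores <- data.frame(
  cluster = rep(1:2, each = 25),
  m1 = c(rnorm(25, mean = 0), rnorm(25, mean = 3)),
  m7 = c(rnorm(25, mean = 5), rnorm(25, mean = 1))
)
# Mean and median of each factor within each cluster
means   <- aggregate(cbind(m1, m7) ~ cluster, data = scores, FUN = mean)
medians <- aggregate(cbind(m1, m7) ~ cluster, data = scores, FUN = median)
# Round the statistics (not the cluster column) to 2 decimal places
means[, -1]   <- round(means[, -1], 2)
medians[, -1] <- round(medians[, -1], 2)
means
medians
```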
if (FALSE) {
# Setup data
data = example.data$data
# Run step1measures, step2factors and step3clusters
s1 = step1measures(data, ID=TRUE)
s2 = step2factors(s1)
s3 = step3clusters(s2)
# Print and plot 'traj' object
s3
plot(s3)
# Run step3clusters with predetermined number of clusters
s3.4clusters = step3clusters(s2, nclusters=4)
# Display 'traj' object s3.4clusters
summary(s3.4clusters)
plot(s3.4clusters)
# First 10 rows of the cluster assignments (ID and cluster number)
s3$clusters[1:10, ]
}