Given a list of trajectories and a functional method,
this function clusters the trajectories into a k number of
groups. If a vector of two numbers is given, the function determines
the best solution from those options based on the Caliński-Harabasz
criterion.
akclustr(traj, id_field = FALSE, method = "linear",
k = c(3,6), crit="Silhouette", verbose = TRUE, quality_plot=FALSE)
traj: [matrix (numeric)] Longitudinal data. Each row represents an individual trajectory (of observations). The columns show the observations at consecutive time steps.
id_field: [numeric or character] Whether the first column of traj is a unique (id) field. Default: FALSE. If TRUE, the function recognizes the second column as the first time point.
method: [character] The parametric initialization strategy. Currently, the only available method is a linear method, set as "linear". This uses the time-dependent linear regression lines, and the resulting groups are ordered by increasing slope.
k: [integer or vector (numeric)] Either an exact integer number of clusters, or a vector of length two specifying the minimum and maximum numbers of clusters to be examined, from which the best solution will be determined. In either case, the minimum number of clusters is 3. The default is c(3,6).
crit: [character] A string specifying the criterion to use for assessing the quality of the cluster solutions when k is a vector of two values (as above). Default: crit="Silhouette", which uses the average Silhouette width (Rousseeuw P. J. 1987). Using the "Silhouette" criterion, the optimal value of k can be determined as the elbow point of the curve. The other valid criterion is "Calinski_Harabasz" (Caliński T. & Harabasz J. 1974), in which the maximum score represents the point of optimality. Having determined the optimal k, the function can then be re-run using the exact (optimal) value of k.
verbose: to suppress output messages (to the console) during clustering. Default: TRUE.
quality_plot: Whether to show a plot of the quality criterion across different values of k. Default: FALSE.
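The selection of k over a range, as described for the k and crit arguments, can be illustrated with a minimal base-R sketch. It does not call akclustr and is not the package's implementation: the toy data, the ch_score helper and the use of stats::kmeans are assumptions made purely for illustration. It computes the Caliński-Harabasz score for each candidate k and picks the maximum.

```r
# Hypothetical toy data: 30 trajectories over 6 time steps
# (this is NOT the package's `traj` dataset).
set.seed(42)
time_steps <- 1:6
true_slopes <- rep(c(-1, 0, 1), each = 10)
toy <- t(sapply(true_slopes, function(b) {
  5 + b * time_steps + rnorm(length(time_steps), sd = 0.3)
}))

# Hypothetical helper: Calinski-Harabasz score of a stats::kmeans fit,
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
ch_score <- function(x, k) {
  fit <- kmeans(x, centers = k, nstart = 20)
  n <- nrow(x)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Examine every k between the minimum and maximum, as with k = c(3, 6).
k_range <- 3:6
scores <- sapply(k_range, function(k) ch_score(toy, k))
names(scores) <- k_range

# Under this criterion the maximum score marks the preferred k.
best_k <- k_range[which.max(scores)]
print(scores)
print(best_k)
```

With akclustr itself, the analogous step would be to inspect the quality scores for the examined range and then re-run the function with the single chosen value of k.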
Generates an akobject consisting of the cluster solutions at the specified values of k, together with the graphical plot of the quality scores of the cluster solutions.
This function works by first approximating the trajectories
based on the chosen parametric form (e.g. linear), and then partitioning
the original trajectories based on the form groupings, in a similar
fashion to k-means clustering (Genolini et al. 2015). The key
distinction of akmedoids compared with existing longitudinal
approaches is that both the initial starting points and the
subsequent cluster centers (as the iteration progresses) are based on
the selection of observations (medoids) as opposed to centroids.
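The two ideas above, approximating each trajectory by a linear form and using an observed value (medoid) rather than a mean (centroid) as the center, can be sketched in base R. This is an illustration under stated assumptions, not the code akclustr runs: the toy matrix and all variable names are hypothetical.

```r
# Hypothetical toy matrix: each row is one trajectory, each column a time step.
toy <- rbind(
  c(1, 2, 3, 4),        # slope  1
  c(2, 2, 2, 2),        # slope  0
  c(8, 6, 4, 2),        # slope -2
  c(0, 3, 6, 9),        # slope  3
  c(1, 2.5, 4, 5.5)     # slope  1.5
)
time_steps <- seq_len(ncol(toy))

# Approximate each trajectory by its least-squares line and keep the slope.
slopes <- apply(toy, 1, function(y) unname(coef(lm(y ~ time_steps))[2]))

# Centroid: the mean slope, which need not match any observed trajectory.
centroid <- mean(slopes)

# Medoid: the observed slope with the smallest total distance to all others.
total_dist <- rowSums(as.matrix(dist(slopes)))
medoid <- slopes[which.min(total_dist)]

print(slopes)
print(centroid)
print(medoid)
```

Here the centroid is a value that no trajectory actually has, whereas the medoid is always the slope of one of the input trajectories.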
1. Genolini, C. et al. (2015) kml and kml3d: R Packages to Cluster Longitudinal Data. Journal of Statistical Software, 65(4), 1-34. URL http://www.jstatsoft.org/v65/i04/.
2. Rousseeuw P. J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math 20:53–65.
3. Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun. Stat. 3:1-27.
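# NOT RUN {
library(akmedoids)

# Load the example trajectory data
data(traj)

# Fill in missing entries before clustering
trajectry <- data_imputation(traj, id_field = TRUE, method = 2,
                             replace_with = 1, fill_zeros = FALSE)

# Convert the completed data to proportions
trajectry <- props(trajectry$CompleteData, id_field = TRUE)
print(trajectry)

# Examine k = 3 to 7 using the Calinski-Harabasz criterion
output <- akclustr(trajectry, id_field = TRUE,
                   method = "linear", k = c(3, 7), crit = "Calinski_Harabasz",
                   verbose = FALSE, quality_plot = FALSE)
print(output)
# }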