
akmedoids (version 0.1.5)

akmedoids.clust: Anchored k-medoids clustering

Description

Given a matrix of trajectories and a parametric initialisation method, this function clusters the trajectories into k groups. If k is a vector of two numbers, the function determines the best solution from that range based on the criterion specified by crit (the average Silhouette width by default, or the Calinski-Harabasz criterion).

Usage

akmedoids.clust(traj, id_field = FALSE, method = "linear", k = c(3,6), crit="Silhouette")

Arguments

traj

[matrix (numeric)] Longitudinal data. Each row represents an individual trajectory of observations, and the columns hold the observations at consecutive time steps.

id_field

[logical] Whether the first column of traj is a unique identifier (id) field; the ids themselves may be numeric or character. Default: FALSE. If TRUE, the function treats the second column as the first time point.
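To make the expected input layout concrete, the following base-R sketch builds a small hypothetical data set with an id column followed by three time steps, and separates the ids from the observations in the way id_field = TRUE describes. The helper split_id_field() and the toy data are illustrative assumptions, not part of akmedoids.

```r
# Hypothetical toy input: 4 trajectories observed at 3 time steps,
# preceded by a character id column (the layout implied by id_field = TRUE)
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  t1 = c(1, 2, 3, 4),
  t2 = c(2, 2, 3, 5),
  t3 = c(3, 2, 3, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an akmedoids function)
split_id_field <- function(traj, id_field = FALSE) {
  if (id_field) {
    # first column holds the ids; time points start in column 2
    list(ids = traj[, 1], values = as.matrix(traj[, -1, drop = FALSE]))
  } else {
    # every column is a time point; row positions serve as ids
    list(ids = seq_len(nrow(traj)), values = as.matrix(traj))
  }
}

parts <- split_id_field(toy, id_field = TRUE)
dim(parts$values)  # 4 trajectories x 3 time steps
```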

method

[character] The parametric initialisation strategy. Currently, the only available method is the linear method, set as "linear". It approximates each trajectory by a linear regression line over time, and the resulting groups are ordered by increasing slope.
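The linear approximation can be sketched in base R by fitting a least-squares line of each trajectory against its time index and ordering the trajectories by slope. This is a minimal illustration that assumes equally spaced time steps coded 1, 2, ...; it is not the package's internal code, and the data are hypothetical.

```r
# Three hypothetical trajectories with clearly different trends
values <- rbind(
  c(5, 4, 3, 2, 1),  # decreasing
  c(1, 2, 3, 4, 5),  # increasing
  c(3, 3, 3, 3, 3)   # flat
)
time <- seq_len(ncol(values))

# Fit the least-squares line y ~ time for each trajectory and keep its slope
slopes <- apply(values, 1, function(y) unname(coef(lm(y ~ time))["time"]))

# Trajectories in order of increasing slope
slope_order <- order(slopes)
slope_order  # 1 3 2
```

The decreasing trajectory comes first, then the flat one, then the increasing one, which is the kind of slope ordering the resulting groups follow.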

k

[integer or vector (numeric)] Either an exact number of clusters, or a vector of length two giving the minimum and maximum numbers of clusters to examine, from which the best solution is determined. In either case, the minimum number of clusters is 3. The default is c(3,6).
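The two accepted forms of k can be illustrated with a hypothetical helper, candidate_k(), which expands a length-two vector into every number of clusters between the minimum and the maximum and enforces the documented minimum of 3. It is a sketch of the behaviour described above, not the package's own argument checking.

```r
# Hypothetical helper: turn a k argument into the cluster counts to examine
candidate_k <- function(k) {
  if (length(k) == 1) {
    ks <- k                      # exact number of clusters
  } else if (length(k) == 2) {
    ks <- seq(min(k), max(k))    # every k from the minimum to the maximum
  } else {
    stop("k must be a single integer or a vector of length two")
  }
  if (any(ks < 3)) stop("the minimum number of clusters is 3")
  as.integer(ks)
}

candidate_k(c(3, 6))  # 3 4 5 6
candidate_k(4)        # 4
```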

crit

[character] A string specifying the criterion used to assess the quality of the cluster solutions when k is a vector of two values (as above). Default: crit = "Silhouette", which uses the average Silhouette width (Rousseeuw P. J. 1987); with this criterion, the optimal value of k can be determined as the elbow point of the curve. The other valid criterion is "Calinski_Harabatz" (Calinski T. & Harabasz J. 1974), for which the maximum score marks the optimal k. Having determined the optimal k, re-run the function with that exact value of k.
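Both criteria can be computed in base R for a given partition. The sketch below assumes Euclidean distances between trajectories (each row treated as a point, each time step as a coordinate); the function names are hypothetical and the values need not match those akmedoids plots. The Calinski-Harabasz index is the between-cluster sum of squares divided by k - 1, over the within-cluster sum of squares divided by n - k. The Silhouette width of one trajectory compares its mean distance a to its own cluster with its mean distance b to the nearest other cluster, as (b - a) / max(a, b).

```r
# Calinski-Harabasz index: (between SS / (k - 1)) / (within SS / (n - k))
calinski_harabasz <- function(x, labels) {
  x <- as.matrix(x)
  n <- nrow(x)
  groups <- unique(labels)
  k <- length(groups)
  overall <- colMeans(x)
  between <- 0
  within <- 0
  for (g in groups) {
    members <- x[labels == g, , drop = FALSE]
    centre <- colMeans(members)
    between <- between + nrow(members) * sum((centre - overall)^2)
    within <- within + sum(sweep(members, 2, centre)^2)
  }
  (between / (k - 1)) / (within / (n - k))
}

# Average Silhouette width (Rousseeuw 1987); singleton clusters score 0
avg_silhouette <- function(x, labels) {
  d <- as.matrix(dist(x))
  groups <- unique(labels)
  s <- vapply(seq_len(nrow(d)), function(i) {
    same <- labels == labels[i]
    if (sum(same) == 1) return(0)
    a <- sum(d[i, same]) / (sum(same) - 1)       # mean distance to own cluster
    b <- min(vapply(groups[groups != labels[i]],  # nearest other cluster
                    function(g) mean(d[i, labels == g]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Hypothetical toy data with a single time step, easy to check by hand
x <- matrix(c(1, 2, 10, 11), ncol = 1)
good <- c(1, 1, 2, 2)
bad <- c(1, 2, 1, 2)
calinski_harabasz(x, good)  # 162
calinski_harabasz(x, bad)
avg_silhouette(x, good)
avg_silhouette(x, bad)
```

Scoring each candidate k this way and plotting the scores gives curves of the kind described above; the sketch itself does not choose k.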

Value

If k is a vector of two numbers (see the k argument above), the output is a plot of the quality scores of the cluster solutions. If k is an exact integer number of clusters, the function returns trajectory labels indicating the group membership of each trajectory in traj.

Details

This function works by first approximating the trajectories with the chosen parametric form (e.g. linear) and then partitioning the original trajectories based on the resulting groupings, in a similar fashion to k-means clustering (Genolini et al. 2015). The key distinction of akmedoids from existing longitudinal approaches is that both the initial starting points and the subsequent cluster centers (as the iteration progresses) are selected observations (medoids) rather than centroids.
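The procedure described above can be sketched in base R: anchor the initial medoids on the slope ordering of the linear approximations, then alternate between assigning each trajectory to its nearest medoid and choosing, within each cluster, the trajectory with the smallest total distance to the other members. This is a simplified illustration, not the akmedoids implementation; the function name, the slope-group initialisation, and the use of Euclidean distances on the raw trajectories are all assumptions.

```r
# Hypothetical sketch of anchored k-medoids (not the package's code)
anchored_kmedoids_sketch <- function(values, k, max_iter = 20) {
  n <- nrow(values)
  if (n < k) stop("need at least k trajectories")
  time <- seq_len(ncol(values))
  # least-squares slope of each trajectory against time
  slopes <- apply(values, 1, function(y) cov(time, y) / var(time))
  # assumption: Euclidean distances between the raw trajectories
  d <- as.matrix(dist(values))

  # Anchored start: split trajectories into k slope-ordered groups and
  # take the median-slope member of each group as its initial medoid
  groups <- ceiling(rank(slopes, ties.method = "first") * k / n)
  medoids <- vapply(seq_len(k), function(g) {
    idx <- which(groups == g)
    idx[order(slopes[idx])][ceiling(length(idx) / 2)]
  }, integer(1))

  for (iter in seq_len(max_iter)) {
    # assign every trajectory to its nearest medoid
    labels <- apply(d[, medoids, drop = FALSE], 1, which.min)
    # new medoid: the member with the smallest total distance to its cluster
    new_medoids <- vapply(seq_len(k), function(g) {
      idx <- which(labels == g)
      if (length(idx) == 0) return(medoids[g])  # keep medoid of an empty cluster
      idx[which.min(rowSums(d[idx, idx, drop = FALSE]))]
    }, integer(1))
    if (identical(new_medoids, medoids)) break
    medoids <- new_medoids
  }

  # relabel so that cluster 1 has the lowest medoid slope
  relabel <- as.integer(rank(slopes[medoids], ties.method = "first"))
  list(memberships = relabel[labels], medoids = medoids[order(relabel)])
}

# Hypothetical data: three decreasing, three flat, three increasing trajectories
values <- rbind(
  c(10, 8, 6, 4, 2), c(11, 9, 7, 5, 3), c(9, 7, 5, 3, 1),
  c(5, 5, 5, 5, 5),  c(6, 6, 6, 6, 6),  c(4, 4, 4, 4, 4),
  c(1, 3, 5, 7, 9),  c(2, 4, 6, 8, 10), c(0, 2, 4, 6, 8)
)
res <- anchored_kmedoids_sketch(values, k = 3)
res$memberships  # 1 1 1 2 2 2 3 3 3
```

Because every center is an observed trajectory, values[res$medoids, ] returns one representative trajectory per group, which is the practical appeal of medoids over centroids.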

References

1. Genolini, C. et al. (2015) kml and kml3d: R Packages to Cluster Longitudinal Data. Journal of Statistical Software, 65(4), 1-34. URL http://www.jstatsoft.org/v65/i04/.
2. Rousseeuw, P. J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
3. Calinski, T. and Harabasz, J. (1974) A dendrite method for cluster analysis. Communications in Statistics, 3, 1-27.

Examples

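library(akmedoids)
data(traj)  # example longitudinal dataset shipped with akmedoids
print(traj)
# impute missing values (see ?dataImputation for the available methods)
traj <- dataImputation(traj, id_field = TRUE, method = 2, replace_with = 1, fill_zeros = FALSE)
# convert the observations to proportions (see ?props)
traj <- props(traj, id_field = TRUE)
print(traj)
# cluster into exactly 3 groups using the linear initialisation
output <- akmedoids.clust(traj, id_field = TRUE, method = "linear", k = 3)
print(output)
as.vector(output$memberships)  # group membership of each trajectory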
