cluster_track() clusters trajectories based on various movement and velocity parameters calculated for each track.
Usage
cluster_track(
data,
veltrack,
variables,
analysis = c("hclust", "mclust"),
k = 2,
dist_method = "euclidean",
hclust_method = "complete",
transform = TRUE,
scale = TRUE,
max_clusters = NULL,
gauge_size = NA
)Arguments
- data
A
trackR object, which is a list consisting of two elements:Trajectories: A list of interpolated trajectories, where each trajectory is a series of midpoints between consecutive footprints.Footprints: A list of data frames containing footprint coordinates, metadata (e.g., image reference, ID), and a marker indicating whether the footprint is actual or inferred.
- veltrack
A
track velocityR object consisting of a list of lists, where each sublist contains the computed parameters for a corresponding track.- variables
A character vector specifying the movement parameters to be used in the clustering analysis. Valid parameter names include:
"TurnAng","sdTurnAng","PathLen","BeelineLen","StLength","sdStLength","StrideLen","PaceLen","Sinuosity","Straightness","TrackWidth","Gauge","PaceAng","StepAng","Velocity","sdVelocity","MaxVelocity","MinVelocity".- analysis
Character; clustering backend. One of
"hclust"(default; distance-based hierarchical clustering) or"mclust"(model-based Gaussian mixtures).- k
Integer or
NULL; number of clusters to cut the dendrogram whenanalysis = "hclust". Default2. IfNULL, no cut is made andclassificationis returned asNULL.- dist_method
Character; distance used when
analysis = "hclust". One of"euclidean"(default),"maximum","manhattan","canberra","binary", or"minkowski".- hclust_method
Character; agglomeration method when
analysis = "hclust". One of"complete"(default),"ward.D","ward.D2","single","average"(= UPGMA),"mcquitty"(= WPGMA),"median"(= WPGMC), or"centroid"(= UPGMC).- transform
Logical; if
TRUE(default), applies monotone transformations suited to variable support (e.g., log/logit) before clustering.- scale
Logical; if
TRUE(default), z-transforms (centers and scales) the variables used in clustering.- max_clusters
Integer. Maximum number of clusters (mixture components) to consider when
analysis = "mclust". IfNULL(default), usesmin(9, n), wherenis the number of valid tracks (observations) entering the model. The minimum number of clusters considered by default is1.- gauge_size
Numeric. Pes/manus length (or width) used to compute Gauge as
TrackWidth / gauge_size. IfNULLorNA, Gauge is returned asNA. If"Gauge"is included invariables,gauge_sizemust be a single positive numeric value.
Value
A track clustering R object consisting of a list containing the following elements:
matrix: A data frame containing the movement parameters calculated for each track (original scale/units as produced bytrack_param()andvelocity_track()).clust:When
analysis = "hclust": a list with:hclust: Thestats::hclustobject (dendrogram and merge history).dist: Thestats::distobject used to fithclust.k: The requested number of clusters used to cut the tree (if provided).classification: Cluster labels obtained bystats::cutree(hclust, k);NULLifk = NULL.
When
analysis = "mclust": anMclustobject containing the results of the model-based clustering analysis (optimal model selected by BIC). Main components include:call: The matched call.data: The input data matrix used byMclust.modelName: The covariance model at which the optimal BIC occurs.n: Number of observations.d: Data dimensionality.G: Optimal number of mixture components.BIC: Matrix of BIC values for models and components considered.loglik: Log-likelihood for the selected model.df: Number of estimated parameters.bic: BIC value of the selected model.icl: ICL value of the selected model.hypvol: Hypervolume parameter for the noise component (if applicable), otherwiseNULL.parameters: List of fitted parameters, including:pro: Mixing proportions (lengthG).mean: Component means (vector or matrix with columns as components).variance: List of variance parameters (structure depends onmodelName).
z: Matrix of posterior probabilities; entry(i, k)is P(observationiin classk).classification: Hard labels (MAP fromz).uncertainty: Classification uncertainty for each observation.
Details
The cluster_track() function can perform distance-based hierarchical clustering via the hclust() function from the stats package (analysis = "hclust", default) or model-based clustering via the Mclust() function from the mclust package
(analysis = "mclust").
The function first filters out tracks with fewer than four steps, as these tracks may not provide reliable movement data.
It then computes movement/trackway parameters via
track_param() (turning angles; path and beeline lengths; step/stride/pace; sinuosity/straightness; and
footprint-based metrics such as trackway width, gauge, pace angulation and step angle) and combines them with
velocity descriptors from veltrack. The selected variables are then used for clustering.
Finally, the selected movement parameters are used as input for clustering the tracks.
When analysis = "hclust": the function performs agglomerative hierarchical clustering
on a pairwise dissimilarity matrix computed from the working feature matrix (which already reflects the
internal expansion of circular variables to sine/cosine components, and any requested transformations and
z-scaling).
The algorithm starts with each track as its own cluster and iteratively merges the two closest clusters
according to the chosen hclust_method (linkage criterion) until a single cluster remains. The full
merge history is represented by a dendrogram that can be inspected and cut at any desired number of clusters.
Advantages include being distribution-free, since the method does not assume Gaussianity or any parametric mixture form.
It is also flexible, supporting multiple distance metrics, and highly interpretable, as the dendrogram reveals
multiscale clustering structure. In addition, the number of clusters does not need to be fixed in advance,
since it can be chosen a posteriori by cutting the tree. Disadvantages are that
the greedy nature of the algorithm makes merges irreversible, so no reassignment
is possible once clusters are combined. Results can also be sensitive to variable scaling, as well as the
choice of distance metric and linkage method. Certain linkage strategies, such as "median" or
"centroid", may produce dendrogram inversions, and the "ward.*" methods require Euclidean
geometry to preserve their variance-minimization interpretation.
If transform = TRUE and/or scale = TRUE, the distance is computed on the
transformed/scaled space. This is generally recommended so that variables on different scales do not dominate the
distances. Missing values in the selected variables will cause stats::dist() to fail; ensure inputs are
complete (or impute externally) before clustering.
The argument k controls the number of clusters to cut the dendrogram. By default it is set to 2.
If NULL, no cut is made and classification is returned as NULL. Setting k = 1
yields a single cluster containing all tracks.
The argument dist_method specifies the distance metric used to build the dissimilarity matrix,
passed to stats::dist. For a detailed explanation of distance metrics see ?dist.
The argument hclust_method specifies the agglomeration (linkage) method used for hierarchical clustering,
passed to stats::hclust. For a detailed explanation of linkage procedures see ?hclust.
When analysis = "mclust", clustering is performed using finite Gaussian mixture models
via the Mclust() function from the mclust package.
This approach assumes that the data are generated from a mixture of multivariate Gaussian distributions,
and selects both the number of mixture components and the covariance parameterization by optimizing
the Bayesian Information Criterion (BIC).
Advantages include statistical rigor, automatic model selection, and the ability to capture ellipsoidal cluster shapes. Limitations are the reliance on distributional assumptions (normality, homoscedasticity depending on the model), potential sensitivity to outliers, and increased computational cost for high-dimensional data. These assumptions are partially mitigated by the variable transformations applied before clustering (e.g., log or logit transforms), which help approximate Gaussian-like distributions.
If only one parameter is selected, the clustering is performed using equal variance ("E") and variable
variance ("V") Gaussian models. If more than one parameter is selected, all covariance models available
in mclust.options("emModelNames") are considered.
The argument max_clusters controls the maximum number of clusters (mixture components) to be explored.
If NULL (default), the maximum is set to min(9, n), where n is the number of valid tracks
(observations) entering the model. The minimum number of clusters considered by default is 1.
The following movement parameters can be included in the clustering:
"TurnAng": Turning angles for the track, measured in degrees. Internally expanded to sine and cosine components (circular treatment) before analysis. This measures how much the direction of movement changes at each step."sdTurnAng": The circular standard deviation of the turning angles, in degrees. Internally expanded to sine and cosine components before analysis to respect angular geometry."PathLen": Total path length (meters), computed as the sum of distances between consecutive trajectory points."BeelineLen": Straight-line distance (meters) between the first and last points of the trajectory."StLength": Mean step length for each step of the track (meters), representing how far the object moved between two consecutive trajectory points."sdStLength": The standard deviation of the step lengths (meters), showing how consistent the steps are in length."StrideLen": Mean stride length (meters). In the current implementation this is computed asMean_step_length * 2."PaceLen": Mean pace length (meters), computed as the mean distance between consecutive contralateral footprints (L–R or R–L) in footprint order."Sinuosity": A measure of the track's winding nature (dimensionless)."Straightness": The straightness of the track (dimensionless)."TrackWidth": Trackway width, i.e., the lateral spacing between left and right footprint series (units consistent with input coordinates, typically meters)."Gauge": Trackway gauge (dimensionless), computed asTrackWidth / gauge_size."PaceAng": Pace angulation (degrees), a classical ichnological descriptor summarizing the angular relation of successive steps within a trackway."StepAng": Mean step angle (degrees), computed as the mean angle between each pace segment and the inferred stride/trackway axis estimated by PCA."Velocity": The average velocity of the track, calculated as the total distance divided by the time elapsed between the first and last footprint (meters per second)."sdVelocity": The standard deviation of the velocity, indicating how much the velocity fluctuates throughout the track (meters per second)."MaxVelocity": The maximum velocity achieved during the track, identifying the fastest point (meters per second)."MinVelocity": The minimum velocity during the track, identifying the slowest point (meters per second).
The cluster_track() function has biological relevance in identifying groups of tracks with
similar movement parameters, providing insights into ecological and behavioral patterns. By
clustering tracks based on characteristics such as sinuosity, velocity, and turning angles (treated
circularly via sine/cosine), it helps detect movement patterns associated with specific behaviors.
This can reveal tracks potentially made by individuals moving together, useful for investigating
hypotheses on gregarious behavior, predation strategies, or coordinated movement. Additionally,
clustering serves as a preliminary step before similarity tests and simulations, refining track
selection and improving hypothesis testing in movement ecology studies.
References
Alexander, R. M. (1976). Estimates of speeds of dinosaurs. Nature, 261(5556), 129-130.
Ruiz, J., & Torices, A. (2013). Humans running at stadiums and beaches and the accuracy of speed estimations from fossil trackways. Ichnos, 20(1), 31-35.
Scrucca L., Fop M., Murphy T. B., & Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 289-317.
