Title: | A Framework for Clustering Longitudinal Data |
---|---|
Description: | A framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The 'akmedoids' package is available from <https://github.com/MAnalytics/akmedoids>. |
Authors: | Niek Den Teuling [aut, cre] |
Maintainer: | Niek Den Teuling <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.6.1 |
Built: | 2025-03-07 06:07:45 UTC |
Source: | https://github.com/philips-software/latrend |
Unified cluster analysis, independent of the underlying algorithms used, enabling users to compare the performance of various longitudinal cluster methods on the case study at hand.
Supports many different methods for longitudinal clustering out of the box (see the list of supported packages below).
The framework consists of extensible S4 methods based on an abstract model class, enabling rapid prototyping of new cluster methods or model specifications.
Standard plotting tools for model evaluation across methods (e.g., trajectories, cluster trajectories, model fit, metrics).
Support for many cluster metrics through the packages clusterCrit, mclustcomp, and igraph.
The structured and unified analysis approach enables simulation studies for comparing methods.
Standardized model validation for all methods through bootstrapping or k-fold cross-validation.
The supported types of longitudinal datasets are described here.
The latrendData dataset is included with the package and is used in all examples.
The plotTrajectories() function can be used to visualize any longitudinal dataset, provided that the id and time variables are specified.
data(latrendData)
head(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
plotTrajectories(latrendData, response = "Y")
Discovering longitudinal clusters using the package involves the specification of the longitudinal cluster method that should be used.
kmlMethod <- lcMethodKML("Y", nClusters = 3)
kmlMethod
The specified method is then estimated on the data using the generic estimation procedure function latrend():
model <- latrend(kmlMethod, data = latrendData)
We can then investigate the fitted model using:
summary(model)
plot(model)
metric(model, c("WMAE", "BIC"))
qqPlot(model)
Create derivative method specifications for 1 to 5 clusters using the lcMethods() function. A series of methods can be estimated using latrendBatch().
kmlMethods <- lcMethods(kmlMethod, nClusters = 1:5)
models <- latrendBatch(kmlMethods, data = latrendData)
Determine the number of clusters through one or more internal cluster metrics. This can be done visually using the plotMetric() function.
plotMetric(models, c("WMAE", "BIC"))
Further step-by-step instructions on how to use the package are described in the vignettes.
See vignette("demo", package = "latrend") for an introduction to conducting a longitudinal cluster analysis on an example case study.
See vignette("simulation", package = "latrend") for an example on conducting a simulation study.
See vignette("validation", package = "latrend") for examples on applying internal cluster validation.
See vignette("implement", package = "latrend") for examples on constructing your own cluster models.
Data requirements and datasets: latrend-data latrendData PAP.adh
High-level method recommendations and supported methods: latrend-approaches latrend-methods
Method specification: lcMethod lcMethods
Method estimation: latrend latrendRep latrendBatch latrendBoot latrendCV latrend-parallel Steps performed during estimation
Model functions: lcModel clusterTrajectories plotClusterTrajectories postprob trajectoryAssignments predictPostprob predictAssignments predict.lcModel predictForCluster fitted.lcModel fittedTrajectories
Maintainer: Niek Den Teuling [email protected] (ORCID)
Other contributors:
Steffen Pauws [email protected] [contributor]
Edwin van den Heuvel [email protected] [contributor]
Koninklijke Philips N.V. [copyright holder]
Useful links:
Report bugs at https://github.com/philips-software/latrend/issues
Retrieve and evaluate a lcMethod argument by name
## S4 method for signature 'lcMethod'
x$name

## S4 method for signature 'lcMethod'
x[[i, eval = TRUE, envir = NULL]]
x |
The |
name |
The argument name, as |
i |
Name or index of the argument to retrieve. |
eval |
Whether to evaluate the call argument (enabled by default). |
envir |
The |
The argument call or evaluation result.
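The difference between returning the stored argument call and returning its evaluated result can be illustrated in base R, without the package. This is only a minimal sketch: `args` and `getArg()` are hypothetical names introduced for illustration and are not part of latrend, whose actual implementation may differ.

```r
# Hypothetical stand-in for a method specification:
# arguments are stored as unevaluated calls (assumption for illustration)
k <- 2
args <- list(nClusters = quote(k), time = "Time")

# Hypothetical helper: return the stored call, or evaluate it in `envir`
getArg <- function(args, name, eval = TRUE, envir = parent.frame()) {
  arg <- args[[name]]
  if (eval) base::eval(arg, envir = envir) else arg
}

unevaluated <- getArg(args, "nClusters", eval = FALSE)  # the symbol k
evaluated <- getArg(args, "nClusters")                  # the current value of k
```

Keeping the call unevaluated means the argument can still pick up a changed value of `k` when it is evaluated later.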
Other lcMethod functions: as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 3)
method$nClusters # 3

m = lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 5)
m[["nClusters"]] # 5

k = 2
m = lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = k)
m[["nClusters", eval=FALSE]] # k
Computes the average posterior probability of assignment (APPA) for each cluster.
APPA(object)
object |
The model, of type |
The APPA per cluster, as a numeric vector of length nClusters(object). Empty clusters will output NA.
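As a rough illustration of the quantity being computed, the APPA can be sketched in base R from a posterior probability matrix. The matrix `pp` below is made up for illustration, and the assignment rule (most probable cluster) is an assumption; this is not the package's implementation.

```r
# Hypothetical posterior probabilities: one row per trajectory, one column per cluster
pp <- matrix(
  c(0.9, 0.1, 0.0,
    0.8, 0.2, 0.0,
    0.3, 0.7, 0.0,
    0.4, 0.6, 0.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Assign each trajectory to its most probable cluster (assumed strategy)
assigned <- apply(pp, 1, which.max)

# Per cluster: average posterior probability among the trajectories assigned to it;
# clusters without assigned trajectories yield NA
appa <- vapply(seq_len(ncol(pp)), function(k) {
  idx <- assigned == k
  if (any(idx)) mean(pp[idx, k]) else NA_real_
}, numeric(1))
names(appa) <- colnames(pp)
appa
```

Here cluster C has no assigned trajectories, so its entry is NA, in line with the documented behavior for empty clusters.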
Nagin DS (2005). Group-based modeling of development. Harvard University Press. ISBN 9780674041318, doi:10.4159/9780674041318.
Klijn SL, Weijenberg MP, Lemmens P, van den Brandt PA, Passos VL (2017). “Introducing the fit-criteria assessment plot - A visualisation tool to assist class enumeration in group-based trajectory modelling.” Statistical Methods in Medical Research, 26(5), 2424-2436.
van der Nest G, Lima Passos V, Candel MJ, van Breukelen GJ (2020). “An overview of mixture modelling for latent evolutions in longitudinal data: Modelling approaches, fit statistics and software.” Advances in Life Course Research, 43, 100323. ISSN 1040-2608, doi:10.1016/j.alcr.2019.100323.
Converts the arguments of a lcMethod to a named list of atomic types.
## S3 method for class 'lcMethod'
as.data.frame(x, ..., eval = TRUE, nullValue = NA, envir = NULL)
x |
|
... |
Additional arguments. |
eval |
Whether to evaluate the arguments in order to replace expression if the resulting value is of a class specified in |
nullValue |
Value to use to represent the |
envir |
The |
A single-row data.frame where each column represents an argument call or evaluation.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
Converts a list of lcMethod objects to a data.frame.
## S3 method for class 'lcMethods'
as.data.frame(x, ..., eval = TRUE, nullValue = NA, envir = parent.frame())
x |
the |
... |
Additional arguments. |
eval |
Whether to evaluate the arguments in order to replace expression if the resulting value is of a class specified in |
nullValue |
Value to use to represent the |
envir |
The |
A data.frame with each row containing the argument values of a method object.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
Generate a data.frame containing the argument values per method per row
## S3 method for class 'lcModels'
as.data.frame(x, ..., excludeShared = FALSE, eval = TRUE)
x |
|
... |
Arguments passed to as.data.frame.lcMethod. |
excludeShared |
Whether to exclude columns which have the same value across all methods. |
eval |
Whether to evaluate the arguments in order to replace expression if the resulting value is of a class specified in |
A data.frame.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Convert a list of lcMethod objects to a lcMethods list
as.lcMethods(x)
x |
A |
A lcMethods object.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
Convert a list of lcModels to a lcModels list
as.lcModels(x)
x |
A |
A lcModels object.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
lcModels
Other lcModels functions: lcModels, lcModels-class, max.lcModels(), min.lcModels(), plotMetric(), print.lcModels(), subset.lcModels()
Extract the method arguments as a list
## S3 method for class 'lcMethod'
as.list(x, ..., args = names(x), eval = TRUE, expand = FALSE, envir = NULL)
x |
The |
... |
Additional arguments. |
args |
A |
eval |
Whether to evaluate the arguments. |
expand |
Whether to return all method arguments when |
envir |
The |
A list with the argument calls or evaluated results depending on the value for eval.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
as.list(method)
as.list(method, args = c("id", "time"))

if (require("kml")) {
  method <- lcMethodKML("Y", id = "Id", time = "Time")
  as.list(method)

  # select arguments used by kml()
  as.list(method, args = kml::kml)

  # select arguments used by either kml() or parALGO()
  as.list(method, args = c(kml::kml, kml::parALGO))
}
Get the cluster names
clusterNames(object, factor = FALSE)
object |
The |
factor |
Whether to return the cluster names as a factor. |
A character vector of the cluster names.
Other lcModel functions: clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
clusterNames(model) # A, B
Update the cluster names
clusterNames(object) <- value
object |
The |
value |
The |
The updated lcModel object.
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 2)
clusterNames(model) <- c("Group 1", "Group 2")
Obtain the proportional size per cluster, between 0 and 1.
clusterProportions(object, ...)

## S4 method for signature 'lcModel'
clusterProportions(object, ...)
object |
The model. |
... |
For |
A named numeric vector of length nClusters(object) with the proportional size of each cluster.
By default, the cluster proportions are determined from the cluster-averaged posterior probabilities of the fitted data (as computed by the postprob() function).
Classes extending lcModel can override this method to return, for example, the exact estimated mixture proportions based on the model coefficients.
setMethod("clusterProportions", "lcModelExt", function(object, ...) {
  # return cluster proportion vector
})
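The default behavior described above, averaging posterior probabilities per cluster, can be sketched in base R. The matrix `pp` is a made-up stand-in for posterior probabilities and is an assumption for illustration only.

```r
# Hypothetical posterior probabilities: rows are trajectories, columns are clusters
pp <- matrix(
  c(0.9, 0.1,
    0.8, 0.2,
    0.3, 0.7,
    0.4, 0.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

# Cluster-averaged posterior probabilities: a named vector summing to one
props <- colMeans(pp)
props
```

Because each row of `pp` sums to one, the resulting proportions also sum to one.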
Other lcModel functions: clusterNames(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 2)
clusterProportions(model)
Obtain the size of each cluster, where the size is determined by the number of trajectories assigned to each cluster.
clusterSizes(object, ...)
object |
The |
... |
Additional arguments passed to |
The cluster sizes are computed from the trajectory cluster membership as decided by the trajectoryAssignments() function.
A named integer vector of length nClusters(object) with the number of assigned trajectories per cluster.
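Counting assigned trajectories per cluster can be sketched in base R as follows. The factor `assignments` is a hypothetical stand-in for the output of a trajectory assignment step; it is not produced by the package here.

```r
# Hypothetical trajectory assignments; cluster C has no members
assignments <- factor(c("A", "A", "B", "A"), levels = c("A", "B", "C"))

# Count trajectories per cluster, keeping empty clusters as zero
counts <- table(assignments)
sizes <- setNames(as.integer(counts), names(counts))
sizes
```

Using a factor with all cluster levels ensures the result has one entry per cluster, including empty ones.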
clusterProportions trajectoryAssignments
Other lcModel functions: clusterNames(), clusterProportions(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 2)
clusterSizes(model)
Extracts a data.frame of the cluster trajectories associated with the given object.
clusterTrajectories(object, ...)

## S4 method for signature 'lcModel'
clusterTrajectories(object, at = time(object), what = "mu", ...)
object |
The model. |
... |
For |
at |
A |
what |
The distributional parameter to predict. By default, the mean response 'mu' is predicted. The cluster membership predictions can be obtained by specifying |
A data.frame of the estimated values at the specified times. The first column should be named "Cluster". The second column should be time, with the name matching the timeVariable(object). The third column should be the expected value of the observations, named after the responseVariable(object).
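The expected three-column layout can be illustrated with a small base R sketch. The linear coefficients in `coefs` and the variable names "Time" and "Y" are assumptions chosen for illustration; a real model would supply its own cluster predictions and variable names.

```r
# Hypothetical cluster-specific linear coefficients: c(intercept, slope)
coefs <- list(A = c(0, 1), B = c(2, -1))
times <- c(0, 0.5, 1)
timeVar <- "Time"  # assumed time variable name
respVar <- "Y"     # assumed response variable name

# One block of rows per cluster: Cluster, time, expected response
clusTrajs <- do.call(rbind, lapply(names(coefs), function(k) {
  out <- data.frame(
    Cluster = k,
    t = times,
    y = coefs[[k]][1] + coefs[[k]][2] * times
  )
  names(out) <- c("Cluster", timeVar, respVar)
  out
}))
clusTrajs$Cluster <- factor(clusTrajs$Cluster, levels = names(coefs))
clusTrajs
```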
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

clusterTrajectories(model)

clusterTrajectories(model, at = c(0, .5, 1))
Extract the coefficients of the lcModel object, if defined. The returned set of coefficients depends on the underlying type of lcModel. The default implementation checks for the existence of a coef() function for the internal model as defined in the @model slot, returning the output if available.
## S3 method for class 'lcModel'
coef(object, ...)
object |
The |
... |
Additional arguments. |
A named numeric vector with all coefficients, or a matrix with each column containing the cluster-specific coefficients. If coef() is not defined for the given model, an empty numeric vector is returned.
Classes extending lcModel can override this method to return model-specific coefficients.
coef.lcModelExt <- function(object, ...) {
  # return model coefficients
}
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 2)
coef(model)
lcMethod estimation step: compose an lcMethod object
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure. For fitting an lcMethod object to a dataset, use the latrend() function or one of the other standard estimation functions.
The compose() function of the lcMethod object evaluates and finalizes the lcMethod arguments. The default implementation returns an updated object with all arguments having been evaluated.
compose(method, envir, ...)

## S4 method for signature 'lcMethod'
compose(method, envir = NULL)
method |
The |
envir |
The |
... |
Not used. |
The evaluated and finalized lcMethod object.
In general, there is no need to extend this method for a specific method, as all arguments are automatically evaluated by the compose,lcMethod method. However, in case there is a need to extend processing or to prevent evaluation of specific arguments (e.g., for handling errors), the method can be overridden for the specific lcMethod subclass.
setMethod("compose", "lcMethodExample", function(method, envir = NULL) {
  newMethod <- callNextMethod()
  # further processing
  return(newMethod)
})
The steps for estimating a lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
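The order of the estimation steps can be sketched as a plain base R pipeline. Everything below is hypothetical: the `steps` list, `runEstimation()`, and the toy method and data are invented for illustration and do not reflect how latrend implements or dispatches these steps.

```r
# Hypothetical step implementations for a toy "method" (a list of arguments)
steps <- list(
  compose     = function(method, envir) lapply(method, eval, envir = envir),
  validate    = function(method, data) stopifnot(method$response %in% names(data)),
  prepareData = function(method, data) data[!is.na(data[[method$response]]), , drop = FALSE],
  preFit      = function(method) list(verbose = FALSE),
  fit         = function(method, data, env) list(mean = mean(data[[method$response]])),
  postFit     = function(method, model) c(model, list(method = method))
)

# Run the steps in the documented order
runEstimation <- function(method, data, envir = parent.frame()) {
  force(envir)
  method <- steps$compose(method, envir)       # evaluate argument calls
  steps$validate(method, data)                 # check arguments against the data
  prepared <- steps$prepareData(method, data)  # process the training data
  env <- steps$preFit(method)                  # data-independent preparation
  model <- steps$fit(method, prepared, env)    # estimate the toy model
  steps$postFit(method, model)                 # post-process the fitted model
}

resp <- "Y"
toyMethod <- list(response = quote(resp))  # argument stored as an unevaluated call
toyData <- data.frame(Y = c(1, 2, NA, 3))
toyModel <- runEstimation(toyMethod, toyData)
```

In this toy version the "model" is just the mean of the non-missing response, which keeps the focus on the sequence of steps rather than on the fitting itself.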
Compute the posterior confusion matrix (PCM). The entry (i, j) represents the probability (or the number of trajectories, depending on the scale argument) that a trajectory belonging to cluster i is assigned to cluster j under the specified trajectory cluster assignment strategy.
confusionMatrix(object, strategy = which.max, scale = TRUE, ...)
object |
The model, of type |
strategy |
The strategy for assigning trajectories to a specific cluster, see |
scale |
Whether to express the confusion in probabilities ( |
... |
Additional arguments passed to |
A K-by-K confusion matrix with K = nClusters(object).
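One way to build such a matrix from posterior probabilities is sketched below in base R. The matrix `pp` is made up, the assignment strategy is assumed to be the most probable cluster, and the row and column orientation is an assumption; the package's own computation may differ in these details.

```r
# Hypothetical posterior probabilities: rows are trajectories, columns are clusters
pp <- matrix(
  c(0.9, 0.1,
    0.7, 0.3,
    0.2, 0.8),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)
K <- ncol(pp)

# Assign each trajectory to its most probable cluster (assumed strategy)
assigned <- apply(pp, 1, which.max)

# Row i: summed posterior probabilities of the trajectories assigned to cluster i
pcmCounts <- t(vapply(seq_len(K), function(i) {
  colSums(pp[assigned == i, , drop = FALSE])
}, numeric(K)))
dimnames(pcmCounts) <- list(colnames(pp), colnames(pp))

# Scale each row to sum to one to express the confusion as probabilities
pcm <- pcmCounts / rowSums(pcmCounts)
pcm
```

A matrix close to the identity indicates well-separated clusters. Note that in this sketch a cluster without assigned trajectories would produce NaN entries in its row after scaling.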
postprob clusterProportions trajectoryAssignments APPA OCC
data(latrendData)

if (rlang::is_installed("lcmm")) {
  method <- lcMethodLcmmGMM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    random = ~ 1,
    id = "Id",
    time = "Time"
  )
  model <- latrend(method, latrendData)
  confusionMatrix(model)
}
Check whether the fitted object converged.
converged(object, ...)

## S4 method for signature 'lcModel'
converged(object, ...)
object |
The model. |
... |
Not used. |
Either logical indicating convergence, or a numeric status code.
The default lcModel implementation returns NA.
Classes extending lcModel can override this method to return a convergence status or code.
setMethod("converged", "lcModelExt", function(object, ...) {
  # return convergence code
})
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 2)
converged(model)
Create the test fold data for validation
createTestDataFold(data, trainData, id = getOption("latrend.id"))
data |
A |
trainData |
A |
id |
The trajectory identifier variable. |
createTrainDataFolds
Other validation methods: createTestDataFolds(), createTrainDataFolds(), latrendBoot(), latrendCV(), lcModel-data-filters
data(latrendData)

if (require("caret")) {
  trainDataList <- createTrainDataFolds(latrendData, id = "Id", folds = 10)
  testData1 <- createTestDataFold(latrendData, trainDataList[[1]], id = "Id")
}
Create all k test folds from the training data
createTestDataFolds(data, trainDataList, ...)
data |
A |
trainDataList |
A |
... |
Arguments passed to createTestDataFold. |
Other validation methods: createTestDataFold(), createTrainDataFolds(), latrendBoot(), latrendCV(), lcModel-data-filters
data(latrendData)

if (require("caret")) {
  trainDataList <- createTrainDataFolds(latrendData, folds = 10, id = "Id")
  testDataList <- createTestDataFolds(latrendData, trainDataList)
}
Create the training data for each of the k models in a k-fold cross-validation.
createTrainDataFolds(
  data,
  folds = 10L,
  id = getOption("latrend.id"),
  seed = NULL
)
data |
A |
folds |
The number of folds. By default, a 10-fold scheme is used. |
id |
The trajectory identifier variable. |
seed |
The seed to use, in order to ensure reproducible fold generation at a later moment. |
A list of data.frame objects, one training dataset for each of the folds.
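The idea of splitting by trajectory rather than by observation can be sketched in base R. This is an illustrative assumption about how such folds can be formed, using made-up data and plain random sampling of identifiers; the package examples rely on the caret package, and its fold assignment may differ.

```r
set.seed(1)

# Hypothetical longitudinal data: 10 trajectories with 3 observations each
data <- data.frame(
  Id = rep(1:10, each = 3),
  Time = rep(1:3, times = 10),
  Y = rnorm(30)
)

k <- 5
ids <- unique(data$Id)

# Randomly assign each trajectory id to one of k folds
foldOfId <- sample(rep_len(seq_len(k), length(ids)))

# Training data of fold f: all trajectories not held out in fold f
trainFolds <- lapply(seq_len(k), function(f) {
  data[data$Id %in% ids[foldOfId != f], ]
})

# Test data of fold f: the trajectories absent from the training data
testFolds <- lapply(trainFolds, function(train) {
  data[!data$Id %in% train$Id, ]
})
```

Splitting on the identifier keeps all observations of a trajectory together, so no trajectory appears in both the training and the test data of the same fold.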
Other validation methods: createTestDataFold(), createTestDataFolds(), latrendBoot(), latrendCV(), lcModel-data-filters
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")

if (require("caret")) {
  trainFolds <- createTrainDataFolds(latrendData, folds = 5, id = "Id", seed = 1)

  foldModels <- latrendBatch(method, data = trainFolds)
  testDataFolds <- createTestDataFolds(latrendData, trainFolds)
}
Define an external metric for lcModels
defineExternalMetric(
  name,
  fun,
  warnIfExists = getOption("latrend.warnMetricOverride", TRUE)
)
name |
The name of the metric. |
fun |
The function to compute the metric, accepting a lcModel object as input. |
warnIfExists |
Whether to output a warning when the metric is already defined. |
Other metric functions: defineInternalMetric(), externalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricDefinition(), getInternalMetricNames(), metric()
Define an internal metric for lcModels
defineInternalMetric(
  name,
  fun,
  warnIfExists = getOption("latrend.warnMetricOverride", TRUE)
)
name |
The name of the metric. |
fun |
The function to compute the metric, accepting a lcModel object as input. |
warnIfExists |
Whether to output a warning when the metric is already defined. |
Other metric functions: defineExternalMetric(), externalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricDefinition(), getInternalMetricNames(), metric()
defineInternalMetric("BIC", fun = BIC)

mae <- function(object) {
  mean(abs(residuals(object)))
}
defineInternalMetric("MAE", fun = mae)
Get the deviance of the fitted lcModel object.
## S3 method for class 'lcModel'
deviance(object, ...)
object |
The |
... |
Additional arguments. |
The default implementation checks for the existence of the deviance() function for the internal model, and returns the output, if available.
A numeric with the deviance value. If unavailable, NA is returned.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
Extract the residual degrees of freedom from a lcModel
## S3 method for class 'lcModel'
df.residual(object, ...)
object |
The |
... |
Additional arguments. |
A numeric with the residual degrees of freedom. If unavailable, NA is returned.
stats::df.residual nobs residuals
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
Get the elapsed time for estimating the given model.
For lcModel: Get the estimation time of the model, determined by the time taken for the associated fit() function to finish.
estimationTime(object, unit = "secs", ...)

## S4 method for signature 'lcModel'
estimationTime(object, unit = "secs", ...)

## S4 method for signature 'lcModels'
estimationTime(object, unit = "secs", ...)

## S4 method for signature 'list'
estimationTime(object, unit = "secs", ...)
object |
The model. |
unit |
The time unit in which the estimation time should be reported. By default, estimation time is in seconds. For accepted units, see base::difftime. |
... |
Not used. |
A non-negative scalar numeric representing the estimation time in the specified unit.
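Since the accepted units follow base::difftime, the unit conversion can be previewed in base R without fitting a model. This is only a sketch of how difftime units behave; the 90-second duration is an invented value, not latrend output.

```r
# An invented elapsed time of 90 seconds, stored as a difftime
elapsed <- as.difftime(90, units = "secs")

# Convert the same duration to the units accepted by difftime
as.numeric(elapsed, units = "secs")   # 90
as.numeric(elapsed, units = "mins")   # 1.5
as.numeric(elapsed, units = "hours")  # 0.025
```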
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

estimationTime(model)
estimationTime(model, unit = 'mins')
estimationTime(model, unit = 'days')
Substitutes the call arguments if they can be evaluated without error.
## S3 method for class 'lcMethod'
evaluate(
  object,
  classes = "ANY",
  try = TRUE,
  exclude = character(),
  envir = NULL,
  ...
)
object |
The |
classes |
Substitute only arguments with specific class types. By default, all types are substituted. |
try |
Whether to try to evaluate arguments and ignore errors (the default), or to fail on any argument evaluation error. |
exclude |
Arguments to exclude from evaluation. |
envir |
The |
... |
Not used. |
A new lcMethod object with the substituted arguments.
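The idea of substituting only those arguments that evaluate without error can be sketched in base R. The helper `evaluate_args()` and the argument names below are hypothetical; the sketch does not reproduce latrend's internals, the `classes` filter, or the `exclude` option.

```r
# Hypothetical helper: evaluate each unevaluated argument in `envir`,
# keeping the original expression whenever evaluation fails.
evaluate_args <- function(args, envir) {
  lapply(args, function(arg) {
    tryCatch(eval(arg, envir = envir), error = function(e) arg)
  })
}

env <- list2env(list(k = 3))
args <- list(
  nClusters = quote(k),               # can be evaluated in env
  maxIter = quote(undefinedVar * 2)   # cannot be evaluated yet
)

res <- evaluate_args(args, envir = env)
res$nClusters  # 3
res$maxIter    # still the unevaluated call undefinedVar * 2
```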
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
Compute one or more external metrics for two or more objects.
Note that there are many external metrics available, and there exists no external metric that works best in all scenarios. It is recommended to carefully consider which metric is most appropriate for your use case.
Many of the external metrics depend on implementations in other packages:
clusterCrit (Desgraupes 2018)
mclustcomp (You 2018)
igraph (Csardi and Nepusz 2006)
psych (Revelle 2019)
See mclustcomp::mclustcomp() for a grouped overview of similarity metrics.
Call getInternalMetricNames() to retrieve the names of the defined internal metrics.
Call getExternalMetricNames() to retrieve the names of the defined external metrics.
## S4 method for signature 'lcModel,lcModel'
externalMetric(
  object,
  object2,
  name = getOption("latrend.externalMetric"),
  ...
)

## S4 method for signature 'lcModels,missing'
externalMetric(object, object2, name = "adjustedRand")

## S4 method for signature 'lcModels,character'
externalMetric(object, object2 = "adjustedRand")

## S4 method for signature 'lcModels,lcModel'
externalMetric(object, object2, name, drop = TRUE)

## S4 method for signature 'list,lcModel'
externalMetric(object, object2, name, drop = TRUE)
object |
The object to compare to the second object |
object2 |
The second object |
name |
The name(s) of the external metric(s) to compute. If no names are given, the names specified in the |
... |
Additional arguments. |
drop |
Whether to return a |
For externalMetric(lcModel, lcModel): A numeric vector of the computed metrics.
For externalMetric(lcModels): A distance matrix of class dist representing the pairwise comparisons.
For externalMetric(lcModels, name): A distance matrix of class dist representing the pairwise comparisons.
For externalMetric(lcModels, lcModel): A named numeric vector or data.frame containing the computed model metrics.
For externalMetric(list, lcModel): A named numeric vector or data.frame containing the computed model metrics.
Metric name | Description | Function / Reference |
---|---|---|
adjustedRand | Adjusted Rand index. Based on the Rand index, but adjusted for agreements occurring by chance. A score of 1 indicates a perfect agreement, whereas a score of 0 indicates an agreement no better than chance. | mclustcomp::mclustcomp(), (Hubert and Arabie 1985) |
CohensKappa | Cohen's kappa. A partitioning agreement metric correcting for random chance. A score of 1 indicates a perfect agreement, whereas a score of 0 indicates an agreement no better than chance. | psych::cohen.kappa(), (Cohen 1960) |
F | F-score | mclustcomp::mclustcomp() |
F1 | F1-score, also referred to as the Sørensen–Dice coefficient, or Dice similarity coefficient | mclustcomp::mclustcomp() |
FolkesMallows | Fowlkes-Mallows index | mclustcomp::mclustcomp() |
Hubert | Hubert index | clusterCrit::extCriteria() |
Jaccard | Jaccard index | mclustcomp::mclustcomp() |
jointEntropy | Joint entropy between model assignments | mclustcomp::mclustcomp() |
Kulczynski | Kulczynski index | clusterCrit::extCriteria() |
MaximumMatch | Maximum match measure | mclustcomp::mclustcomp() |
McNemar | McNemar statistic | clusterCrit::extCriteria() |
MeilaHeckerman | Meila-Heckerman measure | mclustcomp::mclustcomp() |
Mirkin | Mirkin metric | mclustcomp::mclustcomp() |
MI | Mutual information | mclustcomp::mclustcomp() |
NMI | Normalized mutual information | igraph::compare() |
NSJ | Normalized version of splitJoin. The proportion of edits relative to the maximum changes (twice the number of ids) | |
NVI | Normalized variation of information | mclustcomp::mclustcomp() |
Overlap | Overlap coefficient, also referred to as the Szymkiewicz–Simpson coefficient | mclustcomp::mclustcomp(), (M K and K 2016) |
PD | Partition difference | mclustcomp::mclustcomp() |
Phi | Phi coefficient | clusterCrit::extCriteria() |
precision | Precision | clusterCrit::extCriteria() |
Rand | Rand index | mclustcomp::mclustcomp() |
recall | Recall | clusterCrit::extCriteria() |
RogersTanimoto | Rogers-Tanimoto dissimilarity | clusterCrit::extCriteria() |
RusselRao | Russell-Rao dissimilarity | clusterCrit::extCriteria() |
SMC | Simple matching coefficient | mclustcomp::mclustcomp() |
splitJoin | Total split-join index | igraph::split_join_distance() |
splitJoin.ref | Split-join index of the first model to the second model. In other words, it is the edit-distance between the two partitionings. | |
SokalSneath1 | Type-1 Sokal-Sneath dissimilarity | clusterCrit::extCriteria() |
SokalSneath2 | Type-2 Sokal-Sneath dissimilarity | clusterCrit::extCriteria() |
VI | Variation of information | mclustcomp::mclustcomp() |
Wallace1 | Type-1 Wallace criterion | mclustcomp::mclustcomp() |
Wallace2 | Type-2 Wallace criterion | mclustcomp::mclustcomp() |
WMSSE | Weighted minimum sum of squared errors between cluster trajectories | |
WMMSE | Weighted minimum mean of squared errors between cluster trajectories | |
WMMAE | Weighted minimum mean of absolute errors between cluster trajectories | |
See the documentation of the defineExternalMetric() function for details on how to define your own external metrics.
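To make the adjustedRand description concrete, the index of Hubert and Arabie (1985) can be computed by hand from a contingency table. The function `adjusted_rand()` below is an illustrative base R sketch and not the mclustcomp implementation that latrend delegates to; the two label vectors are invented.

```r
# Illustrative adjusted Rand index between two label vectors.
# Note: returns NaN when both partitions consist of a single cluster.
adjusted_rand <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)
  # Number of pairs placed together in both, the first, and the second partition
  sum_cells <- sum(choose(as.vector(tab), 2))
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))
  expected <- sum_rows * sum_cols / choose(length(a), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

p1 <- c("A", "A", "A", "B", "B", "B")
p2 <- c("X", "X", "X", "Y", "Y", "Y")
adjusted_rand(p1, p2)  # 1: the same partition under different labels
```

Because the index depends only on which trajectories are grouped together, relabeling the clusters does not change the score.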
Cohen J (1960). “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement, 20(1), 37-46.
Csardi G, Nepusz T (2006). “The igraph software package for complex network research.” InterJournal, Complex Systems, 1695. https://igraph.org.
Desgraupes B (2018). clusterCrit: Clustering Indices. R package version 1.2.8, https://CRAN.R-project.org/package=clusterCrit.
Hubert L, Arabie P (1985). “Comparing Partitions.” Journal of Classification, 2(1), 193–218. ISSN 1432-1343, doi:10.1007/BF01908075.
M K V, K K (2016). “A Survey on Similarity Measures in Text Mining.” Machine Learning and Applications: An International Journal, 3, 19-28. doi:10.5121/mlaij.2016.3103.
Revelle W (2019). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 1.9.12, https://CRAN.R-project.org/package=psych.
You K (2018). mclustcomp: Measures for Comparing Clusters. R package version 0.3.1, https://CRAN.R-project.org/package=mclustcomp.
Other metric functions: defineExternalMetric(), defineInternalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricDefinition(), getInternalMetricNames(), metric()
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model2 <- latrend(method, latrendData, nClusters = 2)
model3 <- latrend(method, latrendData, nClusters = 3)

if (require("mclustcomp")) {
  externalMetric(model2, model3, "adjustedRand")
}
lcMethod estimation step: logic for fitting the method to the processed data
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure. For fitting an lcMethod object to a dataset, use the latrend() function or one of the other standard estimation functions.
The fit() function of the lcMethod object estimates the model with the evaluated method specification, processed training data, and prepared environment.
fit(method, data, envir, verbose, ...)

## S4 method for signature 'lcMethod'
fit(method, data, envir, verbose)
method |
An object inheriting from |
data |
A |
envir |
The |
verbose |
A R.utils::Verbose object indicating the level of verbosity. |
... |
Not used. |
The fitted object, inheriting from lcModel.
This method should be implemented for all lcMethod subclasses.
setMethod("fit", "lcMethodExample", function(method, data, envir, verbose) {
  # estimate the model or cluster parameters
  coefs <- FIT_CODE

  # create the lcModel object
  new("lcModelExample",
    method = method,
    data = data,
    model = coefs,
    clusterNames = make.clusterNames(method$nClusters)
  )
})
The steps for estimating an lcMethod object are defined and executed as follows:
1. compose(): Evaluate and finalize the method argument values.
2. validate(): Check the validity of the method argument values in relation to the dataset.
3. prepareData(): Process the training data for fitting.
4. preFit(): Prepare environment for estimation, independent of training data.
5. fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
6. postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
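The order of the estimation steps can be mimicked with a toy base R pipeline. Everything below is a hypothetical stand-in: the real latrend generics operate on S4 objects and take additional arguments such as envir and verbose. Here each stage is reduced to one line that clusters invented trajectories by their mean response using k-means.

```r
# Invented long-format data: 6 trajectories with 3 observations each
data <- data.frame(
  Id = rep(1:6, each = 3),
  Time = rep(0:2, times = 6),
  Y = c(rep(0, 9), rep(10, 9)) + rep(0:2, times = 6)
)
method <- list(response = "Y", id = "Id", nClusters = 2)

# compose: finalize the argument values
method$nClusters <- as.integer(method$nClusters)
# validate: check the arguments against the dataset
stopifnot(all(c(method$response, method$id) %in% names(data)))
# prepareData: reduce each trajectory to its mean response
features <- as.numeric(tapply(data[[method$response]], data[[method$id]], mean))
# preFit: prepare the estimation environment, independent of the data
set.seed(1)
# fit: estimate the clusters
km <- kmeans(features, centers = method$nClusters)
# postFit: post-process the result into the final output
model <- list(method = method, assignments = LETTERS[km$cluster])

table(model$assignments)  # two clusters of three trajectories each
```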
Returns the cluster-specific fitted values for the given lcModel object.
The default implementation calls predict() with newdata = NULL.
## S3 method for class 'lcModel'
fitted(object, ..., clusters = trajectoryAssignments(object))
object |
The |
... |
Additional arguments. |
clusters |
Optional cluster assignments per id. If unspecified, a |
A numeric vector of the fitted values for the respective class, or a matrix of fitted values for each cluster.
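The relation between the matrix output and the vector output can be illustrated in base R: given a matrix of fitted values with one column per cluster, the vector form picks, for each observation, the column of the cluster its trajectory is assigned to. The inputs below are invented, and the indexing is only a sketch of the idea, not latrend's implementation.

```r
# Invented fitted values: one row per observation, one column per cluster
pred <- cbind(A = c(1.0, 1.5, 2.0, 2.5), B = c(5.0, 4.5, 4.0, 3.5))
obs_ids <- c("p1", "p1", "p2", "p2")  # trajectory id of each observation
clusters <- c(p1 = "A", p2 = "B")     # cluster assignment per id

# For each observation, select the column of its trajectory's cluster
col_idx <- match(clusters[obs_ids], colnames(pred))
fitted_vec <- pred[cbind(seq_len(nrow(pred)), col_idx)]
fitted_vec  # 1.0 1.5 4.0 3.5
```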
Classes extending lcModel can override this method to adapt the computation of the predicted values for the training data.
Note that the implementation of this function is only needed when predict() and predictForCluster() are not defined for the lcModel subclass.
fitted.lcModelExt <- function(object, ..., clusters = trajectoryAssignments(object)) {
  pred <- predict(object, newdata = NULL)
  transformFitted(pred = pred, model = object, clusters = clusters)
}
The transformFitted() function takes care of transforming the prediction input to the right output format.
fittedTrajectories plotFittedTrajectories stats::fitted predict.lcModel trajectoryAssignments transformFitted
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
fitted(model)
Extract the fitted trajectories
fittedTrajectories(object, ...)

## S4 method for signature 'lcModel'
fittedTrajectories(
  object,
  at = time(object),
  what = "mu",
  clusters = trajectoryAssignments(object),
  ...
)
object |
The model. |
... |
For |
at |
The time points at which to compute the id-specific trajectories. The default implementation merely filters the output, i.e., fitted values can only be outputted for times at which the model was trained. |
what |
The distributional parameter to compute the response for. |
clusters |
The cluster assignments for the strata to base the trajectories on. |
The default lcModel implementation uses the output of fitted() of the respective model.
A data.frame representing the fitted response per trajectory per moment in time for the respective cluster.
For lcModel: A data.frame with columns id, time, response, and "Cluster".
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
# Note: not a great example because the fitted trajectories
# are identical to the respective cluster trajectory
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
fittedTrajectories(model)

fittedTrajectories(model, at = time(model)[c(1, 2)])
Extracts the associated formula for the given distributional parameter.
## S3 method for class 'lcMethod'
formula(x, what = "mu", envir = NULL, ...)
x |
The |
what |
The distributional parameter to which this formula applies. By default, the formula specifies |
envir |
The |
... |
Additional arguments. |
The formula for the given distributional parameter.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), lcMethod-class, names,lcMethod-method, update.lcMethod()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
formula(method) # Y ~ Time
Get the formula associated with the fitted lcModel object.
This is determined by the formula argument of the lcMethod specification that was used to fit the model.
## S3 method for class 'lcModel'
formula(x, what = "mu", ...)
x |
The |
what |
The distributional parameter. |
... |
Additional arguments. |
Returns the associated formula, or response ~ 0 if not specified.
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)
formula(model) # Y ~ Time
Generate longitudinal test data
generateLongData(
  sizes = c(40, 60),
  fixed = Value ~ 1,
  cluster = ~1 + Time,
  random = ~1,
  id = getOption("latrend.id"),
  data = data.frame(Time = seq(0, 1, by = 0.1)),
  fixedCoefs = 0,
  clusterCoefs = cbind(c(-2, 1), c(2, -1)),
  randomScales = cbind(0.1, 0.1),
  rrandom = rnorm,
  noiseScales = c(0.1, 0.1),
  rnoise = rnorm,
  clusterNames = LETTERS[seq_along(sizes)],
  shuffle = FALSE,
  seed = NULL
)
sizes |
Number of strata per cluster. |
fixed |
Fixed effects formula. |
cluster |
Cluster effects formula. |
random |
Random effects formula. |
id |
Name of the strata. |
data |
Data with covariates to use for generation. Stratified data may be specified by adding a grouping column. |
fixedCoefs |
Coefficients matrix for the fixed effects. |
clusterCoefs |
Coefficients matrix for the cluster effects. |
randomScales |
Standard deviations matrix for the size of the variance components (random effects). |
rrandom |
Random sampler for generating the variance components at location 0. |
noiseScales |
Scale of the random noise passed to rnoise. Either scalar or defined per cluster. |
rnoise |
Random sampler for generating noise at location 0 with the respective scale. |
clusterNames |
A |
shuffle |
Whether to randomly reorder the strata as they appear in the data.frame. |
seed |
Optional seed to set for the PRNG. The set PRNG state persists after the function completes. |
longdata <- generateLongData(
  sizes = c(40, 70), id = "Id",
  cluster = ~poly(Time, 2, raw = TRUE),
  clusterCoefs = cbind(c(1, 2, 5), c(-3, 4, .2))
)

if (require("ggplot2")) {
  plotTrajectories(longdata, response = "Value", id = "Id", time = "Time")
}
Returns the default arguments associated with the respective lcMethod subclass.
These arguments are automatically included into the lcMethod object during initialization.
getArgumentDefaults(object, ...)

## S4 method for signature 'lcMethod'
getArgumentDefaults(object)
object |
The method specification object. |
... |
Not used. |
A named list of argument values.
Although implementing this method is optional, it prevents users from having to specify all arguments every time they want to create a method specification.
In this example, most of the default arguments are defined as arguments of the function lcMethodExample, which we can include in the list by calling formals. Copying the arguments from functions is especially useful when your method implementation is based on an existing function.
setMethod("getArgumentDefaults", "lcMethodExample", function(object) {
  # c() merges the argument lists into a single flat, named list
  c(
    formals(lcMethodExample),
    formals(funFEM::funFEM),
    extra = Value ~ 1,
    tol = 1e-4,
    callNextMethod()
  )
})
It is recommended to add callNextMethod() to the end of the list. This enables inheriting the default arguments from superclasses.
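How formals() contributes to a list of defaults can be tried out in base R without defining any S4 classes. The function `exampleFit()` and its arguments are hypothetical; the sketch only shows that combining formals with extra named values yields a single named list.

```r
# Hypothetical fitting function whose arguments serve as defaults
exampleFit <- function(formula = Value ~ 1, nClusters = 2, maxIter = 100) NULL

# Merge the function's formal arguments with an extra default
defaults <- c(formals(exampleFit), tol = 1e-4)

names(defaults)     # "formula" "nClusters" "maxIter" "tol"
defaults$nClusters  # 2
```

When two sources define the same argument name, both entries end up in the merged list, so the order in which the sources are combined matters.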
Other lcMethod implementations: getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
Returns the names of arguments that should be excluded during instantiation of the specification.
getArgumentExclusions(object, ...)

## S4 method for signature 'lcMethod'
getArgumentExclusions(object)
object |
The object. |
... |
Not used. |
A character vector of argument names.
This function only needs to be implemented if you want to prevent users from specifying redundant arguments or arguments that are set automatically or conditionally on other arguments.
setMethod("getArgumentExclusions", "lcMethodExample", function(object) {
  c(
    "doPlot",
    "verbose",
    callNextMethod()
  )
})
Adding callNextMethod() to the end of the return vector enables inheriting exclusions from superclasses.
lcMethod getArgumentExclusions
Other lcMethod implementations: getArgumentDefaults(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
Get a citation object indicating how to cite the underlying R packages used for estimating or representing the given method or model.
getCitation(object, ...)

## S4 method for signature 'lcMethod'
getCitation(object, ...)

## S4 method for signature 'lcModel'
getCitation(object, ...)
object |
The object |
... |
Not used. |
A utils::citation object.
Get the external metric definition
getExternalMetricDefinition(name)
name |
The name of the metric. |
The metric function, or NULL if not defined.
Other metric functions: defineExternalMetric(), defineInternalMetric(), externalMetric(), getExternalMetricNames(), getInternalMetricDefinition(), getInternalMetricNames(), metric()
Get the names of the available external metrics
getExternalMetricNames()
Other metric functions: defineExternalMetric(), defineInternalMetric(), externalMetric(), getExternalMetricDefinition(), getInternalMetricDefinition(), getInternalMetricNames(), metric()
Get the internal metric definition
getInternalMetricDefinition(name)
name |
The name of the metric. |
The metric function, or NULL if not defined.
Other metric functions: defineExternalMetric(), defineInternalMetric(), externalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricNames(), metric()
Get the names of the available internal metrics
getInternalMetricNames()
Other metric functions: defineExternalMetric(), defineInternalMetric(), externalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricDefinition(), metric()
Get the object label, if any.
Extracts the assigned label from the given lcMethod or lcModel object.
By default, the label is determined from the "label" argument of the lcMethod object.
The label of an lcModel object is set upon estimation by latrend() to the label of its associated lcMethod object.
getLabel(object, ...)

## S4 method for signature 'lcMethod'
getLabel(object, ...)

## S4 method for signature 'lcModel'
getLabel(object, ...)
object |
The object. |
... |
Not used. |
A scalar character. The empty string is returned if there is no label.
method <- lcMethodLMKM(Y ~ Time, time = "Time")
getLabel(method) # ""

getLabel(update(method, label = "v2")) # "v2"
Get the lcMethod specification that was used for fitting the given object.
getLcMethod(object, ...)

## S4 method for signature 'lcModel'
getLcMethod(object)
object |
The model. |
... |
Not used. |
An lcMethod object.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
getLcMethod(model)
Get the name associated with the given object.
getShortName(): Extracts the short object name
getName(object, ...)

getShortName(object, ...)

## S4 method for signature 'lcMethod'
getName(object, ...)

## S4 method for signature 'NULL'
getName(object, ...)

## S4 method for signature 'lcMethod'
getShortName(object, ...)

## S4 method for signature 'NULL'
getShortName(object, ...)

## S4 method for signature 'lcModel'
getName(object)

## S4 method for signature 'lcModel'
getShortName(object)
object |
The object. |
... |
Not used. |
For lcModel: The name is determined by its associated lcMethod name and label, unless specified otherwise.
A nonempty string, as character.
When implementing your own lcMethod subclass, override these methods to provide full and abbreviated names.
setMethod("getName", "lcMethodExample", function(object) "example name")

setMethod("getShortName", "lcMethodExample", function(object) "EX")
Similar methods can be implemented for your lcModel subclass. In practice, however, this is not needed, as the names are determined by default from the lcMethod object that was used to fit the lcModel object.
method <- lcMethodLMKM(Y ~ Time)
getName(method) # "lm-kmeans"

method <- lcMethodLMKM(Y ~ Time)
getShortName(method) # "LMKM"
Get the trajectory ids on which the model was fitted
ids(object)
object |
The |
The order returned by ids(object) determines the id order for any output involving id-specific values, such as in trajectoryAssignments() or postprob().
A character vector or integer vector of the identifier for every fitted trajectory.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
ids(model) # 1, 2, ..., 200
Extracts the trajectory identifier variable (i.e., column name) from the given object.
idVariable(object, ...)

## S4 method for signature 'lcMethod'
idVariable(object, ...)

## S4 method for signature 'lcModel'
idVariable(object)

## S4 method for signature 'ANY'
idVariable(object)
object |
The object. |
... |
Not used. |
A nonempty string, as character.
Other variables: responseVariable(), timeVariable()
method <- lcMethodLMKM(Y ~ Time, id = "Traj")
idVariable(method) # "Traj"

method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
idVariable(model) # "Id"
Initializes lcMethod objects, converting arbitrary arguments into arguments of the lcMethod object.
## S4 method for signature 'lcMethod'
initialize(.Object, ...)
.Object |
The newly allocated |
... |
Other method arguments. |
new("lcMethodLMKM", formula = Y ~ Time, id = "Id", time = "Time")
Virtual class for internal use. Do not use.
## S4 method for signature 'lcMetaMethod'
compose(method, envir = NULL)

## S4 method for signature 'lcMetaMethod'
getLcMethod(object, ...)

## S4 method for signature 'lcMetaMethod'
getName(object, ...)

## S4 method for signature 'lcMetaMethod'
getShortName(object, ...)

## S4 method for signature 'lcMetaMethod'
idVariable(object, ...)

## S4 method for signature 'lcMetaMethod'
preFit(method, data, envir, verbose)

## S4 method for signature 'lcMetaMethod'
prepareData(method, data, verbose)

## S4 method for signature 'lcMetaMethod'
fit(method, data, envir, verbose)

## S4 method for signature 'lcMetaMethod'
postFit(method, data, model, envir, verbose)

## S4 method for signature 'lcMetaMethod'
responseVariable(object, ...)

## S4 method for signature 'lcMetaMethod'
timeVariable(object, ...)

## S4 method for signature 'lcMetaMethod'
validate(method, data, envir = NULL, ...)

## S3 method for class 'lcMetaMethod'
update(object, ...)

## S4 method for signature 'lcFitConverged'
fit(method, data, envir, verbose)

## S4 method for signature 'lcFitConverged'
validate(method, data, envir = NULL, ...)

## S4 method for signature 'lcFitRep'
fit(method, data, envir, verbose)

## S4 method for signature 'lcFitRep'
validate(method, data, envir = NULL, ...)
method |
The |
envir |
The |
object |
The model. |
... |
Not used. |
data |
A |
verbose |
A R.utils::Verbose object indicating the level of verbosity. |
model |
The |
An overview of the latrend package and its capabilities can be found in latrend-package.
The latrend() function fits a specified longitudinal cluster method to the given data comprising the trajectories. This function runs all steps of the standardized method estimation procedure, as implemented by the given lcMethod object. The result of this procedure is the estimated lcModel.
latrend(
  method,
  data,
  ...,
  envir = NULL,
  verbose = getOption("latrend.verbose")
)
method |
An lcMethod object specifying the longitudinal cluster method to apply, or the name (as |
data |
The trajectory data on which to estimate the method.
Any inputs supported by |
... |
Any other arguments to update the |
envir |
The |
verbose |
The level of verbosity. Either an object of class |
If a seed value is specified in the lcMethod object or arguments to latrend, this seed is set using set.seed prior to the preFit step.
An lcModel object representing the fitted solution.
Other longitudinal cluster fit functions: latrendBatch(), latrendBoot(), latrendCV(), latrendRep()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)

model <- latrend("lcMethodLMKM", formula = Y ~ Time, id = "Id", time = "Time", data = latrendData)

model <- latrend(method, data = latrendData, nClusters = 3, seed = 1)
This page provides high-level guidelines on which methods are applicable to your dataset. Note that this is intended as a quick-start.
Recommended overview and comparison papers:
(Den Teuling et al. 2021): A tutorial and overview on methods for longitudinal clustering.
Den Teuling et al. (2021) compared KmL, MixTVEM, GBTM, GMM, and GCKM.
Twisk and Hoekstra (2012) compared KmL, GCKM, LLCA, GBTM and GMM.
Verboon and Pat-El (2022) compared the kml, traj and lcmm packages in R.
Martin and von Oertzen (2015) compared KmL, LCA, and GMM.
Disclaimer: The table below has been adapted from a pre-print of (Den Teuling et al. 2021).
Approach | Strengths | Limitations | Methods |
Cross-sectional clustering | Suitable for large datasets; Many available algorithms; Non-parametric cluster trajectory representation | Requires time-aligned complete data; Sensitive to measurement noise | lcMethodKML lcMethodMclustLLPA lcMethodMixtoolsNPRM |
Distance-based clustering | Suitable for medium-sized datasets; Many distance metrics; Distance matrix only needs to be computed once | Scales poorly with number of trajectories; No robust cluster trajectory representation; Some distance metrics require aligned observations | lcMethodDtwclust |
Feature-based clustering | Suitable for large datasets; Configurable; Features only need to be computed once; Compact trajectory representation | Generally requires intensive longitudinal data; Sensitive to outliers | lcMethodFeature lcMethodAkmedoids lcMethodLMKM lcMethodGCKM |
Model-based clustering | Parametric cluster trajectory; Incorporate (domain) assumptions; Low sample size requirements | Computationally intensive; Scales poorly with number of clusters; Convergence challenges | lcMethodLcmmGBTM lcMethodLcmmGMM lcMethodCrimCV lcMethodFlexmix lcMethodFlexmixGBTM lcMethodFunFEM lcMethodMixAK_GLMM lcMethodMixtoolsGMM lcMethodMixTVEM |
It is strongly encouraged to evaluate and compare several candidate methods in order to identify the most suitable method.
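To make the feature-based approach from the table more concrete, the following base-R sketch represents each trajectory by the intercept and slope of its own linear regression and then clusters those features with k-means. It only illustrates the general idea behind a method such as lcMethodLMKM; it is not the package's implementation, and the simulated data and all variable names are assumptions made for this example.

```r
set.seed(1)
nTraj <- 60
times <- seq(0, 2, length.out = 10)

# Two hypothetical groups: flat trajectories and increasing trajectories
trueSlope <- rep(c(0, 1), each = nTraj / 2)
simData <- do.call(rbind, lapply(seq_len(nTraj), function(i) {
  data.frame(
    Id = i,
    Time = times,
    Y = rnorm(1, sd = 0.1) + trueSlope[i] * times + rnorm(length(times), sd = 0.1)
  )
}))

# Step 1: compute the features (intercept and slope) per trajectory
features <- t(sapply(split(simData, simData$Id), function(d) {
  coef(lm(Y ~ Time, data = d))
}))

# Step 2: cluster the feature vectors with k-means
km <- kmeans(features, centers = 2, nstart = 10)

# Cross-tabulate the recovered clusters against the simulated groups
table(cluster = km$cluster, trueSlope = trueSlope)
```

Because the features are computed once per trajectory, the clustering step itself is cheap, which matches the "suitable for large datasets" strength listed above.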
Den Teuling N, Pauws S, Heuvel Evd (2021).
“Clustering of longitudinal data: A tutorial on a variety of approaches.”
doi:10.48550/ARXIV.2111.05469, https://arxiv.org/abs/2111.05469.
Den Teuling NGP, Pauws SC, van den Heuvel ER (2021).
“A comparison of methods for clustering longitudinal data with slowly changing trends.”
Communications in Statistics - Simulation and Computation.
doi:10.1080/03610918.2020.1861464.
Martin DP, von Oertzen T (2015).
“Growth mixture models outperform simpler clustering algorithms when detecting longitudinal heterogeneity, even with small sample sizes.”
Struct. Equ. Model., 22(2), 264–275.
ISSN 1070-5511, doi:10.1080/10705511.2014.936340.
Twisk J, Hoekstra T (2012).
“Classifying developmental trajectories over time should be done with great caution: A comparison between methods.”
Journal of Clinical Epidemiology, 65(10), 1078–1087.
ISSN 0895-4356, doi:10.1016/j.jclinepi.2012.04.010.
Verboon P, Pat-El R (2022).
“Clustering Longitudinal Data Using R: A Monte Carlo Study.”
Methodology, 18(2), 144-163.
doi:10.5964/meth.7143.
latrend-methods latrend-estimation latrend-metrics
The latrend estimation functions expect univariate longitudinal data that can be represented in a data.frame with one row per trajectory observation:
Trajectory identifier: numeric, character, or factor
Observation time: numeric
Observation value: numeric
In principle, any type of longitudinal data structure is supported, given that it can be transformed to the required data.frame format using the generic trajectories function. Support can be added by implementing the trajectories function for the respective signature. This means that users can implement their own data adapters as needed.
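As an illustration of the expected layout, the base-R sketch below converts a small wide-format matrix (one row per trajectory, one column per time point) into a data.frame with one row per trajectory observation. The data values and the column names Id, Time, and Y are assumptions chosen for the example.

```r
# Hypothetical wide-format data: one row per trajectory, one column per time point
times <- c(0, 0.5, 1, 1.5, 2)
wide <- matrix(
  c(1.0, 1.2, 1.1, 1.4, 1.5,
    0.2, 0.1, 0.3, 0.2, 0.4,
    2.0, 1.8, 1.5, 1.1, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), NULL)
)

# Long format: one row per trajectory observation
longData <- data.frame(
  Id = rep(rownames(wide), each = ncol(wide)),   # trajectory identifier
  Time = rep(times, times = nrow(wide)),         # observation time
  Y = as.vector(t(wide))                         # observation value
)
str(longData)
```

A data.frame of this shape provides the trajectory identifier, observation time, and observation value columns described above.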
The following datasets are included with the package:
lcMethod estimation functions
This page presents an overview of the different functions that are available for estimating one or more longitudinal cluster methods. All functions are prefixed by "latrend".
latrend(): estimate a method on a longitudinal dataset, returning the resulting model.
latrendBatch(): estimate multiple methods on multiple longitudinal datasets, returning a list of models.
latrendRep(): repeatedly estimate a method on a longitudinal dataset, returning a list of models.
latrendBoot(): repeatedly estimate a method on bootstrap samples of a longitudinal dataset, returning a list of models.
latrendCV(): repeatedly estimate a method using cross-validation on a longitudinal dataset, returning a list of models.
The functions involving repeated estimation support parallel computation. See latrend-parallel.
latrend-package lcMethod-estimation
Generics used by latrend for different classes
This page provides an overview of the currently supported methods for longitudinal clustering. For general recommendations on which method to apply to your dataset, see latrend-approaches.
Method | Description | Source |
lcMethodAkmedoids | Anchored k-medoids (Adepeju et al. 2020) | akmedoids |
lcMethodCrimCV | Group-based trajectory modeling of count data (Nielsen 2018) | crimCV |
lcMethodDtwclust | Methods for distance-based clustering, including dynamic time warping (Sardá-Espinosa 2019) | dtwclust |
lcMethodFeature | Feature-based clustering | |
lcMethodFlexmix | Interface to the FlexMix framework (Grün and Leisch 2008) | flexmix |
lcMethodFlexmixGBTM | Group-based trajectory modeling | flexmix |
lcMethodFunFEM | Model-based clustering using funFEM (Bouveyron 2015) | funFEM |
lcMethodGCKM | Growth-curve modeling and k-means | lme4 |
lcMethodKML | Longitudinal k-means (Genolini et al. 2015) | kml |
lcMethodLcmmGBTM | Group-based trajectory modeling (Proust-Lima et al. 2017) | lcmm |
lcMethodLcmmGMM | Growth mixture modeling (Proust-Lima et al. 2017) | lcmm |
lcMethodLMKM | Feature-based clustering using linear regression and k-means | |
lcMethodMclustLLPA | Longitudinal latent profile analysis (Scrucca et al. 2016) | mclust |
lcMethodMixAK_GLMM | Mixture of generalized linear mixed models | mixAK |
lcMethodMixtoolsGMM | Growth mixture modeling | mixtools |
lcMethodMixtoolsNPRM | Non-parametric repeated measures clustering (Benaglia et al. 2009) | mixtools |
lcMethodMixTVEM | Mixture of time-varying effects models | |
lcMethodRandom | Random partitioning | |
lcMethodStratify | Stratification rule | |
In addition, the functionality of any method can be extended via meta methods. This is used for extending the estimation procedure of a method, such as repeated fitting and selecting the best result, or fitting until convergence.
It is strongly encouraged to evaluate and compare several candidate methods in order to identify the most suitable method.
Adepeju M, Langton S, Bannister J (2020).
akmedoids: Anchored Kmedoids for Longitudinal Data Clustering.
R package version 0.1.5, https://CRAN.R-project.org/package=akmedoids.
Benaglia T, Chauveau D, Hunter DR, Young D (2009).
“mixtools: An R Package for Analyzing Finite Mixture Models.”
Journal of Statistical Software, 32(6), 1–29.
doi:10.18637/jss.v032.i06.
Bouveyron C (2015).
funFEM: Clustering in the Discriminative Functional Subspace.
R package version 1.1, https://CRAN.R-project.org/package=funFEM.
Genolini C, Alacoque X, Sentenac M, Arnaud C (2015).
“kml and kml3d: R Packages to Cluster Longitudinal Data.”
Journal of Statistical Software, 65(4), 1–34.
doi:10.18637/jss.v065.i04.
Grün B, Leisch F (2008).
“FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters.”
Journal of Statistical Software, 28(4), 1–35.
doi:10.18637/jss.v028.i04.
Nielsen JD (2018).
crimCV: Group-Based Modelling of Longitudinal Data.
R package version 0.9.6, https://CRAN.R-project.org/package=crimCV.
Proust-Lima C, Philipps V, Liquet B (2017).
“Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm.”
Journal of Statistical Software, 78(2), 1–56.
doi:10.18637/jss.v078.i02.
Sardá-Espinosa A (2019).
“Time-Series Clustering in R Using the dtwclust Package.”
The R Journal.
doi:10.32614/RJ-2019-023.
Scrucca L, Fop M, Murphy TB, Raftery AE (2016).
“mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.”
The R Journal, 8(1), 205–233.
latrend-approaches latrend-estimation latrend-metrics
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)
The package supports a variety of metrics that help to evaluate and compare estimated models.
Internal metrics: metrics that assess the adequacy of the model with respect to the data.
External metrics: metrics that compare two models.
Users can implement new metrics through defineInternalMetric() and defineExternalMetric().
Custom-defined metrics are accessible using the same by-name mechanism as the other metrics.
Metric name | Description | Function / Reference |
AIC | Akaike information criterion. A goodness-of-fit estimator that adjusts for model complexity (i.e., the number of parameters). Only available for models that support the computation of the model log-likelihood through logLik. | stats::AIC(), (Akaike 1974) |
APPA.mean | Mean of the average posterior probability of assignment (APPA) across clusters. A measure of the precision of the trajectory classifications. A score of 1 indicates perfect classification. | APPA(), (Nagin 2005) |
APPA.min | Lowest APPA among the clusters | APPA(), (Nagin 2005) |
ASW | Average silhouette width based on the Euclidean distance | (Rousseeuw 1987) |
BIC | Bayesian information criterion. A goodness-of-fit estimator that corrects for the degrees of freedom (i.e., the number of parameters) and sample size. Only available for models that support the computation of the model log-likelihood through logLik. | stats::BIC(), (Schwarz 1978) |
CAIC | Consistent Akaike information criterion | (Bozdogan 1987) |
CLC | Classification likelihood criterion | (McLachlan and Peel 2000) |
converged | Whether the model converged during estimation | converged() |
deviance | The model deviance | stats::deviance() |
Dunn | The Dunn index | (Dunn 1974) |
entropy | Entropy of the posterior probabilities | |
estimationTime | The time needed for fitting the model | estimationTime() |
ED | Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
ED.fit | Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
ICL.BIC | Integrated classification likelihood (ICL) approximated using the BIC | (Biernacki et al. 2000) |
logLik | Model log-likelihood | stats::logLik() |
MAE | Mean absolute error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
Mahalanobis | Mahalanobis distance between the cluster trajectories and the assigned observed trajectories | (Mahalanobis 1936) |
MSE | Mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
relativeEntropy, RE | A measure of the precision of the trajectory classification. A value of 1 indicates perfect classification, whereas a value of 0 indicates a non-informative uniform classification. It is the normalized version of entropy, scaled between [0, 1]. | (Ramaswamy et al. 1993), (Muthén 2004) |
RMSE | Root mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
RSS | Residual sum of squares under most likely cluster allocation | |
scaledEntropy | See relativeEntropy | |
sigma | The residual standard deviation | stats::sigma() |
ssBIC | Sample-size adjusted BIC | (Sclove 1987) |
SED | Standardized Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
SED.fit | The cluster-weighted standardized Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
WMAE | MAE weighted by cluster-assignment probability | |
WMSE | MSE weighted by cluster-assignment probability | |
WRMSE | RMSE weighted by cluster-assignment probability | |
WRSS | RSS weighted by cluster-assignment probability | |
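To show what the classification-precision metrics in this table measure, the base-R sketch below computes the APPA per cluster, the entropy, and the relative entropy from a small posterior probability matrix. The matrix is made up for illustration, and the relative entropy uses the common normalization 1 - E / (N log K); whether this matches the package's computation in every detail is an assumption.

```r
# Hypothetical posterior probabilities: 5 trajectories (rows), 2 clusters (columns)
pp <- matrix(
  c(0.95, 0.05,
    0.80, 0.20,
    0.10, 0.90,
    0.30, 0.70,
    0.99, 0.01),
  ncol = 2, byrow = TRUE
)

# Most likely cluster per trajectory
assigned <- max.col(pp, ties.method = "first")

# APPA per cluster: mean posterior probability of the trajectories assigned to it
assignedProb <- pp[cbind(seq_len(nrow(pp)), assigned)]
appa <- tapply(assignedProb, assigned, mean)
appaMean <- mean(appa)
appaMin <- min(appa)

# Entropy of the posterior probabilities (this example contains no zero probabilities;
# zeros would need special handling because log(0) is -Inf)
entropy <- -sum(pp * log(pp))

# Normalized to [0, 1], with 1 indicating perfect classification
relativeEntropy <- 1 - entropy / (nrow(pp) * log(ncol(pp)))
```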
Metric name | Description | Function / Reference |
adjustedRand | Adjusted Rand index. Based on the Rand index, but adjusted for agreements occurring by chance. A score of 1 indicates a perfect agreement, whereas a score of 0 indicates an agreement no better than chance. | mclustcomp::mclustcomp(), (Hubert and Arabie 1985) |
CohensKappa | Cohen's kappa. A partitioning agreement metric correcting for random chance. A score of 1 indicates a perfect agreement, whereas a score of 0 indicates an agreement no better than chance. | psych::cohen.kappa(), (Cohen 1960) |
F | F-score | mclustcomp::mclustcomp() |
F1 | F1-score, also referred to as the Sørensen–Dice Coefficient, or Dice similarity coefficient | mclustcomp::mclustcomp() |
FolkesMallows | Fowlkes-Mallows index | mclustcomp::mclustcomp() |
Hubert | Hubert index | clusterCrit::extCriteria() |
Jaccard | Jaccard index | mclustcomp::mclustcomp() |
jointEntropy | Joint entropy between model assignments | mclustcomp::mclustcomp() |
Kulczynski | Kulczynski index | clusterCrit::extCriteria() |
MaximumMatch | Maximum match measure | mclustcomp::mclustcomp() |
McNemar | McNemar statistic | clusterCrit::extCriteria() |
MeilaHeckerman | Meila-Heckerman measure | mclustcomp::mclustcomp() |
Mirkin | Mirkin metric | mclustcomp::mclustcomp() |
MI | Mutual information | mclustcomp::mclustcomp() |
NMI | Normalized mutual information | igraph::compare() |
NSJ | Normalized version of splitJoin. The proportion of edits relative to the maximum changes (twice the number of ids) | |
NVI | Normalized variation of information | mclustcomp::mclustcomp() |
Overlap | Overlap coefficient, also referred to as the Szymkiewicz–Simpson coefficient | mclustcomp::mclustcomp(), (M K and K 2016) |
PD | Partition difference | mclustcomp::mclustcomp() |
Phi | Phi coefficient. | clusterCrit::extCriteria() |
precision | precision | clusterCrit::extCriteria() |
Rand | Rand index | mclustcomp::mclustcomp() |
recall | recall | clusterCrit::extCriteria() |
RogersTanimoto | Rogers-Tanimoto dissimilarity | clusterCrit::extCriteria() |
RusselRao | Russell-Rao dissimilarity | clusterCrit::extCriteria() |
SMC | Simple matching coefficient | mclustcomp::mclustcomp() |
splitJoin | total split-join index | igraph::split_join_distance() |
splitJoin.ref | Split-join index of the first model to the second model. In other words, it is the edit-distance between the two partitionings. | |
SokalSneath1 | Type-1 Sokal-Sneath dissimilarity | clusterCrit::extCriteria() |
SokalSneath2 | Type-2 Sokal-Sneath dissimilarity | clusterCrit::extCriteria() |
VI | Variation of information | mclustcomp::mclustcomp() |
Wallace1 | Type-1 Wallace criterion | mclustcomp::mclustcomp() |
Wallace2 | Type-2 Wallace criterion | mclustcomp::mclustcomp() |
WMSSE | Weighted minimum sum of squared errors between cluster trajectories | |
WMMSE | Weighted minimum mean of squared errors between cluster trajectories | |
WMMAE | Weighted minimum mean of absolute errors between cluster trajectories | |
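As an example of how an external metric compares two partitionings, the base-R sketch below computes the adjusted Rand index from the contingency table of two cluster assignment vectors, following the standard Hubert and Arabie formulation. The function name adjustedRandIndex and the example partitions are assumptions for illustration; the package itself delegates this computation to the function listed in the table.

```r
# Two hypothetical partitions of the same 8 trajectories
a <- c(1, 1, 1, 2, 2, 2, 3, 3)
b <- c(1, 1, 2, 2, 2, 2, 3, 3)

adjustedRandIndex <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)
  sumCells <- sum(choose(tab, 2))          # pairs grouped together in both partitions
  sumRows <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  sumCols <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  # Expected number of co-grouped pairs under random assignment
  expected <- sumRows * sumCols / choose(length(x), 2)
  # Note: undefined (0 / 0) when both partitions consist of a single cluster
  (sumCells - expected) / ((sumRows + sumCols) / 2 - expected)
}

adjustedRandIndex(a, a)  # identical partitions
adjustedRandIndex(a, b)  # partitions differing in one assignment
```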
The model estimation functions support parallel computation through the use of the foreach mechanism. In order to make use of parallel execution, a parallel back-end must be registered.
On Windows, the parallel-package can be used to define parallel socket workers.
nCores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makeCluster(nCores)
Then, register the cluster as the parallel back-end using the doParallel package:
doParallel::registerDoParallel(cl)
If you defined your own lcMethod or lcModel extension classes, make sure to load them on the workers as well. This can be done, for example, using:
parallel::clusterEvalQ(cl, expr = setClass('lcMethodMyImpl', contains = "lcMethod"))
On Unix systems, it is easier to set up parallelization as the R process is forked. In this example we use the doMC package:
nCores <- parallel::detectCores(logical = FALSE)
doMC::registerDoMC(nCores)
latrendRep, latrendBatch, latrendBoot, latrendCV
data(latrendData)

# parallel latrendRep()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
models <- latrendRep(method, data = latrendData, .rep = 5, .parallel = TRUE)

# parallel latrendBatch()
methods <- lcMethods(method, nClusters = 1:3)
models <- latrendBatch(methods, data = latrendData, parallel = TRUE)
Fit a list of longitudinal cluster methods on one or more datasets.
latrendBatch(
  methods,
  data,
  cartesian = TRUE,
  seed = NULL,
  parallel = FALSE,
  errorHandling = "stop",
  envir = NULL,
  verbose = getOption("latrend.verbose")
)
methods |
A |
data |
The dataset(s) to which to fit the respective |
cartesian |
Whether to fit the provided methods on each of the datasets. If |
seed |
Sets the seed for generating a seed number for the methods.
Seeds are only set for methods without a seed argument or |
parallel |
Whether to enable parallel evaluation. See latrend-parallel. Method evaluation and dataset transformation are done on the calling thread. |
errorHandling |
Whether to |
envir |
The |
verbose |
The level of verbosity. Either an object of class |
Methods and datasets are evaluated and validated prior to any fitting. This ensures that the batch estimation fails as early as possible in case of errors.
An lcModels object. In case of a model fit error under errorHandling = "pass", a list is returned.
lcMethods
Other longitudinal cluster fit functions: latrend(), latrendBoot(), latrendCV(), latrendRep()
data(latrendData)
refMethod <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
methods <- lcMethods(refMethod, nClusters = 1:2)
models <- latrendBatch(methods, data = latrendData)

# different dataset per method
models <- latrendBatch(
  methods,
  data = .(
    subset(latrendData, Time > .5),
    subset(latrendData, Time < .5)
  )
)
Performs bootstrapping, generating samples from the given data at the id level, and fitting an lcModel to each sample.
latrendBoot(
  method,
  data,
  samples = 50,
  seed = NULL,
  parallel = FALSE,
  errorHandling = "stop",
  envir = NULL,
  verbose = getOption("latrend.verbose")
)
method |
An lcMethod object specifying the longitudinal cluster method to apply, or the name (as |
data |
A |
samples |
The number of bootstrap samples to evaluate. |
seed |
The seed to use. Optional. |
parallel |
Whether to enable parallel evaluation. See latrend-parallel. Method evaluation and dataset transformation are done on the calling thread. |
errorHandling |
Whether to |
envir |
The |
verbose |
The level of verbosity. Either an object of class |
An lcModels object of length samples.
Other longitudinal cluster fit functions: latrend(), latrendBatch(), latrendCV(), latrendRep()
Other validation methods: createTestDataFold(), createTestDataFolds(), createTrainDataFolds(), latrendCV(), lcModel-data-filters
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
bootModels <- latrendBoot(method, latrendData, samples = 10)

bootMAE <- metric(bootModels, name = "MAE")
mean(bootMAE)
sd(bootMAE)
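The idea of sampling "at the id level" can be sketched in base R as follows: whole trajectories, rather than individual rows, are drawn with replacement. The toy dataset and the relabeling of resampled trajectories with new identifiers are assumptions of this sketch; how latrendBoot() handles duplicated trajectories internally is not described here.

```r
set.seed(1)

# Hypothetical long-format data: 4 trajectories with 3 observations each
trajData <- data.frame(
  Id = rep(1:4, each = 3),
  Time = rep(c(0, 1, 2), times = 4),
  Y = rnorm(12)
)

ids <- unique(trajData$Id)

# Sample trajectories (not rows) with replacement
bootIds <- sample(ids, size = length(ids), replace = TRUE)

# Build the bootstrap sample; each drawn trajectory receives a new unique identifier
# so that a trajectory drawn twice is treated as two separate trajectories
bootData <- do.call(rbind, lapply(seq_along(bootIds), function(i) {
  d <- trajData[trajData$Id == bootIds[i], ]
  d$Id <- i
  d
}))
```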
Apply k-fold cross validation for internal cluster validation. Creates k random subsets ("folds") from the data, estimating a model on each combination of k-1 folds.
latrendCV(
  method,
  data,
  folds = 10,
  seed = NULL,
  parallel = FALSE,
  errorHandling = "stop",
  envir = NULL,
  verbose = getOption("latrend.verbose")
)
method |
An lcMethod object specifying the longitudinal cluster method to apply, or the name (as |
data |
A |
folds |
The number of folds. Ten folds by default. |
seed |
The seed to use. Optional. |
parallel |
Whether to enable parallel evaluation. See latrend-parallel. Method evaluation and dataset transformation are done on the calling thread. |
errorHandling |
Whether to |
envir |
The |
verbose |
The level of verbosity. Either an object of class |
An lcModels object containing the training models, one for each of the folds.
Other longitudinal cluster fit functions: latrend(), latrendBatch(), latrendBoot(), latrendRep()
Other validation methods: createTestDataFold(), createTestDataFolds(), createTrainDataFolds(), latrendBoot(), lcModel-data-filters
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")

if (require("caret")) {
  model <- latrendCV(method, latrendData, folds = 5, seed = 1)

  model <- latrendCV(method, subset(latrendData, Time < .5), folds = 5)
}
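The fold construction described for latrendCV() can be sketched in base R: trajectory identifiers are randomly divided into k folds, and training set i consists of all trajectories outside fold i. This is only an illustration of the principle with made-up identifiers; the package's own fold generation (which, judging by the example above, relies on the caret package) may differ in detail.

```r
set.seed(1)
ids <- 1:20   # hypothetical trajectory identifiers
k <- 5

# Randomly assign each trajectory to one of k folds of (near) equal size
foldOfId <- sample(rep(seq_len(k), length.out = length(ids)))

# Training set i comprises the trajectories of the other k-1 folds;
# test set i comprises the trajectories of fold i
trainIds <- lapply(seq_len(k), function(i) ids[foldOfId != i])
testIds <- lapply(seq_len(k), function(i) ids[foldOfId == i])

lengths(trainIds)
lengths(testIds)
```

Splitting by identifier rather than by row keeps all observations of a trajectory together in either the training or the test set.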
An artificial longitudinal dataset comprising 200 trajectories belonging to one of 3 classes. Each trajectory deviates in intercept and slope from its respective class trajectory.
latrendData
A data.frame comprising longitudinal observations from 200 trajectories. Each row represents the observed value of a trajectory at a specific moment in time.
Id (integer): The trajectory identifier.
Time (numeric): The measurement time, between 0 and 2.
Y (numeric): The observed value at the respective time Time for trajectory Id.
Class (factor): The reference class.
data(latrendData)
head(latrendData)
#>   Id      Time           Y   Class
#> 1  1 0.0000000 -1.08049205 Class 1
#> 2  1 0.2222222 -0.68024151 Class 1
#> 3  1 0.4444444 -0.65148373 Class 1
#> 4  1 0.6666667 -0.39115398 Class 1
#> 5  1 0.8888889 -0.19407876 Class 1
#> 6  1 1.1111111 -0.02991783 Class 1
This dataset was generated using generateLongData.
data(latrendData)

if (require("ggplot2")) {
  plotTrajectories(latrendData, id = "Id", time = "Time", response = "Y")

  # plot according to the reference class
  plotTrajectories(latrendData, id = "Id", time = "Time", response = "Y", cluster = "Class")
}
Performs a repeated fit of the specified latrend model on the given data.
latrendRep(
  method,
  data,
  .rep = 10,
  ...,
  .errorHandling = "stop",
  .seed = NULL,
  .parallel = FALSE,
  envir = NULL,
  verbose = getOption("latrend.verbose")
)
method |
An lcMethod object specifying the longitudinal cluster method to apply, or the name (as |
data |
The trajectory data on which to estimate the method.
Any inputs supported by |
.rep |
The number of repeated fits. |
... |
Any other arguments to update the |
.errorHandling |
Whether to |
.seed |
Set the seed for generating the respective seed for each of the repeated fits. |
.parallel |
Whether to use parallel evaluation. See latrend-parallel. |
envir |
The |
verbose |
The level of verbosity. Either an object of class |
This method is faster than repeatedly calling latrend as it only prepares the data via prepareData() once.
An lcModels object containing the resulting models.
Other longitudinal cluster fit functions: latrend(), latrendBatch(), latrendBoot(), latrendCV()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
models <- latrendRep(method, data = latrendData, .rep = 5) # 5 repeated runs

models <- latrendRep(method, data = latrendData, .seed = 1, .rep = 3)
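The role of the .seed argument, a single seed from which the seed of each repeated fit is generated, can be illustrated with the base-R sketch below. The use of sample.int() to derive the per-repetition seeds is an assumption made for this example and is not necessarily the generator used by latrendRep().

```r
masterSeed <- 1   # plays the role of .seed
nRep <- 3         # plays the role of .rep

# Derive one seed per repetition from the master seed
set.seed(masterSeed)
repSeeds <- sample.int(.Machine$integer.max, size = nRep)

# Each repetition sets its own seed, so any single run can be reproduced in isolation
runs <- lapply(repSeeds, function(s) {
  set.seed(s)
  rnorm(1)  # stand-in for a randomized model fit
})
```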
Approx models have cluster trajectories defined at fixed moments in time, which should be interpolated.
For a correct implementation, lcApproxModel requires the extending class to implement clusterTrajectories(at=NULL) to return the fixed cluster trajectories.
## S3 method for class 'lcApproxModel'
fitted(object, ..., clusters = trajectoryAssignments(object))

## S4 method for signature 'lcApproxModel'
predictForCluster(
  object,
  newdata,
  cluster,
  what = "mu",
  approxFun = approx,
  ...
)
object |
The |
... |
Additional arguments. |
clusters |
Optional cluster assignments per id. If unspecified, a |
newdata |
A |
cluster |
The cluster name (as |
what |
The distributional parameter to predict. By default, the mean response 'mu' is predicted. The cluster membership predictions can be obtained by specifying |
approxFun |
Function to interpolate between measurement moments, approx() by default. |
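The interpolation performed by the default approxFun can be illustrated with stats::approx() directly. In the sketch below, the cluster trajectory values and time points are made up for the example; only the approx() call itself reflects the default named above.

```r
# Hypothetical cluster trajectory, defined only at fixed measurement moments
clusTimes <- c(0, 1, 2)
clusValues <- c(1.0, 3.0, 2.0)

# Linearly interpolate the cluster trajectory at new time points
newTimes <- c(0.5, 1.5, 2)
pred <- approx(x = clusTimes, y = clusValues, xout = newTimes)$y
pred
```

By default, approx() returns NA for time points outside the observed range; passing rule = 2 extends the boundary values instead.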
A collection of special methods that adapt the fitting procedure of the underlying longitudinal cluster method.
NOTE: the underlying implementation is experimental and may change in the future.
Supported fit methods:
lcFitConverged: Fit a method until a converged result is obtained.
lcFitRep: Repeatedly fit a method and return the best result based on a given internal metric.
lcFitRepMin: Repeatedly fit a method and return the best result that minimizes the given internal metric.
lcFitRepMax: Repeatedly fit a method and return the best result that maximizes the given internal metric.
lcFitConverged(method, maxRep = Inf)

lcFitRep(method, rep = 10, metric, maximize)

lcFitRepMin(method, rep = 10, metric)

lcFitRepMax(method, rep = 10, metric)
method |
The |
maxRep |
The maximum number of fit attempts |
rep |
The number of fits |
metric |
The internal metric to assess the fit. |
maximize |
Whether to maximize the metric. Otherwise, it is minimized. |
Meta methods are immutable and cannot be updated after instantiation. Calling update() on a meta method only updates the arguments of the underlying lcMethod object.
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 2)

metaMethod <- lcFitConverged(method, maxRep = 10)
metaMethod
model <- latrend(metaMethod, latrendData)

repMethod <- lcFitRep(method, rep = 10, metric = "RSS", maximize = FALSE)
repMethod
model <- latrend(repMethod, latrendData)

minMethod <- lcFitRepMin(method, rep = 10, metric = "RSS")

maxMethod <- lcFitRepMax(method, rep = 10, metric = "ASW")
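The idea behind repeated fitting can be sketched in base R without latrend: fit the same specification several times from different random starts and keep the fit with the lowest metric value. The sketch below uses stats::kmeans() on toy data, with the total within-cluster sum of squares standing in for a metric such as "RSS". The helper names fitOnce and fitRepMin are hypothetical and are not latrend functions.

```r
set.seed(1)
# Toy data: two well-separated groups of points
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)

# Hypothetical helper: a single fit from one random start
fitOnce <- function(data, k) {
  kmeans(data, centers = k, nstart = 1)
}

# Hypothetical helper: repeat the fit and keep the result with the lowest metric
fitRepMin <- function(data, k, rep = 10,
                      metric = function(fit) fit$tot.withinss) {
  fits <- replicate(rep, fitOnce(data, k), simplify = FALSE)
  scores <- vapply(fits, metric, FUN.VALUE = numeric(1))
  fits[[which.min(scores)]]
}

best <- fitRepMin(x, k = 2, rep = 10)
best$tot.withinss
```

Swapping which.min() for which.max() would give the maximizing variant for metrics where larger is better.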
lcMethod objects represent the specification of a method for longitudinal clustering. Furthermore, the object class contains the logic for estimating the respective method.
You can specify a longitudinal cluster method through one of the method-specific constructor functions, e.g., lcMethodKML(), lcMethodLcmmGBTM(), or lcMethodDtwclust(). Alternatively, you can instantiate methods through methods::new(), e.g., by calling new("lcMethodKML", response = "Value"). In both cases, default values are specified for omitted arguments.
Because the lcMethod arguments may be unevaluated, argument retrieval functions such as [[ accept an envir argument. A default environment can be assigned or obtained from a lcMethod object using the environment() function.
arguments
A list representing the arguments of the lcMethod object. Arguments are not evaluated upon creation of the method object. Instead, arguments are stored similarly to a call object, and are only evaluated when a method is fitted. Do not modify or access.
sourceCalls
A list of calls for tracking the original call after substitution. Used for printing objects which require too many characters (e.g., function definitions, matrices). Do not modify or access.
An lcMethod object represents the specification of a method with a set of configurable parameters (referred to as arguments).
Arguments can be of any type. It is up to the lcMethod implementation of validate() to ensure that the required arguments are present and are of the expected type.
Arguments can have almost any name. Exceptions include the names "data", "envir", and "verbose". Furthermore, argument names may not start with a period (".").
Arguments cannot be directly modified, i.e., lcMethod objects are immutable. Modifying an argument involves creating an altered copy through the update.lcMethod method.
The base class lcMethod provides the logic for storing, evaluating, and printing the method parameters.
Subclasses of lcMethod differ only in the fitting procedure logic.
To implement your own lcMethod subclass, you'll want to implement at least the following functions:
fit(): The main function for estimating your method.
getName(): The name of your method.
getShortName(): The abbreviated name of your method.
getArgumentDefaults(): Sensible default argument values for your method.
For more complex methods, the additional functions that are part of the fitting procedure will be of use.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), names,lcMethod-method, update.lcMethod()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 2)
method

method <- new("lcMethodLMKM", formula = Y ~ Time, id = "Id", time = "Time", nClusters = 2)

# get argument names
names(method)

# evaluate argument
method$nClusters

# create a copy with updated nClusters argument
method3 <- update(method, nClusters = 3)
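To make the notion of unevaluated arguments and immutable updates more concrete, here is a minimal base-R sketch of the general technique. It is not how latrend implements lcMethod; makeSpec, getArg, and updateSpec are hypothetical helpers introduced only for illustration.

```r
# Hypothetical constructor: capture arguments without evaluating them
makeSpec <- function(...) {
  env <- parent.frame()
  args <- as.list(substitute(list(...)))[-1]
  structure(list(arguments = args, envir = env), class = "toySpec")
}

# Hypothetical accessor: evaluate a stored argument on demand,
# by default in the environment remembered at creation
getArg <- function(spec, name, envir = spec$envir) {
  eval(spec$arguments[[name]], envir = envir)
}

# Hypothetical update: returns an altered copy, the original is untouched
updateSpec <- function(spec, ...) {
  newArgs <- as.list(substitute(list(...)))[-1]
  spec$arguments[names(newArgs)] <- newArgs
  spec
}

k <- 2
spec <- makeSpec(nClusters = k + 1, id = "Id")

spec$arguments$nClusters   # still the unevaluated call k + 1
getArg(spec, "nClusters")  # evaluated in the default environment

# Evaluating in another environment can give a different value
e <- new.env()
assign("k", 5, envir = e)
getArg(spec, "nClusters", envir = e)

spec3 <- updateSpec(spec, nClusters = 4)
```

Storing the expression instead of its value is what allows the same specification to be evaluated later, possibly in a different environment.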
lcMethod estimation procedure
Each longitudinal cluster method represented by a lcMethod class implements a series of standardized steps that produce the estimated method as its output. These steps, as part of the estimation procedure, are executed by the latrend() function and other functions prefixed by "latrend" (e.g., latrendRep(), latrendBoot(), latrendCV()).
The steps for estimating a lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an lcModel object that inherits from the lcModel class.
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)
summary(model)
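The order of the six steps can be pictured with a toy pipeline in base R. The functions below only borrow the step names; they are hypothetical stand-ins and not the latrend generics, and the "model" they produce is a plain list. The sketch assumes a long-format data frame with id, time, and response columns.

```r
# Toy stand-ins for the six steps; none of these are the latrend generics
steps <- list(
  compose = function(spec, data) {
    # finalize argument values, e.g. fill in a default
    if (is.null(spec$nClusters)) spec$nClusters <- 2
    spec
  },
  validate = function(spec, data) {
    # check the arguments against the dataset
    stopifnot(all(c(spec$id, spec$time, spec$response) %in% names(data)))
    invisible(TRUE)
  },
  prepareData = function(spec, data) {
    # crude representation: the mean response per trajectory
    vapply(split(data[[spec$response]], data[[spec$id]]), mean, numeric(1))
  },
  preFit = function(spec) {
    # data-independent preparation
    list(startTime = Sys.time())
  },
  fit = function(spec, prepared, env) {
    km <- kmeans(prepared, centers = spec$nClusters)
    list(assignments = km$cluster, spec = spec)
  },
  postFit = function(model, env) {
    model$runTime <- Sys.time() - env$startTime
    model
  }
)

# Run the steps in the documented order
runEstimation <- function(spec, data) {
  spec <- steps$compose(spec, data)
  steps$validate(spec, data)
  prepared <- steps$prepareData(spec, data)
  env <- steps$preFit(spec)
  model <- steps$fit(spec, prepared, env)
  steps$postFit(model, env)
}

set.seed(1)
toyData <- data.frame(
  Id = rep(1:20, each = 5),
  Time = rep(1:5, times = 20),
  Y = rnorm(100, mean = rep(c(0, 5), each = 50))
)
model <- runEstimation(list(id = "Id", time = "Time", response = "Y"), toyData)
table(model$assignments)
```

Keeping validation and data preparation separate from fit() means a failure is reported before any expensive estimation starts.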
Specify AKMedoids method
lcMethodAkmedoids( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 3, clusterCenter = median, crit = "Calinski_Harabasz", ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
nClusters |
The number of clusters to estimate. |
clusterCenter |
A function for computing the cluster center representation. |
crit |
Criterion to apply for internal model selection. Not applicable. |
... |
Arguments passed to |
Adepeju M, Langton S, Bannister J (2020). akmedoids: Anchored Kmedoids for Longitudinal Data Clustering. R package version 0.1.5, https://CRAN.R-project.org/package=akmedoids.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (rlang::is_installed("akmedoids")) {
  method <- lcMethodAkmedoids(response = "Y", time = "Time", id = "Id", nClusters = 3)
  model <- latrend(method, data = latrendData)
}
Specify a zero-inflated repeated-measures GBTM method
lcMethodCrimCV( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
... |
Arguments passed to crimCV::crimCV. The following external arguments are ignored: Dat, ng. |
Nielsen JD (2018). crimCV: Group-Based Modelling of Longitudinal Data. R package version 0.9.6, https://CRAN.R-project.org/package=crimCV.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
# This example is not tested because crimCV sometimes fails
# to converge and throws the error "object 'Frtr' not found"
## Not run: 
data(latrendData)
if (require("crimCV")) {
  method <- lcMethodCrimCV("Y", id = "Id", time = "Time", nClusters = 3, dpolyp = 1, init = 2)
  model <- latrend(method, data = subset(latrendData, Time > .5))

  if (require("ggplot2")) {
    plot(model)
  }

  data(TO1adj)
  method <- lcMethodCrimCV(response = "Offenses", time = "Offense", id = "Subject",
    nClusters = 2, dpolyp = 1, init = 2)
  model <- latrend(method, data = TO1adj[1:100, ])
}
## End(Not run)
Specify time series clustering via dtwclust
lcMethodDtwclust( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
Number of clusters. |
... |
Arguments passed to dtwclust::tsclust. The following arguments are ignored: series, k, trace. |
Sardá-Espinosa A (2019). “Time-Series Clustering in R Using the dtwclust Package.” The R Journal. doi:10.32614/RJ-2019-023.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("dtwclust")) {
  method <- lcMethodDtwclust("Y", id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Feature-based clustering.
lcMethodFeature( response, representationStep, clusterStep, standardize = scale, center = meanNA, time = getOption("latrend.time"), id = getOption("latrend.id"), ... )
response |
The name of the response variable. |
representationStep |
A |
clusterStep |
A |
standardize |
A |
center |
The |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
... |
Additional arguments. |
In this example we define a feature-based approach where each trajectory is represented using a linear regression model. The coefficients of the trajectories are then clustered using k-means.
Note that this method is already implemented as lcMethodLMKM().
Representation step:
repStep <- function(method, data, verbose) {
  library(data.table)
  xdata <- as.data.table(data)
  # fit a linear model per trajectory and collect its coefficients as columns
  coefdata <- xdata[,
    as.list(coef(lm(method$formula, .SD))),
    keyby = c(method$id)
  ]
  # exclude the id column
  coefmat <- as.matrix(subset(coefdata, select = -1))
  rownames(coefmat) <- coefdata[[method$id]]
  return(coefmat)
}
Cluster step:
clusStep <- function(method, data, repMat, envir, verbose) {
  km <- kmeans(repMat, centers = method$nClusters)

  lcModelPartition(
    response = method$response,
    data = data,
    trajectoryAssignments = km$cluster
  )
}
Now specify the method and fit the model:
data(latrendData)
method <- lcMethodFeature(
  formula = Y ~ Time,
  response = "Y",
  id = "Id",
  time = "Time",
  nClusters = 2, # read by clusStep through method$nClusters
  representationStep = repStep,
  clusterStep = clusStep
)
model <- latrend(method, data = latrendData)
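The same two-step idea can be tried outside of latrend with base R only. The sketch below is an illustration under stated assumptions, not the package's implementation: it simulates a hypothetical long-format dataset (toyData), represents each trajectory by its lm() coefficients, and clusters the standardized coefficients with kmeans().

```r
set.seed(1)
# Toy long-format data: 30 trajectories, half flat and half increasing
toyData <- data.frame(
  Id = rep(1:30, each = 6),
  Time = rep(0:5, times = 30)
)
slope <- ifelse(toyData$Id <= 15, 0, 1)
toyData$Y <- slope * toyData$Time + rnorm(nrow(toyData), sd = 0.2)

# Representation step: one row of lm() coefficients per trajectory
coefList <- lapply(
  split(toyData, toyData$Id),
  function(d) coef(lm(Y ~ Time, data = d))
)
coefMat <- do.call(rbind, coefList)

# Cluster step: k-means on the standardized coefficients
km <- kmeans(scale(coefMat), centers = 2, nstart = 10)
table(km$cluster)
```

Standardizing before k-means keeps a coefficient with a large numeric range from dominating the distance computation.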
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
Wrapper to the flexmix() method from the flexmix package.
lcMethodFlexmix( formula, formula.mb = ~1, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
formula |
A |
formula.mb |
A |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
... |
Arguments passed to flexmix::flexmix. The following arguments are ignored: data, concomitant, k. |
Grün B, Leisch F (2008). “FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters.” Journal of Statistical Software, 28(4), 1–35. doi:10.18637/jss.v028.i04.
Other lcMethod package interfaces:
lcMethodFlexmixGBTM
data(latrendData)
if (require("flexmix")) {
  method <- lcMethodFlexmix(Y ~ Time, id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Fits a GBTM based on the flexmix::FLXMRglm driver.
lcMethodFlexmixGBTM( formula, formula.mb = ~1, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
formula |
A |
formula.mb |
A |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
... |
Arguments passed to flexmix::flexmix or flexmix::FLXMRglm. The following arguments are ignored: data, k, trace. |
Grün B, Leisch F (2008). “FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters.” Journal of Statistical Software, 28(4), 1–35. doi:10.18637/jss.v028.i04.
Other lcMethod package interfaces:
lcMethodFlexmix
data(latrendData)
if (require("flexmix")) {
  method <- lcMethodFlexmixGBTM(Y ~ Time, id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Specify a custom method based on a function
lcMethodFunction( response, fun, center = meanNA, time = getOption("latrend.time"), id = getOption("latrend.id"), name = "custom" )
response |
The name of the response variable. |
fun |
The cluster |
center |
Optional |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
name |
The name of the method. |
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
# Stratification based on the mean response level
clusfun <- function(data, response, id, time, ...) {
  clusters <- data.table::as.data.table(data)[, mean(Y) > 0, by = Id]$V1
  lcModelPartition(
    data = data,
    trajectoryAssignments = factor(
      clusters,
      levels = c(FALSE, TRUE),
      labels = c("Low", "High")
    ),
    response = response,
    time = time,
    id = id
  )
}
method <- lcMethodFunction(response = "Y", fun = clusfun, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)
Specify a FunFEM method
lcMethodFunFEM( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, basis = function(time) fda::create.bspline.basis(time, nbasis = 10, norder = 4), ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
basis |
The basis function. By default, a cubic B-spline basis (norder = 4) with 10 basis functions (nbasis = 10) is used. |
... |
Arguments passed to funFEM::funFEM. The following external arguments are ignored: fd, K, disp, graph. |
Bouveyron C (2015). funFEM: Clustering in the Discriminative Functional Subspace. R package version 1.1, https://CRAN.R-project.org/package=funFEM.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("funFEM") && require("fda")) {
  method <- lcMethodFunFEM("Y", id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)

  method <- lcMethodFunFEM("Y",
    basis = function(time) {
      create.bspline.basis(time, nbasis = 10, norder = 4)
    }
  )
}
Two-step clustering through latent growth curve modeling and k-means.
lcMethodGCKM( formula, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, center = meanNA, standardize = scale, ... )
formula |
Formula, including a random effects component for the trajectory. See lme4::lmer formula syntax. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters. |
center |
A |
standardize |
A |
... |
Arguments passed to lme4::lmer. The following external arguments are ignored: data, centers, trace. |
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("lme4")) {
  method <- lcMethodGCKM(Y ~ (Time | Id), id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Specify a longitudinal k-means (KML) method
lcMethodKML( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
... |
Arguments passed to kml::parALGO and kml::kml. The following external arguments are ignored: object, nbClusters, parAlgo, toPlot, saveFreq |
Genolini C, Alacoque X, Sentenac M, Arnaud C (2015). “kml and kml3d: R Packages to Cluster Longitudinal Data.” Journal of Statistical Software, 65(4), 1–34. doi:10.18637/jss.v065.i04.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("kml")) {
  method <- lcMethodKML("Y", id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
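For intuition about what a longitudinal k-means does, the following base-R sketch clusters trajectories by treating each one as a vector of its observations at aligned measurement moments. It uses stats::kmeans() rather than the kml package, so it is only a simplified stand-in; the simulated toyData is hypothetical and assumes every trajectory is observed at the same moments.

```r
set.seed(1)
# Toy long-format data: 20 trajectories observed at the same 4 moments
toyData <- data.frame(
  Id = rep(1:20, each = 4),
  Time = rep(1:4, times = 20)
)
toyData$Y <- ifelse(toyData$Id <= 10, 0, 3) + rnorm(nrow(toyData), sd = 0.3)

# Reshape from long to wide: one row per trajectory, one column per moment
wide <- reshape(toyData, idvar = "Id", timevar = "Time", direction = "wide")
trajMat <- as.matrix(wide[, -1])
rownames(trajMat) <- wide$Id

# k-means on the trajectory matrix
km <- kmeans(trajMat, centers = 2, nstart = 10)

# The centers act as cluster trajectories, one value per moment
km$centers
```

This simple form needs complete, aligned observations, since kmeans() does not accept missing values.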
Group-based trajectory modeling through fixed-effects modeling.
lcMethodLcmmGBTM( fixed, mixture = ~1, classmb = ~1, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, init = "default", ... )
fixed |
The fixed effects formula. |
mixture |
The mixture-specific effects formula. See lcmm::hlme for details. |
classmb |
The cluster membership formula for the multinomial logistic model. See lcmm::hlme for details. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. This replaces the |
nClusters |
The number of clusters to fit. This replaces the |
init |
Alternative for the
The argument is ignored if the |
... |
Arguments passed to lcmm::hlme. The following arguments are ignored: data, fixed, random, mixture, subject, classmb, returndata, ng, verbose, subset. |
Proust-Lima C, Philipps V, Liquet B (2017). “Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm.” Journal of Statistical Software, 78(2), 1–56. doi:10.18637/jss.v078.i02.
Proust-Lima C, Philipps V, Diakite A, Liquet B (2019). lcmm: Extended Mixed Models Using Latent Classes and Latent Processes. R package version: 1.8.1, https://cran.r-project.org/package=lcmm.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (rlang::is_installed("lcmm")) {
  method <- lcMethodLcmmGBTM(
    fixed = Y ~ Time,
    mixture = ~ 1,
    id = "Id",
    time = "Time",
    nClusters = 3
  )
  gbtm <- latrend(method, data = latrendData)
  summary(gbtm)

  method <- lcMethodLcmmGBTM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    id = "Id",
    time = "Time",
    nClusters = 3
  )
}
Growth mixture modeling through latent-class linear mixed modeling.
lcMethodLcmmGMM( fixed, mixture = ~1, random = ~1, classmb = ~1, time = getOption("latrend.time"), id = getOption("latrend.id"), init = "lme", nClusters = 2, ... )
fixed |
The fixed effects formula. |
mixture |
The mixture-specific effects formula. See lcmm::hlme for details. |
random |
The random effects formula. See lcmm::hlme for details. |
classmb |
The cluster membership formula for the multinomial logistic model. See lcmm::hlme for details. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. This replaces the |
init |
Alternative for the
The argument is ignored if the |
nClusters |
The number of clusters to fit. This replaces the |
... |
Arguments passed to lcmm::hlme. The following arguments are ignored: data, fixed, random, mixture, subject, classmb, returndata, ng, verbose, subset. |
Proust-Lima C, Philipps V, Liquet B (2017). “Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm.” Journal of Statistical Software, 78(2), 1–56. doi:10.18637/jss.v078.i02.
Proust-Lima C, Philipps V, Diakite A, Liquet B (2019). lcmm: Extended Mixed Models Using Latent Classes and Latent Processes. R package version: 1.8.1, https://cran.r-project.org/package=lcmm.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (rlang::is_installed("lcmm")) {
  method <- lcMethodLcmmGMM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    random = ~ 1,
    id = "Id",
    time = "Time",
    nClusters = 2
  )
  gmm <- latrend(method, data = latrendData)
  summary(gmm)

  # define method with gridsearch
  method <- lcMethodLcmmGMM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    random = ~ 1,
    id = "Id",
    time = "Time",
    nClusters = 3,
    init = "gridsearch",
    gridsearch.maxiter = 10,
    gridsearch.rep = 50,
    gridsearch.parallel = TRUE
  )
}
Two-step clustering through linear regression modeling and k-means
lcMethodLMKM( formula, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, center = meanNA, standardize = scale, ... )
formula |
A |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
nClusters |
The number of clusters to estimate. |
center |
A |
standardize |
A |
... |
Arguments passed to stats::lm. The following external arguments are ignored: x, data, control, centers, trace. |
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 3)
model <- latrend(method, latrendData)
Latent profile analysis or finite Gaussian mixture modeling.
lcMethodMclustLLPA( response, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
... |
Arguments passed to mclust::Mclust. The following external arguments are ignored: data, G, verbose. |
Scrucca L, Fop M, Murphy TB, Raftery AE (2016). “mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.” The R Journal, 8(1), 205–233.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("mclust")) {
  method <- lcMethodMclustLLPA("Y", id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Specify a GLMM with a normal mixture in the random effects
lcMethodMixAK_GLMM( fixed, random, time = getOption("latrend.time"), id = getOption("latrend.id"), nClusters = 2, ... )
fixed |
A |
random |
A |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. This is used to generate the |
nClusters |
The number of clusters. |
... |
Arguments passed to mixAK::GLMM_MCMC. The following external arguments are ignored: y, x, z, random.intercept, silent. |
This method currently does not appear to work under R 4.2 due to an error triggered by the mixAK package during fitting.
Komárek A (2009). “A New R Package for Bayesian Estimation of Multivariate Normal Mixtures Allowing for Selection of the Number of Components and Interval-Censored Data.” Computational Statistics and Data Analysis, 53(12), 3932–3947. doi:10.1016/j.csda.2009.05.006.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
# this example only runs when the mixAK package is installed
try({
  method <- lcMethodMixAK_GLMM(
    fixed = Y ~ 1,
    random = ~ Time,
    id = "Id",
    time = "Time",
    nClusters = 3
  )
  model <- latrend(method, latrendData)
  summary(model)
})
Specify a mixed mixture regression model using mixtools
lcMethodMixtoolsGMM(
  formula,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  nClusters = 2,
  ...
)
formula |
Formula, including a random effects component for the trajectory. See lme4::lmer formula syntax. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters. |
... |
Arguments passed to mixtools::regmixEM.mixed. The following arguments are ignored: data, y, x, w, k, addintercept.fixed, verb. |
Benaglia T, Chauveau D, Hunter DR, Young D (2009). “mixtools: An R Package for Analyzing Finite Mixture Models.” Journal of Statistical Software, 32(6), 1–29. doi:10.18637/jss.v032.i06.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsNPRM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("mixtools")) {
  method <- lcMethodMixtoolsGMM(
    formula = Y ~ Time + (1 | Id),
    id = "Id",
    time = "Time",
    nClusters = 3,
    arb.R = FALSE
  )
}
Specify non-parametric estimation for independent repeated measures
lcMethodMixtoolsNPRM(
  response,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  nClusters = 2,
  blockid = NULL,
  bw = NULL,
  h = NULL,
  ...
)
response |
The name of the response variable. |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters to estimate. |
blockid |
See mixtools::npEM. |
bw |
See mixtools::npEM. |
h |
See mixtools::npEM. |
... |
Arguments passed to mixtools::npEM. The following optional arguments are ignored: data, x, mu0, verb. |
Benaglia T, Chauveau D, Hunter DR, Young D (2009). “mixtools: An R Package for Analyzing Finite Mixture Models.” Journal of Statistical Software, 32(6), 1–29. doi:10.18637/jss.v032.i06.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodRandom, lcMethodStratify
data(latrendData)
if (require("mixtools")) {
  method <- lcMethodMixtoolsNPRM("Y", id = "Id", time = "Time", nClusters = 3)
  model <- latrend(method, latrendData)
}
Specify a MixTVEM
lcMethodMixTVEM(
  formula,
  formula.mb = ~1,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  nClusters = 2,
  ...
)
formula |
A |
formula.mb |
A |
time |
The name of the time variable. |
id |
The name of the trajectory identifier variable. |
nClusters |
The number of clusters. This replaces the |
... |
Arguments passed to the |
In order to use this method, you must download and source MixTVEM.R. See the reference below.
https://github.com/dziakj1/MixTVEM
Dziak JJ, Li R, Tan X, Shiffman S, Shiyko MP (2015). “Modeling intensive longitudinal data with mixtures of nonparametric trajectories and time-varying effects.” Psychological Methods, 20(4), 444–469. ISSN 1939-1463.
# this example only runs if you download and place MixTVEM.R in your wd
try({
  source("MixTVEM.R")
  method <- lcMethodMixTVEM(
    Value ~ time(1) - 1,
    time = "Assessment",
    id = "Id",
    nClusters = 3
  )
})
Creates a model with random cluster assignments according to the random cluster proportions drawn from a Dirichlet distribution.
lcMethodRandom(
  response,
  alpha = 10,
  center = meanNA,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  nClusters = 2,
  name = "random",
  ...
)
response |
The name of the response variable. |
alpha |
The Dirichlet parameters. Either |
center |
Optional |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
nClusters |
The number of clusters. |
name |
The name of the method. |
... |
Additional arguments, such as the seed. |
Frigyik BA, Kapila A, Gupta MR (2010). “Introduction to the Dirichlet distribution and related processes.” Technical Report UWEETR-2010-0006, Department of Electrical Engineering, University of Washington.
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodStratify
data(latrendData)
method <- lcMethodRandom(response = "Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)

# uniform clusters
method <- lcMethodRandom(
  alpha = 1e3,
  nClusters = 3,
  response = "Y",
  id = "Id",
  time = "Time"
)

# single large cluster
method <- lcMethodRandom(
  alpha = c(100, 1, 1, 1),
  nClusters = 4,
  response = "Y",
  id = "Id",
  time = "Time"
)
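As a rough illustration of the random assignment idea behind lcMethodRandom, the following base R sketch draws cluster proportions from a Dirichlet distribution (using normalised gamma draws) and then assigns trajectories at random according to those proportions. It is not the package's implementation: the helper names rdirichlet1 and randomAssignments, and the recycling of a scalar alpha across clusters, are assumptions made for this example.

```r
# Sketch only: random cluster assignment with Dirichlet-distributed proportions.
# rdirichlet1() and randomAssignments() are hypothetical helpers, not latrend functions.
rdirichlet1 <- function(alpha) {
  # a Dirichlet draw equals independent gamma draws normalised to sum to 1
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

randomAssignments <- function(ids, nClusters, alpha = 10) {
  alpha <- rep_len(alpha, nClusters) # assumption: a scalar alpha is recycled
  props <- rdirichlet1(alpha)
  clusters <- sample.int(nClusters, size = length(ids), replace = TRUE, prob = props)
  list(proportions = props, assignments = setNames(clusters, ids))
}

set.seed(1)
res <- randomAssignments(ids = paste0("T", 1:200), nClusters = 3, alpha = 1e3)
round(res$proportions, 2) # near-uniform proportions, because alpha is large
table(res$assignments)
```

A large alpha concentrates the proportions around equal cluster sizes, whereas an unbalanced vector such as c(100, 1, 1, 1) tends to produce one dominant cluster.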
Generates a list of lcMethod objects for all combinations of the provided argument values.
lcMethods(method, ..., envir = NULL)
method |
The |
... |
Any other arguments to update the |
envir |
The |
A list of lcMethod objects.
data(latrendData)
baseMethod <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
methods <- lcMethods(baseMethod, nClusters = 1:6)

nclus <- 1:6
methods <- lcMethods(baseMethod, nClusters = nclus)

# list notation, useful for providing functions
methods <- lcMethods(baseMethod, nClusters = .(1, 3, 5))
length(methods) # 3
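The notion of "all combinations of the provided argument values" can be sketched in base R with expand.grid(). The example below is only an analogy for what lcMethods() produces: baseArgs, argGrid and argSets are hypothetical names, and plain argument lists stand in for lcMethod objects.

```r
# Sketch only: enumerate every combination of argument values.
# Plain lists stand in for lcMethod objects; all names here are illustrative.
baseArgs <- list(nClusters = 2, center = "mean")

argGrid <- expand.grid(
  nClusters = 1:3,
  center = c("mean", "median"),
  stringsAsFactors = FALSE
)

# one argument list per grid row, overriding the base arguments
argSets <- lapply(seq_len(nrow(argGrid)), function(i) {
  modifyList(baseArgs, as.list(argGrid[i, , drop = FALSE]))
})

length(argSets) # 3 values x 2 values
```

Each element of argSets is a complete argument set, so the number of generated specifications grows multiplicatively with the number of values per argument.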
Specify a stratification method
lcMethodStratify(
  response,
  stratify,
  center = meanNA,
  nClusters = NaN,
  clusterNames = NULL,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  name = "stratify"
)
response |
The name of the response variable. |
stratify |
An |
center |
The |
nClusters |
The number of clusters. This is optional, as this can be derived from the largest assignment number by default, or the number of |
clusterNames |
The names of the clusters. If a |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
name |
The name of the method. |
Other lcMethod implementations: getArgumentDefaults(), getArgumentExclusions(), lcMethod-class, lcMethodAkmedoids, lcMethodCrimCV, lcMethodDtwclust, lcMethodFeature, lcMethodFunFEM, lcMethodFunction, lcMethodGCKM, lcMethodKML, lcMethodLMKM, lcMethodLcmmGBTM, lcMethodLcmmGMM, lcMethodMclustLLPA, lcMethodMixAK_GLMM, lcMethodMixtoolsGMM, lcMethodMixtoolsNPRM, lcMethodRandom
data(latrendData)
# Stratification based on the mean response level
method <- lcMethodStratify(
  "Y",
  mean(Y) > 0,
  clusterNames = c("Low", "High"),
  id = "Id",
  time = "Time"
)
model <- latrend(method, latrendData)
summary(model)

# Stratification function
stratfun <- function(trajdata) {
  trajmean <- mean(trajdata$Y)
  factor(
    trajmean > 1.7,
    levels = c(FALSE, TRUE),
    labels = c("Low", "High")
  )
}
method <- lcMethodStratify("Y", stratfun, id = "Id", time = "Time")

# Multiple clusters
stratfun3 <- function(trajdata) {
  trajmean <- mean(trajdata$Y)
  cut(
    trajmean,
    c(-Inf, .5, 2, Inf),
    labels = c("Low", "Medium", "High")
  )
}
method <- lcMethodStratify("Y", stratfun3, id = "Id", time = "Time")
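To make the per-trajectory evaluation of a stratification function more concrete, the sketch below applies a function like stratfun3 to a small made-up dataset using only base R. The toy data, and the use of split() with vapply(), are assumptions for illustration and do not reflect how latrend evaluates the stratify argument internally.

```r
# Sketch only: evaluate a stratification function once per trajectory.
# The toy data and its columns (Id, Time, Y) are made up for this example.
toy <- data.frame(
  Id = rep(c("a", "b", "c"), each = 4),
  Time = rep(1:4, times = 3),
  Y = c(0.1, 0.2, 0.3, 0.2, # low trajectory
        1.0, 1.2, 1.1, 0.9, # medium trajectory
        2.5, 3.0, 2.8, 3.1) # high trajectory
)

stratfun3 <- function(trajdata) {
  cut(mean(trajdata$Y), c(-Inf, .5, 2, Inf), labels = c("Low", "Medium", "High"))
}

# split the data by trajectory and compute the stratum of each one
strata <- vapply(
  split(toy, toy$Id),
  function(d) as.character(stratfun3(d)),
  character(1)
)
strata
```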
lcModel
A longitudinal cluster model (lcModel) describes the clustered representation of a certain longitudinal dataset.
A lcModel is obtained by estimating a specified longitudinal cluster method on a longitudinal dataset.
The estimation is done via one of the latrend estimation functions.
A longitudinal cluster result represents the dataset in terms of a partitioning of the trajectories into a number of clusters.
The trajectoryAssignments() function outputs the most likely membership for the respective trajectories.
Each cluster has a longitudinal representation, obtained via clusterTrajectories(), and can be plotted via plotClusterTrajectories().
Clusters and partitioning:
nClusters(): The number of clusters this model represents.
clusterNames(): The names of the clusters.
clusterSizes(): The respective number of trajectories assigned to each cluster.
clusterProportions(): The respective proportional size of each cluster.
trajectoryAssignments(): The most likely cluster membership of each trajectory.
postprob(): The posterior probability of each trajectory to each cluster.
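The relation between posterior probabilities, most likely cluster membership, and cluster sizes can be sketched in base R as follows. The matrix pp is made-up data with one row per trajectory and one column per cluster; the code illustrates the concept and is not the package's implementation.

```r
# Sketch only: derive most likely assignments and cluster sizes
# from a made-up posterior probability matrix (rows sum to 1).
pp <- matrix(
  c(0.90, 0.10,
    0.20, 0.80,
    0.60, 0.40,
    0.05, 0.95),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3", "t4"), c("A", "B"))
)

# most likely cluster per trajectory: the column with the highest probability
assignments <- factor(
  colnames(pp)[max.col(pp, ties.method = "first")],
  levels = colnames(pp)
)
names(assignments) <- rownames(pp)

# number of trajectories assigned to each cluster
sizes <- table(assignments)
sizes
```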
Longitudinal cluster representation (i.e., trends):
clusterTrajectories(): A data.frame containing the longitudinal representation of each cluster.
plotClusterTrajectories(): Plots the longitudinal representation of each cluster.
fittedTrajectories(): A data.frame containing the longitudinal representation of each trajectory. For many methods, this is the cluster center.
plotFittedTrajectories(): Plot the trajectory representation.
Training data:
nIds(): The number of trajectories used for estimation.
ids(): A vector of identifiers of the trajectories that were used for estimation.
nobs(): The number of observations used for estimation, across trajectories.
time(): Moments in time on which observations are present.
trajectories(): The trajectories that were used for estimation.
plotTrajectories(): Plot the trajectories that were used for estimation.
Model evaluation:
summary(): Obtain a summary of the model.
metric(): Compute an internal metric.
externalMetric(): Compute an external metric in relation to a second lcModel.
converged(): Whether the estimation procedure converged.
estimationTime(): Total time that was needed for the fitting steps.
sigma(): Residual error scale.
qqPlot(): QQ plot of the model residuals.
Model prediction:
predictForCluster(): Cluster-specific prediction on new data. Not supported for all methods.
predictPostprob(): Predict posterior probability for new data. Not supported for all methods.
predictAssignments(): Predict cluster membership for new data. Not supported for all methods.
Other functionality:
getLcMethod(): Get the method specification by which this model was estimated.
update(): Retrain a model with altered method arguments.
strip(): Removes non-essential (meta) data and environments from the model to facilitate efficient serialization.
data(latrendData)
# define the method
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
# estimate the method, giving the model
model <- latrend(method, data = latrendData)

if (require("ggplot2")) {
  plotClusterTrajectories(model)
}
lcModel class
Abstract class for defining estimated longitudinal cluster models.
object |
The |
... |
Any additional arguments. |
An extending class must implement the following methods to ensure basic functionality:
predict.lcModelExt: Used to obtain the fitted cluster trajectories and trajectories.
postprob(lcModelExt): The posterior probability matrix is used to determine the cluster assignments of the trajectories.
For predicting the posterior probability for unseen data, the predictPostprob() method should be implemented.
method: The lcMethod-class object specifying the arguments under which the model was fitted.
call: The call that was used to create this lcModel object. Typically, this is the call to latrend() or any of the other fitting functions.
model: An arbitrary underlying model representation.
data: A data.frame object, or an expression that resolves to the data.frame object.
date: The date-time when the model estimation was initiated.
id: The name of the trajectory identifier column.
time: The name of the time variable.
response: The name of the response variable.
label: The label assigned to this model.
ids: The trajectory identifier values the model was fitted on.
times: The exact times on which the model has been trained.
clusterNames: The names of the clusters.
estimationTime: The time, in seconds, that it took to fit the model.
tag: An arbitrary user-specified data structure. This slot may be accessed and updated directly.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
Represents an arbitrary partitioning of a set of trajectories. As such, this model has no predictive capabilities. The cluster trajectories are represented by the specified center function (mean by default).
lcModelPartition(
  data,
  response,
  trajectoryAssignments,
  nClusters = NA,
  clusterNames = character(),
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  name = "part",
  center = meanNA,
  method = NULL,
  converged = TRUE,
  model = NULL,
  envir = parent.frame()
)
data |
A |
response |
The name of the response variable. |
trajectoryAssignments |
A |
nClusters |
The number of clusters. Should be |
clusterNames |
The names of the clusters, or a function with input |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
name |
The name of the method. |
center |
The |
method |
Optional |
converged |
Set the converged state. |
model |
An optional object to attach to the |
envir |
The |
# comparing a model to the ground truth using the adjusted Rand index
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

# extract the reference class from the Class column
trajLabels <- aggregate(Class ~ Id, head, 1, data = latrendData)
trajLabels$Cluster <- trajLabels$Class
refModel <- lcModelPartition(latrendData, response = "Y", trajectoryAssignments = trajLabels)

if (require("mclustcomp")) {
  externalMetric(model, refModel, "adjustedRand")
}
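For a fixed partitioning, representing each cluster trajectory by a center function amounts to summarising the responses per cluster at each time point. The base R sketch below uses made-up data and a hypothetical meanNA_sketch() helper (a mean that ignores missing values, which is assumed here to be the behaviour of the default center function).

```r
# Sketch only: cluster trajectories as the per-time center of the assigned trajectories.
# meanNA_sketch() is a stand-in for the center function; the data are made up.
meanNA_sketch <- function(x) mean(x, na.rm = TRUE)

toy <- data.frame(
  Id = rep(1:4, each = 3),
  Time = rep(1:3, times = 4),
  Y = c(1, 2, 3,  1, NA, 5,  10, 11, 12,  12, 13, 14),
  Cluster = rep(c("A", "A", "B", "B"), each = 3)
)

# rows are clusters, columns are time points
clusTraj <- with(toy, tapply(Y, list(Cluster = Cluster, Time = Time), meanNA_sketch))
clusTraj
```

Swapping meanNA_sketch for another summary, such as a median, changes the cluster representation without changing the partitioning itself.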
lcModel objects
A general overview of the lcModels class can be found under lcModels-class.
The lcModels() function creates a flat (named) list of lcModel objects. Duplicates are preserved.
lcModels(...)
... |
|
A lcModels object containing all specified lcModel objects.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels-class, max.lcModels(), min.lcModels(), plotMetric(), print.lcModels(), subset.lcModels()
data(latrendData)
lmkmMethod <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
lmkmModel <- latrend(lmkmMethod, latrendData)
rngMethod <- lcMethodRandom("Y", id = "Id", time = "Time")
rngModel <- latrend(rngMethod, latrendData)

lcModels(lmkmModel, rngModel)

lcModels(defaults = c(lmkmModel, rngModel))
lcModels: a list of lcModel objects
The lcModels S3 class represents a list of one or more lcModel objects.
This makes it easier to work with a collection of models in a more structured manner.
A list of models is returned by the repeated estimation functions such as latrendRep(), latrendBatch(), and others.
You can construct a list of models using the lcModels() function.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, max.lcModels(), min.lcModels(), plotMetric(), print.lcModels(), subset.lcModels()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
models <- latrendRep(method, data = latrendData, .rep = 5) # 5 repeated runs
bestModel <- min(models, "MAE")
Create a lcModel with pre-defined weighted partitioning
lcModelWeightedPartition(
  data,
  response,
  weights,
  clusterNames = colnames(weights),
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  name = "wpart"
)
data |
A |
response |
The name of the response variable. |
weights |
A |
clusterNames |
The names of the clusters, or a function with input |
time |
The name of the time variable. |
id |
The name of the trajectory identification variable. |
name |
The name of the method. |
Extract the log-likelihood of a lcModel
## S3 method for class 'lcModel'
logLik(object, ...)
object |
The |
... |
Additional arguments. |
The default implementation checks for the existence of the logLik() function for the internal model, and returns the output, if available.
A numeric with the computed log-likelihood. If unavailable, NA is returned.
data(latrendData)
if (rlang::is_installed("lcmm")) {
  method <- lcMethodLcmmGBTM(
    fixed = Y ~ Time,
    mixture = ~ 1,
    id = "Id",
    time = "Time",
    nClusters = 3
  )
  gbtm <- latrend(method, data = latrendData)
  logLik(gbtm)
}
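The log-likelihood is the basis for likelihood-based criteria such as AIC and BIC. As a generic illustration that does not involve latrend, the sketch below recomputes both criteria by hand from a logLik object of an ordinary linear model fitted to the built-in cars dataset, and compares them with stats::AIC() and stats::BIC().

```r
# Sketch only: AIC and BIC recomputed from a log-likelihood (generic lm example).
fit <- lm(dist ~ speed, data = cars)

ll <- logLik(fit)
k <- attr(ll, "df") # number of estimated parameters
n <- nobs(fit)      # number of observations

aic <- -2 * as.numeric(ll) + 2 * k
bic <- -2 * as.numeric(ll) + log(n) * k

all.equal(aic, AIC(fit))
all.equal(bic, BIC(fit))
```

This also shows why such criteria are unavailable when a model cannot provide a log-likelihood: every term in the formulas depends on it.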
Select the lcModel with the highest metric value
## S3 method for class 'lcModels'
max(x, name, ...)
x |
The |
name |
The name of the internal metric. |
... |
Additional arguments. |
The lcModel with the highest metric value.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, lcModels-class, min.lcModels(), plotMetric(), print.lcModels(), subset.lcModels()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")

model1 <- latrend(method, latrendData, nClusters = 1)
model2 <- latrend(method, latrendData, nClusters = 2)
model3 <- latrend(method, latrendData, nClusters = 3)

models <- lcModels(model1, model2, model3)

if (require("clusterCrit")) {
  max(models, "Dunn")
}
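Selecting a model by the highest or lowest metric value boils down to computing one number per model and taking the position of the extreme. The sketch below illustrates this with ordinary lm fits on the built-in cars dataset rather than lcModel objects; the list fits and the choice of R-squared and BIC as metrics are assumptions for the sake of the example.

```r
# Sketch only: pick the "best" model from a list by maximizing or minimizing a metric.
# Ordinary lm fits stand in for lcModel objects.
fits <- list(
  linear = lm(dist ~ speed, data = cars),
  quadratic = lm(dist ~ poly(speed, 2), data = cars),
  cubic = lm(dist ~ poly(speed, 3), data = cars)
)

# a metric where higher is better
r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
bestByMax <- fits[[which.max(r2)]]

# a metric where lower is better
bics <- vapply(fits, BIC, numeric(1))
bestByMin <- fits[[which.min(bics)]]

names(which.max(r2))
names(which.min(bics))
```

The two criteria need not agree: R-squared never decreases as terms are added, whereas BIC penalizes the extra parameters.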
Compute one or more internal metrics for the given lcModel object.
Note that there are many metrics available, and there exists no metric that works best in all scenarios. It is recommended to carefully consider which metric is most appropriate for your use case.
Recommended overview papers:
Arbelaitz et al. (2013) provide an extensive overview of validity indices for cluster algorithms.
van der Nest et al. (2020) provide an overview of metrics for mixture models (GBTM, GMM); primarily likelihood-based or posterior probability-based metrics.
Henson et al. (2007) provide an overview of likelihood-based metrics for mixture models.
Call getInternalMetricNames() to retrieve the names of the defined internal metrics.
See the Details section below for a list of supported metrics.
metric(object, name = getOption("latrend.metric", c("WRSS", "APPA.mean")), ...)

## S4 method for signature 'lcModel'
metric(object, name = getOption("latrend.metric", c("WRSS", "APPA.mean")), ...)

## S4 method for signature 'list'
metric(object, name, drop = TRUE)

## S4 method for signature 'lcModels'
metric(object, name, drop = TRUE)
object |
The |
name |
The name(s) of the metric(s) to compute. If no names are given, the names specified in the |
... |
Additional arguments. |
drop |
Whether to return a |
For metric(lcModel): A named numeric vector with the computed model metrics.
For metric(list): A data.frame with a metric per column.
For metric(lcModels): A data.frame with a metric per column.
Metric name | Description | Function / Reference |
---|---|---|
AIC | Akaike information criterion. A goodness-of-fit estimator that adjusts for model complexity (i.e., the number of parameters). Only available for models that support the computation of the model log-likelihood through logLik. | stats::AIC(), (Akaike 1974) |
APPA.mean | Mean of the average posterior probability of assignment (APPA) across clusters. A measure of the precision of the trajectory classifications. A score of 1 indicates perfect classification. | APPA(), (Nagin 2005) |
APPA.min | Lowest APPA among the clusters | APPA(), (Nagin 2005) |
ASW | Average silhouette width based on the Euclidean distance | (Rousseeuw 1987) |
BIC | Bayesian information criterion. A goodness-of-fit estimator that corrects for the degrees of freedom (i.e., the number of parameters) and sample size. Only available for models that support the computation of the model log-likelihood through logLik. | stats::BIC(), (Schwarz 1978) |
CAIC | Consistent Akaike information criterion | (Bozdogan 1987) |
CLC | Classification likelihood criterion | (McLachlan and Peel 2000) |
converged | Whether the model converged during estimation | converged() |
deviance | The model deviance | stats::deviance() |
Dunn | The Dunn index | (Dunn 1974) |
entropy | Entropy of the posterior probabilities | |
estimationTime | The time needed for fitting the model | estimationTime() |
ED | Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
ED.fit | Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
ICL.BIC | Integrated classification likelihood (ICL) approximated using the BIC | (Biernacki et al. 2000) |
logLik | Model log-likelihood | stats::logLik() |
MAE | Mean absolute error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
Mahalanobis | Mahalanobis distance between the cluster trajectories and the assigned observed trajectories | (Mahalanobis 1936) |
MSE | Mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
relativeEntropy, RE | A measure of the precision of the trajectory classification. A value of 1 indicates perfect classification, whereas a value of 0 indicates a non-informative uniform classification. It is the normalized version of entropy, scaled between [0, 1]. | (Ramaswamy et al. 1993), (Muthén 2004) |
RMSE | Root mean squared error of the fitted trajectories (assigned to the most likely respective cluster) to the observed trajectories | |
RSS | Residual sum of squares under most likely cluster allocation | |
scaledEntropy | See relativeEntropy | |
sigma | The residual standard deviation | stats::sigma() |
ssBIC | Sample-size adjusted BIC | (Sclove 1987) |
SED | Standardized Euclidean distance between the cluster trajectories and the assigned observed trajectories | |
SED.fit | The cluster-weighted standardized Euclidean distance between the cluster trajectories and the assigned fitted trajectories | |
WMAE | MAE weighted by cluster-assignment probability | |
WMSE | MSE weighted by cluster-assignment probability | |
WRMSE | RMSE weighted by cluster-assignment probability | |
WRSS | RSS weighted by cluster-assignment probability | |
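The posterior probability-based metrics in the table can be illustrated with a few lines of base R. The sketch below computes the per-cluster APPA, its mean and minimum, and the relative entropy from a made-up posterior probability matrix. The helper relativeEntropySketch() is hypothetical and follows the usual definition (one minus the entropy divided by N log K); it is not taken from the package source.

```r
# Sketch only: APPA and relative entropy from a made-up posterior probability matrix.
pp <- matrix(
  c(0.90, 0.10,
    0.20, 0.80,
    0.60, 0.40,
    0.05, 0.95),
  ncol = 2, byrow = TRUE
)

# most likely cluster per trajectory
assigned <- max.col(pp, ties.method = "first")

# APPA per cluster: mean posterior probability of a cluster among its assigned trajectories
appa <- vapply(
  seq_len(ncol(pp)),
  function(k) mean(pp[assigned == k, k]),
  numeric(1)
)
appaMean <- mean(appa)
appaMin <- min(appa)

# relative entropy: 1 = perfect classification, 0 = uniform posterior probabilities
relativeEntropySketch <- function(pp) {
  plogp <- ifelse(pp > 0, pp * log(pp), 0) # treat 0 * log(0) as 0
  entropy <- -sum(plogp)
  1 - entropy / (nrow(pp) * log(ncol(pp)))
}

appa
relativeEntropySketch(pp)
```

With this matrix, the first cluster has an APPA of 0.75 and the second 0.875, so APPA.min flags the first cluster as the less certain one.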
See the documentation of the defineInternalMetric() function for details on how to define your own metrics.
Akaike H (1974). “A new look at the statistical model identification.” IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705.
Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013). “An extensive comparative study of cluster validity indices.” Pattern Recognition, 46(1), 243–256. ISSN 0031-3203, doi:10.1016/j.patcog.2012.07.021.
Biernacki C, Celeux G, Govaert G (2000). “Assessing a mixture model for clustering with the integrated completed likelihood.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725. doi:10.1109/34.865189.
Bozdogan H (1987). “Model Selection and Akaike's Information Criterion (AIC): The General Theory and Its Analytical Extensions.” Psychometrika, 52, 345–370. doi:10.1007/BF02294361.
Dunn JC (1974). “Well-Separated Clusters and Optimal Fuzzy Partitions.” Journal of Cybernetics, 4(1), 95–104. doi:10.1080/01969727408546059.
Henson JM, Reise SP, Kim KH (2007). “Detecting Mixtures From Structural Model Differences Using Latent Variable Mixture Modeling: A Comparison of Relative Model Fit Statistics.” Structural Equation Modeling: A Multidisciplinary Journal, 14(2), 202–226. doi:10.1080/10705510709336744.
Mahalanobis PC (1936). “On the generalized distance in statistics.” Proceedings of the National Institute of Sciences (Calcutta), 2(1), 49–55.
McLachlan G, Peel D (2000). Finite Mixture Models. John Wiley & Sons, Inc. ISBN 9780471006268.
Muthén B (2004). “Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data.” In The SAGE Handbook of Quantitative Methodology for the Social Sciences, 346–369. SAGE Publications, Inc. doi:10.4135/9781412986311.n19.
Nagin DS (2005). Group-based modeling of development. Harvard University Press. ISBN 9780674041318, doi:10.4159/9780674041318.
Ramaswamy V, Desarbo W, Reibstein D, Robinson W (1993). “An Empirical Pooling Approach for Estimating Marketing Mix Elasticities with PIMS Data.” Marketing Science, 12(1), 103–124. doi:10.1287/mksc.12.1.103.
Rousseeuw PJ (1987). “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics, 20, 53–65. ISSN 0377-0427, doi:10.1016/0377-0427(87)90125-7.
Schwarz G (1978). “Estimating the Dimension of a Model.” The Annals of Statistics, 6(2), 461–464.
Sclove SL (1987). “Application of model-selection criteria to some problems in multivariate analysis.” Psychometrika, 52(3), 333–343. doi:10.1007/BF02294360.
van der Nest G, Lima Passos V, Candel MJ, van Breukelen GJ (2020). “An overview of mixture modelling for latent evolutions in longitudinal data: Modelling approaches, fit statistics and software.” Advances in Life Course Research, 43, 100323. ISSN 1040-2608, doi:10.1016/j.alcr.2019.100323.
externalMetric min.lcModels max.lcModels
Other metric functions: defineExternalMetric(), defineInternalMetric(), externalMetric(), getExternalMetricDefinition(), getExternalMetricNames(), getInternalMetricDefinition(), getInternalMetricNames()
Other lcModel functions:
clusterNames()
,
clusterProportions()
,
clusterSizes()
,
clusterTrajectories()
,
coef.lcModel()
,
converged()
,
deviance.lcModel()
,
df.residual.lcModel()
,
estimationTime()
,
externalMetric()
,
fitted.lcModel()
,
fittedTrajectories()
,
getCall.lcModel()
,
getLcMethod()
,
ids()
,
lcModel-class
,
model.frame.lcModel()
,
nClusters()
,
nIds()
,
nobs.lcModel()
,
plot-lcModel-method
,
plotClusterTrajectories()
,
plotFittedTrajectories()
,
postprob()
,
predict.lcModel()
,
predictAssignments()
,
predictForCluster()
,
predictPostprob()
,
qqPlot()
,
residuals.lcModel()
,
sigma.lcModel()
,
strip()
,
time.lcModel()
,
trajectoryAssignments()
data(latrendData) method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time") model <- latrend(method, latrendData) metric(model, "WMAE") if (require("clusterCrit")) { metric(model, c("WMAE", "Dunn")) }
data(latrendData) method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time") model <- latrend(method, latrendData) metric(model, "WMAE") if (require("clusterCrit")) { metric(model, c("WMAE", "Dunn")) }
Select the lcModel with the lowest metric value
## S3 method for class 'lcModels'
min(x, name, ...)
x |
The |
name |
The name of the internal metric. |
... |
Additional arguments. |
The lcModel with the lowest metric value
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, lcModels-class, max.lcModels(), plotMetric(), print.lcModels(), subset.lcModels()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model1 <- latrend(method, latrendData, nClusters = 1)
model2 <- latrend(method, latrendData, nClusters = 2)
model3 <- latrend(method, latrendData, nClusters = 3)

models <- lcModels(model1, model2, model3)

min(models, "WMAE")
Evaluates the data call in the environment that the model was trained in.
## S3 method for class 'lcModel'
model.data(object, ...)
object |
The |
... |
Additional arguments. |
The full data.frame that was used for fitting the lcModel.
model.frame.lcModel time.lcModel
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
model.data(model)
See stats::model.frame() for more details.
## S3 method for class 'lcModel'
model.frame(formula, ...)
formula |
The |
... |
Additional arguments. |
A data.frame containing the variables used by the model.
stats::model.frame model.data.lcModel
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, data = latrendData)
model.frame(model)
Extract the argument names or number of arguments from an lcMethod object.
## S4 method for signature 'lcMethod'
length(x)

## S4 method for signature 'lcMethod'
names(x)
x |
The |
The number of arguments, as a scalar integer.
A character vector of argument names.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, update.lcMethod()
method <- lcMethodLMKM(Y ~ Time)
names(method)
length(method)
Get the number of clusters estimated by the given object.
nClusters(object, ...)

## S4 method for signature 'lcModel'
nClusters(object, ...)
object |
The object |
... |
Not used. |
The number of clusters: a scalar numeric non-zero count.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodRandom("Y", id = "Id", time = "Time", nClusters = 3)
model <- latrend(method, latrendData)
nClusters(model) # 3
Get the number of trajectories (strata) that were used for fitting the given lcModel object.
The number of trajectories is determined from the number of unique identifiers in the training data. In case the trajectory ids were supplied using a factor column, the number of trajectories is determined by the number of levels instead.
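The difference between counting unique identifiers and counting factor levels can be illustrated in base R. The sketch below does not call latrend; the data frame and its column names are made up for illustration, and it only shows why the two counts may disagree.

```r
# Hypothetical long-format data: three trajectories are observed, but the
# identifier factor declares a fourth level ("d") that has no rows.
dat <- data.frame(
  Id = factor(c("a", "a", "b", "b", "c"), levels = c("a", "b", "c", "d")),
  Time = c(1, 2, 1, 2, 1),
  Y = c(0.5, 0.7, 1.1, 1.3, 2.0)
)

nUniqueIds <- length(unique(dat$Id))  # identifiers that occur in the data
nLevelIds <- nlevels(dat$Id)          # declared factor levels

print(c(unique = nUniqueIds, levels = nLevelIds))
```

If unused levels should not be counted, applying droplevels() to the identifier column before fitting is one way to make both counts agree.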
nIds(object)
object |
The |
An integer with the number of trajectories on which the lcModel was fitted.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
nIds(model)
Extracts the number of observations that contributed information towards fitting the cluster trajectories of the respective lcModel object.
Therefore, only non-missing response observations count towards the number of observations.
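As a minimal base-R sketch of this counting rule (not the package's implementation), the number of contributing observations is the number of non-missing response values. The response vector below is hypothetical.

```r
# Hypothetical response vector with two missing observations.
y <- c(4.2, NA, 5.1, 3.8, NA, 6.0)

# Only non-missing responses count towards the number of observations.
nObsUsed <- sum(!is.na(y))
print(nObsUsed)
```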
## S3 method for class 'lcModel'
nobs(object, ...)
object |
The |
... |
Additional arguments. |
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
nobs(model)
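Computes the odds of correct classification (OCC) for each cluster. For a given cluster, the OCC is the ratio between the odds of correct classification based on the posterior probabilities of the trajectories assigned to that cluster, and the odds of correct classification based on the estimated cluster proportion alone.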
OCC(object)
object |
The model, of type |
An OCC of 1 indicates that the cluster assignment is no better than by random chance.
The OCC per cluster, as a numeric vector of length nClusters(object).
Empty clusters will output NA.
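The computation can be sketched in base R using the definition from Nagin (2005): the odds of the average posterior probability of assignment (APPA) divided by the odds of the cluster proportion. This is an illustration under assumptions, not the package's code: the posterior matrix is made up, and the cluster proportions are taken here as the column means of that matrix.

```r
# Hypothetical posterior probability matrix: 4 trajectories (rows), 2 clusters (columns).
pp <- matrix(
  c(0.9, 0.1,
    0.8, 0.2,
    0.3, 0.7,
    0.1, 0.9),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

assigned <- max.col(pp, ties.method = "first")  # modal cluster per trajectory
props <- colMeans(pp)                            # assumed estimate of the cluster proportions

# Average posterior probability of assignment (APPA) per cluster,
# computed over the trajectories assigned to that cluster.
appa <- vapply(
  seq_len(ncol(pp)),
  function(k) mean(pp[assigned == k, k]),
  numeric(1)
)
names(appa) <- colnames(pp)

# Ratio of the posterior-based odds to the proportion-based odds.
occ <- (appa / (1 - appa)) / (props / (1 - props))
print(occ)
```

Values well above 1 indicate that the posterior probabilities classify trajectories into the cluster much better than the cluster proportion alone would.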
Nagin DS (2005). Group-based modeling of development. Harvard University Press. ISBN 9780674041318, doi:10.4159/9780674041318.
Klijn SL, Weijenberg MP, Lemmens P, van den Brandt PA, Passos VL (2017). “Introducing the fit-criteria assessment plot - A visualisation tool to assist class enumeration in group-based trajectory modelling.” Statistical Methods in Medical Research, 26(5), 2424-2436.
van der Nest G, Lima Passos V, Candel MJ, van Breukelen GJ (2020). “An overview of mixture modelling for latent evolutions in longitudinal data: Modelling approaches, fit statistics and software.” Advances in Life Course Research, 43, 100323. ISSN 1040-2608, doi:10.1016/j.alcr.2019.100323.
A simulated longitudinal dataset comprising 301 patients with obstructive sleep apnea (OSA) during their first 91 days (13 weeks) of PAP therapy. The longitudinal patterns were inspired by the adherence patterns reported by Yi et al. (2022), interpolated to weekly hours of usage.
PAP.adh
A data.frame comprising longitudinal data of 301 patients, with weekly observations over a period of 13 weeks.
Each row represents a patient observation interval (one week), with columns:
Patient: integer. The patient identifier, where each value represents a simulated patient.
Week: integer. The week number, starting from 1.
UsageHours: numeric. The mean hours of usage in the respective week. Greater than or equal to zero, and typically around 4-6 hours.
Group: factor. The reference group (i.e., adherence pattern) from which this patient was generated.
Yi H, Dong X, Shang S, Zhang C, Xu L, Han F (2022). “Identifying longitudinal patterns of CPAP treatment in OSA using growth mixture modeling: Disease characteristics and psychological determinants.” Frontiers in Neurology, 13, 1063461. doi:10.3389/fneur.2022.1063461.
data(PAP.adh)

if (require("ggplot2")) {
  plotTrajectories(PAP.adh, id = "Patient", time = "Week", response = "UsageHours")

  # plot according to cluster ground truth
  plotTrajectories(
    PAP.adh,
    id = "Patient",
    time = "Week",
    response = "UsageHours",
    cluster = "Group"
  )
}
A simulated longitudinal dataset comprising 500 patients with obstructive sleep apnea (OSA) during their first year on CPAP therapy. The dataset contains the patient usage hours, averaged over 2-week periods.
The daily usage data underlying the downsampled dataset was simulated based on 7 different adherence patterns. The defined adherence patterns were inspired by the adherence patterns identified by Aloia et al. (2008), with slight adjustments.
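The downsampling from daily usage to 2-week averages can be sketched in base R. The simulated daily values, the 364-day horizon, and the column names below are assumptions for illustration; this is not how the packaged dataset was generated.

```r
set.seed(1)

# Hypothetical daily usage for one simulated patient over 364 days (26 two-week intervals).
daily <- data.frame(
  Day = 1:364,
  UsageHours = pmax(0, rnorm(364, mean = 5, sd = 1.5))
)

# Assign each day to a two-week interval, then average the usage within each interval.
daily$Biweek <- (daily$Day - 1) %/% 14 + 1
biweekly <- aggregate(UsageHours ~ Biweek, data = daily, FUN = mean)

print(head(biweekly))
```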
PAP.adh1y
A data.frame comprising longitudinal data of 500 patients, each having 26 observations over a period of 1 year.
Each row represents a patient observation interval (two weeks), with columns:
Patient: factor. The patient identifier, where each level represents a simulated patient.
Biweek: integer. Two-week interval index. Starts from 1.
MaxDay: integer. The last day used for the aggregation of the respective interval.
UsageHours: numeric. The mean hours of usage in the respective two-week interval. Greater than or equal to zero, and typically around 4-6 hours.
Group: factor. The reference group (i.e., adherence pattern) from which this patient was generated.
This dataset is only intended for demonstration purposes. While the data format will remain the same, the data content is subject to change in future versions.
This dataset was generated based on the cluster-specific descriptive statistics table provided in Aloia et al. (2008), with some adjustments made in order to improve cluster separation for demonstration purposes.
Aloia MS, Goodwin MS, Velicer WF, Arnedt JT, Zimmerman M, Skrekas J, Harris S, Millman RP (2008). “Time series analysis of treatment adherence patterns in individuals with obstructive sleep apnea.” Annals of Behavioral Medicine, 36(1), 44–53. ISSN 0883-6612, doi:10.1007/s12160-008-9052-9.
data(PAP.adh1y)

if (require("ggplot2")) {
  plotTrajectories(PAP.adh1y, id = "Patient", time = "Biweek", response = "UsageHours")

  # plot according to cluster ground truth
  plotTrajectories(
    PAP.adh1y,
    id = "Patient",
    time = "Biweek",
    response = "UsageHours",
    cluster = "Group"
  )
}
Plot an lcModel object.
By default, this plots the cluster trajectories of the model, along with the trajectories used for estimation.
## S4 method for signature 'lcModel'
plot(x, y, ...)
x |
The |
y |
Not used. |
... |
Arguments passed on to
|
A ggplot object.
plotClusterTrajectories plotFittedTrajectories plotTrajectories ggplot2::ggplot
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

if (require("ggplot2")) {
  plot(model)
}
Grid plot for a list of models
## S4 method for signature 'lcModels'
plot(x, y, ..., subset, gridArgs = list())
x |
The |
y |
Not used. |
... |
Additional parameters passed to the |
subset |
Logical expression based on the |
gridArgs |
Named list of parameters passed to gridExtra::arrangeGrob. |
Plot the cluster trajectories associated with the given model.
plotClusterTrajectories(object, ...)

## S4 method for signature 'data.frame'
plotClusterTrajectories(
  object,
  response,
  cluster = "Cluster",
  clusterOrder = character(),
  clusterLabeler = make.clusterPropLabels,
  time = getOption("latrend.time"),
  center = meanNA,
  trajectories = c(FALSE, "sd", "se", "80pct", "90pct", "95pct", "range"),
  facet = !isFALSE(as.logical(trajectories[1])),
  id = getOption("latrend.id"),
  ...
)

## S4 method for signature 'lcModel'
plotClusterTrajectories(
  object,
  what = "mu",
  at = time(object),
  clusterOrder = character(),
  clusterLabeler = make.clusterPropLabels,
  trajectories = FALSE,
  facet = !isFALSE(as.logical(trajectories[1])),
  ...
)
object |
The (cluster) trajectory data. |
... |
Additional arguments passed to clusterTrajectories. |
response |
The response variable name, see responseVariable. |
cluster |
The cluster assignment column |
clusterOrder |
Specify which clusters to plot and the order. Can be the cluster names or index. By default, all clusters are shown. |
clusterLabeler |
A |
time |
The time variable name, see timeVariable. |
center |
A function for aggregating multiple points at the same point in time |
trajectories |
Whether to additionally plot the original trajectories ( Note that visualizing the expected intervals is currently only supported for time-aligned trajectories,
as the interval is computed at each unique moment in time.
By default ( |
facet |
Whether to facet by cluster. This is done by default when |
id |
Id column. Only needed when |
what |
The distributional parameter to predict. By default, the mean response 'mu' is predicted. The cluster membership predictions can be obtained by specifying |
at |
A |
A ggplot object.
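The note on the trajectories argument states that intervals are computed at each unique moment in time, which is why time-aligned trajectories are required. A minimal base-R sketch of that idea follows. The simulated data, the helper summarizeAt(), and the choice of the 2.5th and 97.5th percentiles for a 95% interval are assumptions for illustration, not the package's implementation.

```r
set.seed(1)

# Hypothetical time-aligned trajectories: 50 ids, each observed at times 1..5.
dat <- expand.grid(Id = 1:50, Time = 1:5)
dat$Y <- 2 + 0.5 * dat$Time + rnorm(nrow(dat))

# Summarize the observations available at a single time point.
summarizeAt <- function(y) {
  c(
    center = mean(y, na.rm = TRUE),
    lower = unname(quantile(y, 0.025, na.rm = TRUE)),
    upper = unname(quantile(y, 0.975, na.rm = TRUE))
  )
}

# One row per unique time point, with the center and the interval bounds.
bands <- t(sapply(split(dat$Y, dat$Time), summarizeAt))
print(bands)
```

When trajectories are not time-aligned, few observations share the same time point, so summaries of this kind become unstable or undefined.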
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

if (require("ggplot2")) {
  plotClusterTrajectories(model)

  # show cluster sizes in labels
  plotClusterTrajectories(model, clusterLabeler = make.clusterSizeLabels)

  # change cluster order
  plotClusterTrajectories(model, clusterOrder = c('B', 'C', 'A'))

  # sort clusters by decreasing size
  plotClusterTrajectories(model, clusterOrder = order(-clusterSizes(model)))

  # show only specific clusters
  plotClusterTrajectories(model, clusterOrder = c('B', 'C'))

  # show assigned trajectories
  plotClusterTrajectories(model, trajectories = TRUE)

  # show 95th percentile observation interval
  plotClusterTrajectories(model, trajectories = "95pct")

  # show observation standard deviation
  plotClusterTrajectories(model, trajectories = "sd")

  # show observation standard error
  plotClusterTrajectories(model, trajectories = "se")

  # show observation range
  plotClusterTrajectories(model, trajectories = "range")
}
Plot the fitted trajectories as represented by the given model
plotFittedTrajectories(object, ...)

## S4 method for signature 'lcModel'
plotFittedTrajectories(object, ...)
object |
The model. |
... |
Arguments passed to |
A ggplot object.
plotClusterTrajectories plotTrajectories plot
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

if (require("ggplot2")) {
  plotFittedTrajectories(model)
}
Plot one or more internal metrics for all lcModels
plotMetric(models, name, by = "nClusters", subset, group = character())
models |
A |
name |
The name(s) of the metric(s) to compute. If no names are given, the names specified in the |
by |
The argument name along which methods are plotted. |
subset |
Logical expression based on the |
group |
The argument names to use for determining groups of different models. By default,
all arguments are included.
Specifying |
A ggplot object.
Print an argument summary for each of the models.
Convert to a data.frame of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, lcModels-class, max.lcModels(), min.lcModels(), print.lcModels(), subset.lcModels()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
methods <- lcMethods(method, nClusters = 1:3)
models <- latrendBatch(methods, latrendData)

if (require("ggplot2")) {
  plotMetric(models, "WMAE")
}

if (require("ggplot2") && require("clusterCrit")) {
  plotMetric(models, c("WMAE", "Dunn"))
}
Plots the output of trajectories for the given object.
plotTrajectories(object, ...)

## S4 method for signature 'data.frame'
plotTrajectories(
  object,
  response,
  cluster,
  time = getOption("latrend.time"),
  id = getOption("latrend.id"),
  facet = TRUE,
  ...
)

## S4 method for signature 'ANY'
plotTrajectories(object, ...)

## S4 method for signature 'lcModel'
plotTrajectories(object, ...)
object |
The data or model to extract the trajectories from. |
... |
Additional arguments passed to trajectories. |
response |
Response variable |
cluster |
Whether to plot trajectories grouped by cluster (determined by the "Cluster" column). Alternatively, the name of the cluster column indicating trajectory cluster membership. If unspecified, trajectories are grouped if the object contains a "Cluster" column. |
time |
The time variable name, see timeVariable. |
id |
The identifier variable name, see idVariable. |
facet |
Whether to facet by cluster. |
trajectories plotFittedTrajectories plotClusterTrajectories
data(latrendData)

if (require("ggplot2")) {
  plotTrajectories(latrendData, response = "Y", id = "Id", time = "Time")

  plotTrajectories(
    latrendData,
    response = quote(exp(Y)),
    id = "Id",
    time = "Time"
  )

  plotTrajectories(
    latrendData,
    response = "Y",
    id = "Id",
    time = "Time",
    cluster = "Class"
  )
}

data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

if (require("ggplot2")) {
  plotTrajectories(model)
}
lcMethod estimation step: logic for post-processing the fitted lcModel
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure.
For fitting an lcMethod object to a dataset, use the latrend() function or one of the other standard estimation functions.
The postFit() function of the lcMethod object defines how the lcModel object returned by fit() should be post-processed.
This can be used, for example, to:
Resolve label switching.
Clean up the internal model representation.
Correct estimation errors.
Compute additional metrics.
By default, this method does not do anything. It merely returns the original lcModel object.
This is the last step in the lcMethod fitting procedure. The postFit method may be called again on fitted lcModel objects, allowing post-processing to be updated for existing models.
postFit(method, data, model, envir, verbose, ...)

## S4 method for signature 'lcMethod'
postFit(method, data, model, envir, verbose)
method |
An object inheriting from |
data |
A |
model |
The |
envir |
The |
verbose |
A R.utils::Verbose object indicating the level of verbosity. |
... |
Not used. |
The updated lcModel object.
The method is intended to be able to be called on previously fitted lcModel objects as well, allowing for potential bugfixes or additions to previously fitted models.
Therefore, when implementing this method, ensure that you do not discard information from the model which would prevent the method from being run a second time on the object.
In this example, the lcModelExample class is assumed to be defined with a slot named "centers":
setMethod("postFit", "lcMethodExample", function(method, data, model, envir, verbose) {
  # compute and store the cluster centers
  model@centers <- INTENSIVE_COMPUTATION
  return(model)
})
The steps for estimating an lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
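The fixed order of the estimation steps can be pictured as a chain in which each step receives the state left by the previous one. The base-R sketch below is purely illustrative: makeStep() and the state list are hypothetical stand-ins, and none of the functions here are the latrend generics of the same name.

```r
stepNames <- c("compose", "validate", "prepareData", "preFit", "fit", "postFit")

# Hypothetical step factory: each step only records that it ran.
makeStep <- function(name) {
  force(name)
  function(state) {
    state$log <- c(state$log, name)
    state
  }
}

steps <- lapply(stepNames, makeStep)

# Run the steps in order, passing the state from one step to the next.
result <- Reduce(function(state, step) step(state), steps, list(log = character()))
print(result$log)
```

In this picture, postFit() is simply the final function applied to the state, which is consistent with it being callable again on an already fitted model.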
Get the posterior probability matrix with element (i, k) indicating the probability of trajectory i belonging to cluster k.
postprob(object, ...)

## S4 method for signature 'lcModel'
postprob(object, ...)
object |
The model. |
... |
Not used. |
This method should be extended by lcModel implementations. The default implementation returns uniform probabilities for all observations.
An I-by-K numeric matrix with I = nIds(object) and K = nClusters(object).
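The shape of the returned matrix, and what uniform probabilities look like, can be sketched in base R. The dimensions and cluster names below are hypothetical; the sketch only illustrates the I-by-K layout and is not the package's default implementation.

```r
nTraj <- 5L  # I: hypothetical number of trajectories
nClus <- 3L  # K: hypothetical number of clusters

# Uniform posterior probabilities: each trajectory is equally likely
# to belong to each of the K clusters.
pp <- matrix(
  1 / nClus,
  nrow = nTraj,
  ncol = nClus,
  dimnames = list(NULL, LETTERS[seq_len(nClus)])
)

print(dim(pp))
print(rowSums(pp))
```

Each row of a valid posterior probability matrix sums to 1, which is a useful check when overriding the method in a subclass.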
Classes extending lcModel should override this method.
setMethod("postprob", "lcModelExt", function(object, ...) {
  # return trajectory-specific posterior probability matrix
})
If you are getting errors about undefined model signatures when calling postprob(model), check whether the postprob() function is still the one defined by the latrend package. It may have been overridden when attaching another package (e.g., lcmm). If you need to attach conflicting packages, load them first.
trajectoryAssignments predictPostprob predictAssignments
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

postprob(model)

if (rlang::is_installed("lcmm")) {
  gmmMethod = lcMethodLcmmGMM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    id = "Id",
    time = "Time",
    idiag = TRUE,
    nClusters = 2
  )
  gmmModel <- latrend(gmmMethod, data = latrendData)
  postprob(gmmModel)
}
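Create a posterior probability matrix from a vector of cluster assignments. For each trajectory, the probability of the assigned cluster is 1, and the probability of every other cluster is 0.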
postprobFromAssignments(assignments, k)
assignments |
Integer vector indicating cluster assignment per trajectory |
k |
The number of clusters. |
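The described behaviour amounts to one-hot encoding of the assignments. The helper below, oneHotPostprob(), is a hypothetical base-R stand-in written for illustration; it is not the package's own code and its input checks are an assumption.

```r
# Hypothetical helper mirroring the described behaviour.
oneHotPostprob <- function(assignments, k) {
  stopifnot(all(assignments >= 1L), all(assignments <= k))
  pp <- matrix(0, nrow = length(assignments), ncol = k)
  # Set the entry (i, assignments[i]) to 1 for every trajectory i.
  pp[cbind(seq_along(assignments), assignments)] <- 1
  pp
}

pp <- oneHotPostprob(c(1L, 3L, 2L, 3L), k = 3L)
print(pp)
```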
Predicts the expected trajectory observations at the given time for each cluster.
## S3 method for class 'lcModel'
predict(object, newdata = NULL, what = "mu", ..., useCluster = NA)
object |
The |
newdata |
Optional |
what |
The distributional parameter to predict. By default, the mean response 'mu' is predicted. The cluster membership predictions can be obtained by specifying |
... |
Additional arguments. |
useCluster |
Whether to use the "Cluster" column in the newdata argument for computing predictions conditional on the respective cluster.
For |
If newdata specifies the cluster membership, a data.frame of cluster-specific predictions is returned.
Otherwise, a list of data.frames with the cluster-specific predictions is returned.
Note: Subclasses of lcModel
should preferably implement predictForCluster()
instead of overriding predict.lcModel
as that function is designed to be easier to implement because it is single-purpose.
The predict.lcModelExt
function should be able to handle the case where newdata = NULL
by returning the fitted values.
After post-processing the non-NULL newdata input, the observation- and cluster-specific predictions can be computed.
Lastly, the output logic is handled by the transformPredict()
function. It converts the computed predictions (e.g., matrix
or data.frame
) to the appropriate output format.
predict.lcModelExt <- function(object, newdata = NULL, what = "mu", ...) {
  if (is.null(newdata)) {
    newdata = model.data(object)
    if (hasName(newdata, 'Cluster')) {
      # allowing the Cluster column to remain would break the fitted() output.
      newdata[['Cluster']] = NULL
    }
  }

  # compute cluster-specific predictions for the given newdata
  pred <- NEWDATA_COMPUTATIONS_HERE
  transformPredict(pred = pred, model = object, newdata = newdata)
}
predictForCluster, stats::predict, fitted.lcModel, clusterTrajectories, trajectories, predictPostprob, predictAssignments
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

predFitted <- predict(model) # same result as fitted(model)

# Cluster trajectory of cluster A
predCluster <- predict(model, newdata = data.frame(Cluster = "A", Time = time(model)))

# Prediction for id S1 given cluster A membership
predId <- predict(model, newdata = data.frame(Cluster = "A", Id = "S1", Time = time(model)))

# Prediction matrix for id S1 for all clusters
predIdAll <- predict(model, newdata = data.frame(Id = "S1", Time = time(model)))
Predict the most likely cluster membership for each trajectory in the given data.
predictAssignments(object, newdata = NULL, ...)

## S4 method for signature 'lcModel'
predictAssignments(object, newdata = NULL, strategy = which.max, ...)
object |
The model. |
newdata |
A |
... |
Not used. |
strategy |
A function returning the cluster index based on the given |
The default implementation uses predictPostprob to determine the cluster membership.
A factor
of length nrow(newdata)
that indicates the assigned cluster per trajectory per observation.
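The role of the strategy function can be sketched in base R: it is applied to each row of a posterior probability matrix and returns a cluster index, which is then mapped to a cluster name. The toy matrix below is made up for illustration, and the snippet is a sketch of the concept rather than the package's code.

```r
# Toy posterior probability matrix: 3 rows (observations), 2 clusters (columns)
pp <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

# Apply the strategy function to each row to obtain a cluster index
idx <- apply(pp, 1, which.max)

# Convert the indices to a factor labelled with the cluster names
assignments <- factor(colnames(pp)[idx], levels = colnames(pp))
assignments
```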
predictPostprob, predict.lcModel
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
## Not run:
data(latrendData)
if (require("kml")) {
  model <- latrend(method = lcMethodKML("Y", id = "Id", time = "Time"), latrendData)
  predictAssignments(model, newdata = data.frame(Id = 999, Y = 0, Time = 0))
}
## End(Not run)
Predicts the expected trajectory observations at the given time under the assumption that the trajectory belongs to the specified cluster.
For lcModel
objects, the same result can be obtained by calling predict()
with the newdata
data.frame
having a "Cluster"
assignment column.
The main purpose of this function is to make it easier to implement the prediction computations for custom lcModel
classes.
predictForCluster(object, newdata = NULL, cluster, ...)

## S4 method for signature 'lcModel'
predictForCluster(object, newdata = NULL, cluster, ..., what = "mu")
object |
The model. |
newdata |
A |
cluster |
The cluster name (as |
... |
Arguments passed on to
|
what |
The distributional parameter to predict. By default, the mean response 'mu' is predicted. The cluster membership predictions can be obtained by specifying |
The default predictForCluster(lcModel)
method makes use of predict.lcModel()
, and vice versa. For this to work, any extending lcModel
classes, e.g., lcModelExample
, should implement either predictForCluster(lcModelExample)
or predict.lcModelExample()
. When implementing new models, it is advisable to implement predictForCluster
as the cluster-specific computation generally results in shorter and simpler code.
A vector
with the predictions per newdata
observation, or a data.frame
with the predictions and newdata alongside.
Classes extending lcModel
should override this method, unless predict.lcModel()
is preferred.
setMethod("predictForCluster", "lcModelExt",
  function(object, newdata = NULL, cluster, ..., what = "mu") {
    # return model predictions for the given data under the
    # assumption of the data belonging to the given cluster
})
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)

predictForCluster(
  model,
  newdata = data.frame(Time = c(0, 1)),
  cluster = "B"
)

# all fitted values under cluster B
predictForCluster(model, cluster = "B")
Returns the observation-specific posterior probabilities for the given data.
For lcModel
: The default implementation returns a uniform probability matrix.
predictPostprob(object, newdata = NULL, ...)

## S4 method for signature 'lcModel'
predictPostprob(object, newdata = NULL, ...)
object |
The model. |
newdata |
Optional |
... |
Additional arguments passed to postprob. |
An N-by-K matrix
indicating the posterior probability per trajectory per measurement on each row, for each cluster (the columns).
Here, N = nrow(newdata)
and K = nClusters(object)
.
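To make the output shape concrete, the following base R sketch builds a uniform N-by-K probability matrix of the kind described for the default implementation. The values of `N`, `K`, and the cluster names are hypothetical stand-ins for `nrow(newdata)`, `nClusters(object)`, and the model's cluster names.

```r
# Hypothetical stand-ins for nrow(newdata) and nClusters(object)
N <- 5
K <- 3
clusterNames <- c("A", "B", "C")

# Uniform fallback: every observation is equally likely to belong to each cluster
pp <- matrix(1 / K, nrow = N, ncol = K, dimnames = list(NULL, clusterNames))

dim(pp)      # N rows, K columns
rowSums(pp)  # each row sums to 1 (up to floating-point error)
```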
Classes extending lcModel
should override this method to enable posterior probability predictions for new data.
setMethod("predictPostprob", "lcModelExt", function(object, newdata = NULL, ...) {
  # return observation-specific posterior probability matrix
})
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
lcMethod estimation step: method preparation logic
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure.
For fitting an lcMethod
object to a dataset, use the latrend()
function or one of the other standard estimation functions.
The preFit()
function of the lcMethod
object performs preparatory work that is needed for fitting the method but should not be counted towards the method estimation time.
The work is added to the provided environment
, allowing the fit()
function to make use of the prepared work.
preFit(method, data, envir, verbose, ...)

## S4 method for signature 'lcMethod'
preFit(method, data, envir, verbose)
method |
An object inheriting from |
data |
A |
envir |
The |
verbose |
A R.utils::Verbose object indicating the level of verbosity. |
... |
Not used. |
The updated environment
that will be passed to fit()
.
setMethod("preFit", "lcMethodExample", function(method, data, envir, verbose) {
  # update envir with additional computed work
  envir$x <- INTENSIVE_OPERATION
  return(envir)
})
The steps for estimating an lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare the environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
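The way an environment carries prepared work from one estimation step to the next can be sketched with plain base R functions. All function names below are hypothetical stand-ins, not the package's S4 generics, and the "model" is just a list; the point is only the hand-off of the environment and that the preparatory steps sit outside the timed fit.

```r
# Plain-function stand-ins for four of the estimation steps (hypothetical names)
prepareDataSketch <- function(data) {
  envir <- new.env()
  envir$y <- data$Y              # processed training data
  envir
}

preFitSketch <- function(envir) {
  envir$summarize <- mean        # preparatory work, kept outside the timing
  envir
}

fitSketch <- function(envir) {
  list(center = envir$summarize(envir$y), n = length(envir$y))
}

postFitSketch <- function(model) {
  model$converged <- TRUE        # post-processing of the fitted object
  model
}

data <- data.frame(Y = c(1, 2, 3, 6))
envir <- prepareDataSketch(data)
envir <- preFitSketch(envir)

# Only the fit step is timed
timing <- system.time(model <- fitSketch(envir))
model <- postFitSketch(model)
model$center  # mean of Y
```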
lcMethod estimation step: logic for preparing the training data
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure.
For fitting an lcMethod
object to a dataset, use the latrend()
function or one of the other standard estimation functions.
The prepareData()
function of the lcMethod
object processes the training data prior to fitting the method.
Example uses:
Transforming the data to another format, e.g., a matrix.
Truncating the response variable.
Computing derived covariates.
Creating additional data objects.
The computed variables are stored in an environment
which is passed to the preFit()
function for further processing.
By default, this method does not do anything.
prepareData(method, data, verbose, ...)

## S4 method for signature 'lcMethod'
prepareData(method, data, verbose)
method |
An object inheriting from |
data |
A |
verbose |
A R.utils::Verbose object indicating the level of verbosity. |
... |
Not used. |
An environment with the prepared data variable(s) that will be passed to preFit().
A common use case for this method is when the internal method fitting procedure expects the data in a different format.
In this example, the method converts the training data data.frame
to a matrix
of repeated and aligned trajectory measurements.
setMethod("prepareData", "lcMethodExample", function(method, data, verbose) {
  envir = new.env()
  # transform the data to matrix
  envir$dataMat = tsmatrix(data, id = idColumn, time = timeColumn, response = valueColumn)
  return(envir)
})
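For readers unfamiliar with the long-to-wide conversion mentioned above, here is a base R sketch of turning a long-format data.frame into a matrix of aligned trajectory measurements. It uses `tapply()` on made-up toy data and is an assumption-laden illustration, not the package's `tsmatrix()` implementation.

```r
# Toy long-format data: 2 trajectories observed at 3 aligned time points
data <- data.frame(
  Id = rep(c("S1", "S2"), each = 3),
  Time = rep(c(0, 1, 2), times = 2),
  Y = c(1.0, 1.5, 2.0, 5.0, 4.5, 4.0)
)

# One row per trajectory, one column per time point.
# Each (Id, Time) cell holds a single value, so sum() simply returns it.
dataMat <- tapply(data$Y, list(data$Id, data$Time), sum)

dim(dataMat)        # 2 trajectories by 3 time points
dataMat["S2", "1"]  # observation of S2 at time 1
```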
The steps for estimating an lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare the environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
Print the arguments of an lcMethod object
## S3 method for class 'lcMethod'
print(x, ..., eval = FALSE, width = 40, envir = NULL)
x |
The |
... |
Not used. |
eval |
Whether to print the evaluated argument values. |
width |
Maximum number of characters per argument. |
envir |
The environment in which to evaluate the arguments when |
Print lcModels list concisely
## S3 method for class 'lcModels'
print(
  x,
  ...,
  summary = FALSE,
  excludeShared = !getOption("latrend.printSharedModelArgs")
)
x |
The |
... |
Not used. |
summary |
Whether to print the complete summary per model. This may be slow for long lists! |
excludeShared |
Whether to exclude model arguments which are identical across all models. |
Print an argument summary for each of the models.
Convert to a data.frame
of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, lcModels-class, max.lcModels(), min.lcModels(), plotMetric(), subset.lcModels()
Plot the quantile-quantile (Q-Q) plot for the fitted lcModel
object. This function is based on the qqplotr package.
qqPlot(model, byCluster = FALSE, ...)
model |
|
byCluster |
Whether to plot the Q-Q line per cluster |
... |
Additional arguments passed to residuals.lcModel, |
A ggplot
object.
residuals.lcModel, metric, plotClusterTrajectories
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time", nClusters = 3)
model <- latrend(method, latrendData)

if (require("ggplot2") && require("qqplotr")) {
  qqPlot(model)
}
Extract the residuals for a fitted lcModel
object.
By default, residuals are computed under the most likely cluster assignment for each trajectory.
## S3 method for class 'lcModel'
residuals(object, ..., clusters = trajectoryAssignments(object))
object |
The |
... |
Additional arguments. |
clusters |
Optional cluster assignments per id. If unspecified, a |
A numeric vector
of residuals for the cluster assignments specified by clusters.
If the clusters
argument is unspecified, a matrix
of cluster-specific residuals per observation is returned.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), sigma.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
Extracts the response variable from the given object
.
Get the response variable, i.e., the dependent variable.
responseVariable(object, ...)

## S4 method for signature 'lcMethod'
responseVariable(object, ...)

## S4 method for signature 'lcModel'
responseVariable(object, ...)
object |
The object. |
... |
Not used. |
If the lcMethod
object specifies a formula
argument, then the response is extracted from the response term of the formula.
A nonempty string, as character
.
Other variables: idVariable(), timeVariable()
method <- lcMethodLMKM(Y ~ Time)
responseVariable(method) # "Y"

data(latrendData)
method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
responseVariable(model) # "Y"
Extracts or estimates the residual standard deviation. If sigma()
is not defined for a model, it is estimated from the residual error vector.
## S3 method for class 'lcModel'
sigma(object, ...)
object |
The |
... |
Additional arguments. |
A numeric
indicating the residual standard deviation.
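The text does not state which estimator is used when the residual standard deviation is derived from the residual vector. As an assumption for illustration only, the base R sketch below uses one common choice, the square root of the residual sum of squares divided by the residual degrees of freedom, and checks it against `stats::sigma()` on an ordinary linear model.

```r
# Toy data with a known linear trend
set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50, sd = 1.5)

fit <- lm(y ~ x)
res <- residuals(fit)

# One common estimator (an assumption here): root of the residual sum of
# squares divided by the residual degrees of freedom
sigmaHat <- sqrt(sum(res^2) / df.residual(fit))

all.equal(sigmaHat, sigma(fit))  # matches stats::sigma() for lm objects
```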
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), strip(), time.lcModel(), trajectoryAssignments()
Reduce the (serialized) memory footprint of an object.
strip(object, ...)

## S4 method for signature 'lcMethod'
strip(object, ..., classes = "formula")

## S4 method for signature 'ANY'
strip(object, ..., classes = "formula")

## S4 method for signature 'lcModel'
strip(object, ..., classes = "formula")
object |
The model. |
... |
Not used. |
classes |
The object classes for which to remove their assigned environment. By default, only environments from |
Serializing references to environments results in the serialization of the object together with any associated environments and references. This method removes those environments and references, greatly reducing the serialized object size.
The stripped (i.e., updated) object.
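The effect of an attached environment on the serialized size can be demonstrated in base R. The sketch below is not the package's `strip()` implementation; it creates a formula inside an environment that also holds a large vector, then compares the serialized size before and after dropping the formula's environment reference.

```r
# A formula created inside an environment that also holds a large object
f <- local({
  big <- rnorm(1e5)
  Y ~ Time
})

sizeBefore <- length(serialize(f, NULL))

# Drop the environment reference from the formula
attr(f, ".Environment") <- NULL

sizeAfter <- length(serialize(f, NULL))
c(before = sizeBefore, after = sizeAfter)
```

Note that a formula without an environment can no longer look up variables from where it was defined, which is the trade-off for the smaller footprint.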
Classes extending lcModel
can override this method to remove additional non-essentials.
setMethod("strip", "lcModelExt", function(object, ..., classes = "formula") {
  object <- callNextMethod()
  # further process the object
  return(object)
})
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), time.lcModel(), trajectoryAssignments()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
newModel <- strip(model)
Subsetting a lcModels list based on method arguments
## S3 method for class 'lcModels'
subset(x, subset, drop = FALSE, ...)
x |
The |
subset |
Logical expression based on the |
drop |
Whether to return a |
... |
Not used. |
A lcModels
list with the subset of lcModel
objects.
Print an argument summary for each of the models.
Convert to a data.frame
of method arguments.
Subset the list.
Compute an internal metric or external metric.
Obtain the best model according to minimizing or maximizing a metric.
Obtain the summed estimation time.
Plot a metric across a variable.
Other lcModels functions: as.lcModels(), lcModels, lcModels-class, max.lcModels(), min.lcModels(), plotMetric(), print.lcModels()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")

model1 <- latrend(method, latrendData, nClusters = 1)
model2 <- latrend(method, latrendData, nClusters = 2)
model3 <- latrend(method, latrendData, nClusters = 3)

rngMethod <- lcMethodRandom("Y", id = "Id", time = "Time")
rngModel <- latrend(rngMethod, latrendData)

models <- lcModels(model1, model2, model3, rngModel)

subset(models, nClusters > 1 & .method == 'lmkm')
Extracts all relevant information from the underlying model into a list
## S3 method for class 'lcModel'
summary(object, ...)
object |
The |
... |
Additional arguments. |
Test a lcMethod
subclass implementation and its resulting lcModel
implementation.
test.latrend(
  class = "lcMethodKML",
  instantiator = NULL,
  data = NULL,
  args = list(),
  tests = c("method", "basic", "fitted", "predict", "cluster-single", "cluster-three"),
  maxFails = 5L,
  errorOnFail = FALSE,
  clusterRecovery = c("warn", "ignore", "fail"),
  verbose = TRUE
)
class |
The name of the |
instantiator |
A |
data |
An optional dataset comprising three highly distinct constant clusters that will be used for testing, represented by a |
args |
Other arguments passed to the instantiator function. |
tests |
A |
maxFails |
The maximum number of allowed test condition failures before testing is ended prematurely. |
errorOnFail |
Whether to throw the test errors as an error. This is always enabled while running package tests. |
clusterRecovery |
Whether to test for correct recovery/identification of the original clusters in the test data. By default, a warning is outputted. |
verbose |
Whether to output the testing results. This is always disabled while running package tests. |
This is an experimental function that is subject to large changes in the future. The default dataset used for testing is subject to change.
test.latrend("lcMethodRandom", tests = c("method", "basic"), clusterRecovery = "skip")
Extract the sampling times on which the lcModel
was fitted.
## S3 method for class 'lcModel'
time(x, ...)
x |
The |
... |
Not used. |
A numeric vector
of the unique times at which observations occur, in increasing order.
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), trajectoryAssignments()
Extracts the time variable (i.e., column name) from the given object
.
timeVariable(object, ...)

## S4 method for signature 'lcMethod'
timeVariable(object, ...)

## S4 method for signature 'lcModel'
timeVariable(object)

## S4 method for signature 'ANY'
timeVariable(object)
object |
The object. |
... |
Not used. |
The time variable name, as character
.
Other variables: idVariable(), responseVariable()
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
timeVariable(method) # "Time"

data(latrendData)
method <- lcMethodRandom("Y", id = "Id", time = "Time")
model <- latrend(method, latrendData)
timeVariable(model) # "Time"
Transform or extract the trajectories from the given object to a standardized format.
Trajectories are ordered by Id and observation time.
For estimated models: get the trajectories used for estimation, along with the cluster membership. This data can be used for plotting or post-hoc analysis.
trajectories(
  object,
  id = idVariable(object),
  time = timeVariable(object),
  response = responseVariable(object),
  cluster = "Cluster",
  ...
)

## S4 method for signature 'data.frame'
trajectories(
  object,
  id = idVariable(object),
  time = timeVariable(object),
  response = responseVariable(object),
  cluster = "Cluster",
  ...
)

## S4 method for signature 'matrix'
trajectories(
  object,
  id = idVariable(object),
  time = timeVariable(object),
  response = responseVariable(object),
  cluster = "Cluster",
  ...
)

## S4 method for signature 'call'
trajectories(object, ..., envir)

## S4 method for signature 'lcModel'
trajectories(
  object,
  id = idVariable(object),
  time = timeVariable(object),
  response = responseVariable(object),
  cluster = "Cluster",
  ...
)
object |
The data or model to extract the trajectories from. |
id |
The identifier variable name, see idVariable. |
time |
The time variable name, see timeVariable. |
response |
The response variable name, see responseVariable. |
cluster |
Experimental feature for data.frame input: a vector of cluster membership per id |
... |
Arguments passed to trajectoryAssignments for generating the Cluster column. |
envir |
The |
The standardized data format is used for method estimation by latrend() and by the plotting functions.
The generic function removes unused factor levels in the Id column, and any trajectories whose response consists only of NA values.
A data.frame
with columns matching the id
, time
, response
and cluster
name arguments.
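The standardization described above (dropping all-NA trajectories, removing unused factor levels, and ordering by Id and time) can be approximated in base R as follows. This is a sketch on made-up data with hard-coded column names, not the package's implementation.

```r
# Toy data: unordered rows, an unused factor level, and an all-NA trajectory
data <- data.frame(
  Id = factor(c("S2", "S1", "S3", "S1", "S2", "S3"),
              levels = c("S1", "S2", "S3", "S4")),
  Time = c(1, 1, 0, 0, 0, 1),
  Y = c(4.5, 1.5, NA, 1.0, 5.0, NA)
)

# Drop trajectories whose response is NA at every observation
allNA <- tapply(is.na(data$Y), data$Id, all)
keepIds <- names(allNA)[!is.na(allNA) & !allNA]
out <- data[data$Id %in% keepIds, ]

# Drop unused factor levels and order by Id, then time
out$Id <- droplevels(out$Id)
out <- out[order(out$Id, out$Time), ]
rownames(out) <- NULL
out
```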
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
trajectories(model)
Get the cluster membership of each trajectory associated with the given model.
For lcModel
: Classify the fitted trajectories based on the posterior probabilities computed by postprob()
, according to a given classification strategy.
By default, trajectories are assigned based on the highest posterior probability using which.max()
.
In cases where identical probabilities are expected between clusters, it is preferable to use which.is.max instead, as this function breaks ties at random.
Another strategy to consider is the function which.weight()
, which enables weighted sampling of cluster assignments based on the trajectory-specific probabilities.
trajectoryAssignments(object, ...)

## S4 method for signature 'matrix'
trajectoryAssignments(
  object,
  strategy = which.max,
  clusterNames = colnames(object),
  ...
)

## S4 method for signature 'lcModel'
trajectoryAssignments(object, strategy = which.max, ...)
object |
The model. |
... |
Any additional arguments passed to the strategy function. |
strategy |
A function returning the cluster index based on the given vector of membership probabilities. By default, ids are assigned to the cluster with the highest probability. |
clusterNames |
Optional |
In case object is a matrix: the posterior probability matrix, with the k-th column containing the observation- or trajectory-specific probability for cluster k.
A factor vector
indicating the cluster membership for each trajectory.
postprob, clusterSizes, predictAssignments
Other lcModel functions: clusterNames(), clusterProportions(), clusterSizes(), clusterTrajectories(), coef.lcModel(), converged(), deviance.lcModel(), df.residual.lcModel(), estimationTime(), externalMetric(), fitted.lcModel(), fittedTrajectories(), getCall.lcModel(), getLcMethod(), ids(), lcModel-class, metric(), model.frame.lcModel(), nClusters(), nIds(), nobs.lcModel(), plot-lcModel-method, plotClusterTrajectories(), plotFittedTrajectories(), postprob(), predict.lcModel(), predictAssignments(), predictForCluster(), predictPostprob(), qqPlot(), residuals.lcModel(), sigma.lcModel(), strip(), time.lcModel()
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData)
trajectoryAssignments(model)

# assign trajectories at random using weighted sampling
trajectoryAssignments(model, strategy = which.weight)
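To make the role of the strategy argument more concrete, the following base R sketch applies two strategies row by row to a small posterior probability matrix. The matrix `pp`, the helper `assignWith()`, and `weightedStrategy()` are hypothetical names with invented values; they only mimic the behavior described above and are not the package's implementation.

```r
# Hypothetical posterior probability matrix: one row per trajectory,
# one column per cluster (values invented for illustration).
pp <- matrix(
  c(0.90, 0.10,
    0.20, 0.80,
    0.55, 0.45),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

# Hypothetical helper: apply a strategy function to each row and
# return a factor of cluster names.
assignWith <- function(probs, strategy = which.max) {
  idx <- apply(probs, 1, strategy)
  factor(colnames(probs)[idx], levels = colnames(probs))
}

# Default strategy: the cluster with the highest probability per row.
hard <- assignWith(pp)

# Weighted sampling: draw one index with probability proportional to the row.
weightedStrategy <- function(p) sample.int(length(p), size = 1, prob = p)
set.seed(1)
soft <- assignWith(pp, strategy = weightedStrategy)
```

With the default strategy the result is deterministic, whereas the sampled assignment can differ between runs unless a seed is set.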
A helper function for implementing the fitted.lcModel() method as part of your own lcModel class, ensuring the correct output type and format (see the Value section).
Note that this function has no use outside of implementing fitted.lcModel.
The function makes it easier to implement fitted.lcModel based on existing implementations that may output their results in different data formats. Furthermore, the function checks whether the input data is valid.
The prediction ordering depends on the ordering of the data observations that were used for fitting the lcModel.
By default, transformFitted() accepts one of the following inputs:
data.frame: A data.frame in long format providing a cluster-specific prediction for each observation per row, with column names "Fit" and "Cluster". This data.frame therefore has nobs(object) * nClusters(object) rows.
matrix: An N-by-K matrix where each row provides the cluster-specific predictions for the respective observation. Here, N = nrow(model.data(object)) and K = nClusters(object).
list: A list of cluster-specific prediction vectors. Each prediction vector should be of length nrow(model.data(object)). The overall (named) list of cluster-specific prediction vectors is of length nClusters(object).
Users can implement support for other prediction formats by defining the transformFitted method with other signatures.
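The three input formats listed above carry the same information. As a rough illustration in base R, the sketch below converts a named list of cluster-specific prediction vectors to an N-by-K matrix and then to a long data.frame with "Fit" and "Cluster" columns. The objects `predList`, `predMatrix`, and `predFrame` and their values are hypothetical; the conversion shown is an assumption about how the formats relate, not the package's internal code.

```r
# Hypothetical predictions for N = 3 observations and K = 2 clusters.
predList <- list(A = c(1.0, 1.5, 2.0), B = c(3.0, 3.5, 4.0))

# list -> N-by-K matrix: one column per cluster.
predMatrix <- do.call(cbind, predList)

# matrix -> long data.frame with "Fit" and "Cluster" columns.
# as.vector() reads the matrix column by column, so each cluster name
# is repeated once per observation.
predFrame <- data.frame(
  Fit = as.vector(predMatrix),
  Cluster = factor(
    rep(colnames(predMatrix), each = nrow(predMatrix)),
    levels = colnames(predMatrix)
  )
)

dim(predMatrix)  # N rows, K columns
nrow(predFrame)  # N * K rows
```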
transformFitted(pred, model, clusters)

## S4 method for signature 'NULL,lcModel'
transformFitted(pred, model, clusters = NULL)

## S4 method for signature 'matrix,lcModel'
transformFitted(pred, model, clusters = NULL)

## S4 method for signature 'list,lcModel'
transformFitted(pred, model, clusters = NULL)

## S4 method for signature 'data.frame,lcModel'
transformFitted(pred, model, clusters = NULL)
pred |
The cluster-specific predictions for each observation |
model |
The |
clusters |
The trajectory cluster assignment per observation. Optional. |
If the clusters argument was specified, a vector of fitted values conditional on the given cluster assignment. Else, a matrix with the fitted values per cluster per column.
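The difference between the two return types can be sketched in base R: given a per-cluster matrix of fitted values and a cluster assignment per observation, the conditional vector picks one column per row. The names `fitMatrix`, `clusters`, and `fitVector` and the values are hypothetical, and the indexing shown is only one plausible way to perform the selection.

```r
# Hypothetical fitted values: rows are observations, columns are clusters.
fitMatrix <- matrix(
  c(1.0, 3.0,
    1.5, 3.5,
    2.0, 4.0),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

# Hypothetical cluster assignment per observation.
clusters <- factor(c("A", "B", "B"), levels = c("A", "B"))

# For each row, select the column of the assigned cluster
# using (row, column) matrix indexing.
fitVector <- fitMatrix[cbind(seq_len(nrow(fitMatrix)), as.integer(clusters))]
```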
A typical implementation of fitted.lcModel() for your own lcModel class would have the following format:
fitted.lcModelExample <- function(object, clusters = trajectoryAssignments(object)) {
  # computations of the fitted values per cluster here
  predictionMatrix <- CODE_HERE
  transformFitted(pred = predictionMatrix, model = object, clusters = clusters)
}
For a complete and runnable example, see the custom models vignette accessible via vignette("custom", package = "latrend").
A helper function for implementing the predict.lcModel() method as part of your own lcModel class, ensuring the correct output type and format (see the Value section).
Note that this function has no use outside of ensuring valid output for predict.lcModel.
For implementing lcModel predictions from scratch, it is advisable to implement predictForCluster instead of predict.lcModel.
The prediction ordering corresponds to the observation ordering of the newdata argument.
By default, transformPredict() accepts one of the following inputs:
data.frame: A data.frame in long format providing a cluster-specific prediction for each observation per row, with column names "Fit" and "Cluster". This data.frame therefore has nrow(model.data(object)) * nClusters(object) rows.
matrix: An N-by-K matrix where each row provides the cluster-specific predictions for the respective observations in newdata. Here, N = nrow(newdata) and K = nClusters(object).
vector: A vector of length nrow(newdata) with predictions corresponding to the rows of newdata.
Users can implement support for other prediction formats by defining the transformPredict() method with other signatures.
transformPredict(pred, model, newdata)

## S4 method for signature 'NULL,lcModel'
transformPredict(pred, model, newdata)

## S4 method for signature 'vector,lcModel'
transformPredict(pred, model, newdata)

## S4 method for signature 'matrix,lcModel'
transformPredict(pred, model, newdata)

## S4 method for signature 'data.frame,lcModel'
transformPredict(pred, model, newdata)
pred |
The (per-cluster) predictions for |
model |
The |
newdata |
A |
A data.frame with the predictions, or a list of cluster-specific prediction data.frames.
In case we have a custom lcModel class based on an existing internal model representation with a predict() function, we can use transformPredict() to easily transform the internal model predictions to the right format.
A common output is a matrix with the cluster-specific predictions.
predict.lcModelExample <- function(object, newdata) {
  predictionMatrix <- predict(object@model, newdata)
  transformPredict(
    pred = predictionMatrix,
    model = object,
    newdata = newdata
  )
}
However, for ease of implementation it is generally advisable to implement predictForCluster instead of predict.lcModel.
For a complete and runnable example, see the custom models vignette accessible via vignette("custom", package = "latrend").
predictForCluster, predict.lcModel
Convert a multiple time series matrix to a data.frame
tsframe(
  data,
  response,
  id = getOption("latrend.id"),
  time = getOption("latrend.time"),
  ids = rownames(data),
  times = colnames(data),
  as.data.table = FALSE
)

meltRepeatedMeasures(
  data,
  response,
  id = getOption("latrend.id"),
  time = getOption("latrend.time"),
  ids = rownames(data),
  times = colnames(data),
  as.data.table = FALSE
)
data |
The |
response |
The response column name. |
id |
The id column name. |
time |
The time column name. |
ids |
A |
times |
A |
as.data.table |
Whether to return the result as a |
A data.table or data.frame containing the repeated measures.
The meltRepeatedMeasures() function is deprecated and will be removed in a future version. Please use tsframe() instead.
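As a rough picture of what a matrix-to-long conversion involves, the base R sketch below turns a small repeated-measures matrix into one row per id and time point. The matrix `m`, the column names "Id", "Time", and "Y", and the final sort order are assumptions made for illustration; the row order and column types returned by tsframe() may differ.

```r
# Hypothetical repeated-measures matrix: rows are trajectories,
# columns are measurement times.
m <- matrix(
  c(1, 2, 3,
    4, 5, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("a", "b"), c("0", "1", "2"))
)

# One row per (id, time) pair; as.vector() reads the matrix column by column,
# so ids cycle fastest and each time is repeated once per trajectory.
long <- data.frame(
  Id = rep(rownames(m), times = ncol(m)),
  Time = rep(as.numeric(colnames(m)), each = nrow(m)),
  Y = as.vector(m)
)

# Sort by id, then time, for readability.
long <- long[order(long$Id, long$Time), ]
rownames(long) <- NULL
```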
Converts a longitudinal data.frame comprising trajectories with an equal number of observations, measured at identical moments in time, to a matrix. Each row of the matrix represents a trajectory.
tsmatrix(
  data,
  response,
  id = getOption("latrend.id"),
  time = getOption("latrend.time"),
  fill = NA
)

dcastRepeatedMeasures(
  data,
  response,
  id = getOption("latrend.id"),
  time = getOption("latrend.time"),
  fill = NA
)
data |
The |
response |
The response column name. |
id |
The id column name. |
time |
The time column name. |
fill |
A |
A matrix with a trajectory per row.
The dcastRepeatedMeasures() function is deprecated and will be removed in a future version. Please use tsmatrix() instead.
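The reverse conversion, from long format to one trajectory per row, can be sketched in base R as follows. The data.frame `long`, its column names, and the NA pre-fill are assumptions for illustration; this is not how tsmatrix() is necessarily implemented.

```r
# Hypothetical long-format data: two trajectories measured at the same times.
long <- data.frame(
  Id = rep(c("a", "b"), each = 3),
  Time = rep(c(0, 1, 2), times = 2),
  Y = c(1, 2, 3, 4, 5, 6)
)

ids <- unique(long$Id)
times <- sort(unique(long$Time))

# Start from a matrix filled with NA, which plays the role of the fill value
# for any (id, time) combination that is absent from the data.
m <- matrix(
  NA_real_,
  nrow = length(ids), ncol = length(times),
  dimnames = list(as.character(ids), as.character(times))
)

# Place each response in its (id, time) cell via matrix indexing.
m[cbind(match(long$Id, ids), match(long$Time, times))] <- long$Y
```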
Update a method specification
## S3 method for class 'lcMethod'
update(object, ..., .eval = FALSE, .remove = character(), envir = NULL)
object |
The |
... |
The new or updated method argument values. |
.eval |
Whether to assign the evaluated argument values to the method. By default ( |
.remove |
Names of arguments that should be removed. |
envir |
The |
Updates or adds arguments to an lcMethod object. The inputs are evaluated in order to determine the presence of formula objects, which are updated accordingly.
The new lcMethod object with the additional or updated arguments.
Other lcMethod functions: [[,lcMethod-method, as.data.frame.lcMethod(), as.data.frame.lcMethods(), as.lcMethods(), as.list.lcMethod(), evaluate.lcMethod(), formula.lcMethod(), lcMethod-class, names,lcMethod-method
method <- lcMethodLMKM(Y ~ 1, nClusters = 2)
method2 <- update(method, formula = ~ . + Time)
method3 <- update(method2, nClusters = 3)

k <- 2
method4 <- update(method, nClusters = k) # nClusters: k
method5 <- update(method, nClusters = k, .eval = TRUE) # nClusters: 2
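The comments in the example above suggest that, by default, the argument is stored as the expression k, whereas .eval = TRUE stores its value. The base R sketch below illustrates that general distinction with quote() and eval(); it assumes a stored expression is evaluated at some later point, and the names `lazyArg` and `eagerArg` are hypothetical and unrelated to the package internals.

```r
k <- 2

# Store the argument as an unevaluated expression (analogous to the default).
lazyArg <- quote(k)

# Store the evaluated value instead (analogous to .eval = TRUE).
eagerArg <- eval(lazyArg)

# Changing k afterwards only affects the unevaluated expression.
k <- 5
lazyValue <- eval(lazyArg)  # follows the current value of k
eagerValue <- eagerArg      # keeps the value k had when it was stored
```

Storing the value can be the safer choice when the variable may change or go out of scope before the method is estimated.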
Fit a new model with modified arguments from the current model.
## S3 method for class 'lcModel'
update(object, ...)
object |
The |
... |
Arguments passed on to
|
The refitted lcModel object, of the same type as the object argument.
data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model2 <- latrend(method, latrendData, nClusters = 2)

# fit for a different number of clusters
model3 <- update(model2, nClusters = 3)
lcMethod estimation step: method argument validation logic
Note: this function should not be called directly, as it is part of the lcMethod estimation procedure.
For fitting an lcMethod object to a dataset, use the latrend() function or one of the other standard estimation functions.
The validate() function of the lcMethod object validates the method with respect to the training data.
This enables a method to verify, for example:
whether the formula covariates are present.
whether the argument combination settings are valid.
whether the data is suitable for training.
By default, the validate() function checks whether the id, time, and response variables are present as columns in the training data.
validate(method, data, envir, ...)

## S4 method for signature 'lcMethod'
validate(method, data, envir = NULL, ...)
method |
An object inheriting from |
data |
A |
envir |
The |
... |
Not used. |
Either TRUE if all validation checks passed, or a scalar character containing a description of the failed validation checks.
An example implementation checking for the existence of specific arguments and type:
library(assertthat)
setMethod("validate", "lcMethodExample", function(method, data, envir = NULL, ...) {
  validate_that(
    hasName(method, "myArgument"),
    hasName(method, "anotherArgument"),
    is.numeric(method$myArgument)
  )
})
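The return convention described above (TRUE on success, otherwise a single descriptive string) can also be sketched without any dependencies. The function `checkColumns()` and its default column names are hypothetical; it only imitates the kind of column-presence check that the default validation is described as performing.

```r
# Hypothetical checker: returns TRUE, or a single string describing
# which required columns are missing from the data.
checkColumns <- function(data, id = "Id", time = "Time", response = "Y") {
  required <- c(id, time, response)
  missingCols <- setdiff(required, names(data))
  if (length(missingCols) == 0) {
    TRUE
  } else {
    paste0("missing column(s): ", paste(missingCols, collapse = ", "))
  }
}

good <- data.frame(Id = 1, Time = 0, Y = 2.5)
bad <- data.frame(Id = 1, Time = 0)

checkColumns(good)
checkColumns(bad)
```

Returning a message instead of raising an error leaves it to the caller to decide how a failed check is reported.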
The steps for estimating an lcMethod object are defined and executed as follows:
compose(): Evaluate and finalize the method argument values.
validate(): Check the validity of the method argument values in relation to the dataset.
prepareData(): Process the training data for fitting.
preFit(): Prepare environment for estimation, independent of training data.
fit(): Estimate the specified method on the training data, outputting an object inheriting from lcModel.
postFit(): Post-process the outputted lcModel object.
The result of the fitting procedure is an object that inherits from the lcModel class.
Returns a random index, weighted by the element magnitudes. This function is intended to be used as an optional strategy for trajectoryAssignments, resulting in randomly sampled cluster membership.
which.weight(x)
x |
A positive |
An integer giving the index of the sampled element.
x = c(.01, .69, .3)
which.weight(x) # 1, 2, or 3
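Weighted index sampling of this kind can be approximated in base R with sample.int() and its prob argument. The function `sampleIndex()` below is a hypothetical stand-in written for illustration; it is not the source of which.weight() and may differ from it in input checks and edge cases.

```r
# Hypothetical stand-in: draw one index with probability proportional
# to the (non-negative) element magnitudes.
sampleIndex <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  sample.int(length(x), size = 1, prob = x)
}

# Repeated draws should occur roughly in proportion to the weights.
set.seed(42)
draws <- replicate(1000, sampleIndex(c(.01, .69, .3)))
table(draws) / length(draws)
```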