Package 'clustcurv' reference manual

Title:	Determining Groups in Multiple Curves
Description:	A method for determining groups in multiple curves with an automatic selection of their number based on k-means or k-medians algorithms. The selection of the optimal number is provided by bootstrap methods. The methodology can be applied in both the regression and survival frameworks. Implemented methods are: Grouping multiple survival curves described by Villanueva et al. (2018) <doi:10.1002/sim.8016>.
Authors:	Nora M. Villanueva [aut, cre] , Marta Sestelo [aut]
Maintainer:	Nora M. Villanueva <[email protected]>
License:	MIT + file LICENSE
Version:	2.0.2
Built:	2025-03-04 05:34:51 UTC
Source:	https://github.com/noramvillanueva/clustcurv

Visualization of `clustcurves` objects with ggplot2 graphics

Description

Useful for drawing the estimated functions grouped by color and the centroids (mean curve of the curves pertaining to the same group).
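The centroids are described here as the pointwise mean of the curves in each group. The following is a minimal base-R sketch of that computation. It assumes the curves have already been evaluated on a common grid and stored one per row. `curves` and `cluster` are hypothetical illustration objects, not the internal structure of a `clustcurves` object.

```r
# Hypothetical curves evaluated on a common 3-point grid (one row per curve)
curves <- rbind(c(1.0, 0.8, 0.5),
                c(1.0, 0.7, 0.4),
                c(1.0, 0.3, 0.1))
# Hypothetical group assignment of each curve (values in 1:k)
cluster <- c(1L, 1L, 2L)

# Centroid of each group = pointwise mean of its member curves (one row per group)
centers <- t(sapply(sort(unique(cluster)), function(g)
  colMeans(curves[cluster == g, , drop = FALSE])))
centers
```

For k-medians, a pointwise median (`apply(..., 2, median)`) would be the natural analogue. Whether the package does so is not documented in this manual.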

Usage

## S3 method for class 'clustcurves'
autoplot(
  object = object,
  groups_by_colour = TRUE,
  centers = FALSE,
  conf.int = FALSE,
  censor = FALSE,
  xlab = "Time",
  ylab = "Survival",
  interactive = FALSE,
  ...
)

Arguments

`object`	Object of `clustcurves` class.
`groups_by_colour`	Logical flag indicating whether the curves are drawn coloured by group. By default it is `TRUE`.
`centers`	Draw the centroids (mean of the curves pertaining to the same group) into the plot. By default it is `FALSE`.
`conf.int`	Only for survival curves. Logical flag indicating whether to plot confidence intervals.
`censor`	Only for survival curves. Logical flag indicating whether to plot censors.
`xlab`	A title for the `x` axis.
`ylab`	A title for the `y` axis.
`interactive`	Logical flag indicating if an interactive plot with plotly is produced.
`...`	Other options.

Details

See the help page of the function ggfortify::autoplot.survfit().

Value

A ggplot object, so the usual features of the ggplot2 package can be used to modify the plot.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples


library(survival)
library(clustcurv)
library(ggplot2)
library(ggfortify)

# Survival


cl2 <- ksurvcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, k = 2, algorithm = "kmeans")

autoplot(cl2)
autoplot(cl2, groups_by_colour = FALSE)
autoplot(cl2, centers = TRUE)



# Regression

r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC,
z = barnacle5$F, k = 2, algorithm = "kmeans")

autoplot(r2)
autoplot(r2, groups_by_colour = FALSE)
autoplot(r2, groups_by_colour = FALSE, interactive = TRUE)
autoplot(r2, centers = TRUE)


Barnacle data

Description

This barnacle data set gives the measurements of the variables dry weight (in g.) and rostro-carinal length (in mm) for 5000 barnacles collected along the intertidal zone from five sites of the Atlantic coast of Galicia (Spain).

Usage

barnacle5

Format

barnacle5 is a data frame with 5000 cases (rows) and 3 variables (columns).

Note that the barnacle data set from the npregfast package contains the same three variables (columns) but for only two sites, thus 2000 cases (rows).

DW: Dry weight (in g.)
RC: Rostro-carinal length (in mm).
F: Factor indicating the sites of harvest: laxe, lens, barca, laxe, and lens.

Author(s)

Marta Sestelo

References

Sestelo, M. and Roca-Pardinas, J. (2011). A new approach to estimation of length-weight relationship of Pollicipes pollicipes (Gmelin, 1789) on the Atlantic coast of Galicia (Northwest Spain): some aspects of its biology and management. Journal of Shellfish Research, 30(3), 939–948.

Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.

Examples

data(barnacle5)
head(barnacle5)



`clustcurv`: Determining Groups in Multiple Curves.

Description

This package provides a method for determining groups in multiple curves with an automatic selection of their number based on k-means or k-medians algorithms. The selection of the optimal number is provided by bootstrap methods. The methodology can be applied in both the regression and survival frameworks.

Details

Package:	clustcurv
Type:	Package
License:	MIT + file LICENSE

clustcurv is designed along lines similar to those of other R packages. This software helps the user determine groups in multiple curves (survival and regression curves). In addition, it enables both numerical and graphical outputs to be displayed (by means of ggplot2). The ksurvcurves() and kregcurves() functions group the curves given a number of groups k. The survclustcurves() and regclustcurves() functions select the optimal number of groups automatically through a bootstrap-based test. The autoplot() function lets the user draw the resulting estimated curves coloured by group.

For a listing of all routines in the clustcurv package type: library(help="clustcurv").

Author(s)

Nora M. Villanueva and Marta Sestelo

References

Villanueva, N. M., Sestelo, M., and Meira-Machado, L. (2019). A method for determining groups in multiple survival curves. Statistics in Medicine, 38(5), 866-877.

k-groups of multiple regression curves

Description

Function for grouping regression curves, given a number k, based on the k-means or k-medians algorithm.
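The sketch below illustrates the idea under stated assumptions. It assumes each population's curve is estimated nonparametrically, evaluated on a common grid of `kbin` points (as the `kbin` argument suggests), and the resulting vectors are clustered with k-means. It uses base R and simulated data only and is not the package's implementation: `stats::lowess()` stands in for the package's kernel smoother (whose bandwidth is `h`), and the two true curve shapes are made up.

```r
set.seed(123)

kbin <- 50                                   # grid size, playing the role of kbin
grid <- seq(0, 1, length.out = kbin)         # common grid shared by all populations

# Hypothetical true regression functions: populations 1-3 follow f1, 4-6 follow f2
f1 <- function(x) sin(2 * pi * x)
f2 <- function(x) 2 * x
truth <- list(f1, f1, f1, f2, f2, f2)

# Estimate each population's curve with a smoother and evaluate it on the grid
curves <- t(sapply(truth, function(f) {
  x <- runif(200)
  y <- f(x) + rnorm(200, sd = 0.2)
  fit <- lowess(x, y, f = 0.3)               # local regression smoother (stand-in)
  approx(fit$x, fit$y, xout = grid, rule = 2)$y
}))

# Group the six estimated curves (rows) into k = 2 groups with k-means
groups <- kmeans(curves, centers = 2, nstart = 10)
groups$cluster
```

Populations 1-3 and 4-6 should end up in different groups, because their estimated curves differ far more between shapes than within a shape.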

Usage

kregcurves(y, x, z, k, kbin = 50, h = -1, algorithm = "kmeans", seed = NULL)

Arguments

`y`	Response variable.
`x`	Explanatory (independent) variable.
`z`	Categorical variable indicating the population to which the observations belong.
`k`	An integer specifying the number of groups into which the curves are clustered.
`kbin`	Size of the grid over which the regression curves are to be estimated.
`h`	The kernel bandwidth smoothing parameter.
`algorithm`	A character string specifying which clustering algorithm is used, i.e., k-means (`"kmeans"`) or k-medians (`"kmedians"`).
`seed`	Seed to be used in the procedure.
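The difference between the two `algorithm` options can be illustrated with one common k-medians variant: Manhattan (L1) distances and coordinate-wise median centers, in place of k-means' squared Euclidean distances and mean centers. This is a sketch of that textbook variant only. `kmedians_sketch()` is a hypothetical helper, and the package's own k-medians routine may use a different distance or median definition.

```r
# Hypothetical helper (not a clustcurv function): Lloyd-style k-medians
# using L1 distance and coordinate-wise median centers.
kmedians_sketch <- function(X, k, iter.max = 50, seed = 1) {
  set.seed(seed)
  centers <- X[sample(nrow(X), k), , drop = FALSE]   # random initial centers
  cluster <- integer(nrow(X))
  for (it in seq_len(iter.max)) {
    # n x k matrix of Manhattan distances from each curve to each center
    d <- sapply(seq_len(k), function(g)
      rowSums(abs(sweep(X, 2, centers[g, ]))))
    new_cluster <- max.col(-d, ties.method = "first")  # index of nearest center
    if (all(new_cluster == cluster)) break             # assignments are stable
    cluster <- new_cluster
    # Update each center as the coordinate-wise median of its member curves
    for (g in seq_len(k)) {
      members <- X[cluster == g, , drop = FALSE]
      if (nrow(members) > 0) centers[g, ] <- apply(members, 2, median)
    }
  }
  list(cluster = cluster, centers = centers)
}

# Toy curves on a 3-point grid: three near level 0 and three near level 5
X <- rbind(c(0, 0, 0), c(0.1, 0, 0.2), c(0, 0.1, 0),
           c(5, 5, 5), c(5.2, 5, 4.9), c(5, 5.1, 5))
res <- kmedians_sketch(X, k = 2)
res$cluster
```

Medians make the centers less sensitive to a single outlying curve than means, which is the usual motivation for choosing k-medians.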

Value

A list containing the following items:

`measure`	Value of the test statistic.
`levels`	Original levels of the variable `z`.
`cluster`	A vector of integers (from 1:k) indicating the cluster to which each curve is allocated.
`centers`	An object containing the fitted centroids (mean of the curves pertaining to the same group).
`curves`	An object containing the fitted regression curves for each population.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)

# Regression: 2 groups k-means
r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC,
z = barnacle5$F, k = 2, algorithm = "kmeans")

data.frame(level = r2$levels, cluster = r2$cluster)



k-groups of multiple survival curves

Description

Function for grouping survival curves, given a number k, based on the k-means or k-medians algorithm.
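The following is a minimal sketch of the idea in the survival setting. It assumes each population's survival curve is estimated by Kaplan-Meier, evaluated on a common grid of `kbin` time points, and the resulting vectors are clustered with k-means. The Kaplan-Meier estimator is hand-coded so the block needs only base R (in practice `survival::survfit()` is the usual tool). The data are simulated, and `km_on_grid()` is a hypothetical helper, not part of clustcurv.

```r
set.seed(42)

# Hypothetical helper: Kaplan-Meier estimate of S(t) evaluated at each grid point
km_on_grid <- function(time, status, grid) {
  event_times <- sort(unique(time[status == 1]))
  # Product-limit factors (1 - d_i / n_i) at each distinct event time
  factors <- vapply(event_times, function(tt)
    1 - sum(time == tt & status == 1) / sum(time >= tt), numeric(1))
  surv <- cumprod(factors)
  vapply(grid, function(g) {
    j <- sum(event_times <= g)            # number of event times up to g
    if (j == 0) 1 else surv[j]
  }, numeric(1))
}

kbin <- 50                                 # grid size, playing the role of kbin
grid <- seq(0, 3, length.out = kbin)

# Hypothetical populations: 1-3 have hazard 0.5, 4-6 have hazard 2
rates <- c(0.5, 0.5, 0.5, 2, 2, 2)
curves <- t(sapply(rates, function(r) {
  event  <- rexp(200, rate = r)
  censor <- runif(200, 0, 4)               # independent censoring times
  time   <- pmin(event, censor)
  status <- as.integer(event <= censor)    # 1 = event observed, 0 = censored
  km_on_grid(time, status, grid)
}))

# Group the six estimated survival curves (rows) into k = 2 groups
fit <- kmeans(curves, centers = 2, nstart = 10)
fit$cluster
```

The `status` coding in the simulation follows the convention of the `status` argument: 1 for an observed event, 0 for a censored time.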

Usage

ksurvcurves(
  time,
  status = NULL,
  x,
  k,
  kbin = 50,
  algorithm = "kmeans",
  seed = NULL
)

Arguments

`time`	Survival time.
`status`	Censoring indicator of the survival time of the process; 0 if the total time is censored and 1 otherwise.
`x`	Categorical variable indicating the population to which the observations belong.
`k`	An integer specifying the number of groups into which the curves are clustered.
`kbin`	Size of the grid over which the survival functions are to be estimated.
`algorithm`	A character string specifying which clustering algorithm is used, i.e., k-means (`"kmeans"`) or k-medians (`"kmedians"`).
`seed`	Seed to be used in the procedure.

Value

A list containing the following items:

`measure`	Value of the test statistic.
`levels`	Original levels of the variable `x`.
`cluster`	A vector of integers (from 1:k) indicating the cluster to which each curve is allocated.
`centers`	An object of class `survfit` containing the centroids (mean of the curves pertaining to the same group).
`curves`	An object of class `survfit` containing the survival curves for each population.
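The `levels` and `cluster` components have one entry per population, in matching order, so base R's `split()` turns them into a per-group listing. In the sketch below, `levels_x` and `cluster` hold made-up values standing in for the components of a fitted object; they are not results from any data set.

```r
# Made-up stand-ins for the levels and cluster components (k = 2)
levels_x <- c("A", "B", "C", "D")
cluster  <- c(1L, 2L, 2L, 1L)

# One list element per group, containing the levels allocated to it
groups <- split(levels_x, cluster)
groups
```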

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)
library(survival)
data(veteran)

# Survival: 2 groups k-means
s2 <- ksurvcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, k = 2, algorithm = "kmeans")

data.frame(level = s2$levels, cluster = s2$cluster)


# Survival: 2 groups k-medians
s22 <- ksurvcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, k = 2, algorithm = "kmedians")

data.frame(level = s22$levels, cluster = s22$cluster)




Clustering multiple regression curves

Description

Function for grouping regression curves based on the k-means or k-medians algorithm. It returns the selected number of groups and the assignment of each curve to a group.
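The number of groups is chosen from a sequence of bootstrap tests. The sketch below assumes the sequential scheme of Villanueva et al. (2019): the null hypothesis H0(k), "there are k groups", is tested for increasing k, and the first k whose null hypothesis is not rejected at level `alpha` is selected. `select_k()` is a hypothetical helper, and the p-values are made up; neither comes from the package.

```r
# Hypothetical helper: choose the number of groups from sequential test results.
# pvalues[i] is the p-value for H0: "there are kvector[i] groups",
# with kvector in increasing order.
select_k <- function(kvector, pvalues, alpha = 0.05) {
  accepted <- which(pvalues >= alpha)   # null hypotheses that are not rejected
  if (length(accepted) == 0) NA_integer_ else as.integer(kvector[accepted[1]])
}

# Made-up p-values: H0(1) and H0(2) are rejected, H0(3) is the first one kept
select_k(kvector = 1:4, pvalues = c(0.00, 0.01, 0.42, 0.60))
```

Returning `NA` when every tested k is rejected signals that a longer `kvector` would be needed.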

Usage

regclustcurves(
  y,
  x,
  z,
  kvector = NULL,
  kbin = 50,
  h = -1,
  nboot = 100,
  algorithm = "kmeans",
  alpha = 0.05,
  cluster = FALSE,
  ncores = NULL,
  seed = NULL,
  multiple = FALSE,
  multiple.method = "holm"
)

Arguments

`y`	Response variable.
`x`	Explanatory (independent) variable.
`z`	Categorical variable indicating the population to which the observations belong.
`kvector`	A vector specifying the numbers of groups of curves to be checked.
`kbin`	Size of the grid over which the regression curves are to be estimated.
`h`	The kernel bandwidth smoothing parameter.
`nboot`	Number of bootstrap repeats.
`algorithm`	A character string specifying which clustering algorithm is used, i.e., k-means (`"kmeans"`) or k-medians (`"kmedians"`).
`alpha`	Significance level of the testing procedure. Defaults to 0.05.
`cluster`	A logical value. If `TRUE`, the testing procedure is parallelized; the default is `FALSE`. Note that in some cases (e.g., a low number of bootstrap repetitions) serial computation is faster. R needs time to distribute the tasks across the processors and to bind the results together afterwards, so if this overhead exceeds the time needed for single-threaded computation, parallelizing is not worthwhile.
`ncores`	An integer value specifying the number of cores to be used in the parallelized procedure. If `NULL` (default), the number of cores used equals the number of cores of the machine minus one.
`seed`	Seed to be used in the procedure.
`multiple`	A logical value. If `TRUE` (the default is `FALSE`), the resulting p-values are adjusted using one of several methods for multiple comparisons.
`multiple.method`	Correction method. See Details.
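The `ncores` default and the serial-versus-parallel trade-off described for `cluster` can be illustrated with base R's parallel package. This is a generic sketch, not the package's internal code. The toy bootstrap of a sample mean is deliberately cheap, which is exactly the situation where the overhead of distributing tasks may outweigh the gain.

```r
library(parallel)

# Default described for ncores: all available cores but one (at least one).
# detectCores() may return NA on some platforms.
n_avail <- detectCores()
ncores <- if (is.na(n_avail)) 1L else max(1L, n_avail - 1L)

# mclapply() relies on forking, which is unavailable on Windows (use 1 core there)
workers <- if (.Platform$OS.type == "windows") 1L else ncores

# Toy bootstrap of a sample mean, one task per replicate; each task sets its
# own seed so the results do not depend on how tasks are scheduled
x <- rnorm(100)
nboot <- 200
boot_means <- unlist(mclapply(seq_len(nboot), function(b) {
  set.seed(b)
  mean(sample(x, replace = TRUE))
}, mc.cores = workers))
```

Timing this block with `system.time()` for `workers = 1L` and for `workers = ncores` is a quick way to check whether parallelizing pays off on a given machine.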

Details

The adjustment methods include the Bonferroni correction ("bonferroni"), in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also available: Holm (1979) ("holm"), Hochberg (1988) ("hochberg"), Hommel (1988) ("hommel"), Benjamini & Hochberg (1995) ("BH" or its alias "fdr"), and Benjamini & Yekutieli (2001) ("BY"). A pass-through option ("none") is also included.
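The method names listed above coincide with those accepted by base R's `stats::p.adjust()`. This manual does not confirm that `multiple.method` is passed on to that function. Under that assumption, the effect of each correction can be previewed on made-up p-values:

```r
# Made-up p-values from three tests
p <- c(0.01, 0.04, 0.03)

p.adjust(p, method = "bonferroni")  # each p-value times 3: 0.03 0.12 0.09
p.adjust(p, method = "holm")        # step-down Holm:        0.03 0.06 0.06
p.adjust(p, method = "BH")          # Benjamini-Hochberg:    0.03 0.04 0.04
```

Holm never gives larger adjusted values than Bonferroni, which is why it is described as less conservative.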

Value

A list containing the following items:

`table`	A data frame containing the null hypotheses tested, the values of the test statistic and the obtained p-values.
`levels`	Original levels of the variable `z`.
`cluster`	A vector of integers (from 1:k) indicating the cluster to which each curve is allocated.
`centers`	An object containing the centroids (mean of the curves pertaining to the same group).
`curves`	An object containing the fitted curves for each population.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)

# Regression framework
res <- regclustcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
algorithm = 'kmeans', nboot = 2, cluster = TRUE, ncores = 2)

Summarizing fits of `clustcurves` class produced by `survclustcurves` and `regclustcurves`

Description

Takes a clustcurves object and produces various useful summaries from it.

Usage

## S3 method for class 'clustcurves'
summary(object, ...)

Arguments

`object`	a clustcurves object as produced by `survclustcurves` or `regclustcurves`.
`...`	additional arguments.

Details

print.clustcurves tries to be smart about summary.clustcurves.

Value

summary.clustcurves computes and returns a list of summary information for a clustcurves object.

`levels`	Levels of the factor.
`cluster`	A vector containing the assignment of each factor's level to its group.
`table`	A data.frame containing the results from the hypothesis test.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)
library(survival)
data(veteran)

# Survival framework
ressurv <- survclustcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, algorithm = 'kmeans', nboot = 2)

summary(ressurv)


# Regression framework
resreg <- regclustcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
algorithm = 'kmeans', nboot = 2)

summary(resreg)



Summarizing fits of `kcurves` class produced by `ksurvcurves` and `kregcurves`

Description

Takes a kcurves object and produces various useful summaries from it.

Usage

## S3 method for class 'kcurves'
summary(object, ...)

Arguments

`object`	a kcurves object as produced by `ksurvcurves` or `kregcurves`.
`...`	additional arguments.

Details

print.kcurves tries to be smart about summary.kcurves.

Value

summary.kcurves computes and returns a list of summary information for a kcurves object.

`levels`	Levels of the factor.
`cluster`	A vector containing the assignment of each factor's level to its group.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)
library(survival)
data(veteran)

# Survival: 2 groups k-means
s2 <- ksurvcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, k = 2, algorithm = "kmeans")

summary(s2)


# Regression: 2 groups k-means
r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC,
z = barnacle5$F, k = 2, algorithm = "kmeans")

summary(r2)



Clustering multiple survival curves

Description

Function for grouping survival curves based on the k-means or k-medians algorithm. It returns the selected number of groups and the assignment of each curve to a group.

Usage

survclustcurves(
  time,
  status = NULL,
  x,
  kvector = NULL,
  kbin = 50,
  nboot = 100,
  algorithm = "kmeans",
  alpha = 0.05,
  cluster = FALSE,
  ncores = NULL,
  seed = NULL,
  multiple = FALSE,
  multiple.method = "holm"
)

Arguments

`time`	Survival time.
`status`	Censoring indicator of the survival time of the process; 0 if the total time is censored and 1 otherwise.
`x`	Categorical variable indicating the population to which the observations belong.
`kvector`	A vector specifying the numbers of groups of curves to be checked.
`kbin`	Size of the grid over which the survival functions are to be estimated.
`nboot`	Number of bootstrap repeats.
`algorithm`	A character string specifying which clustering algorithm is used, i.e., k-means (`"kmeans"`) or k-medians (`"kmedians"`).
`alpha`	Significance level of the testing procedure. Defaults to 0.05.
`cluster`	A logical value. If `TRUE`, the testing procedure is parallelized; the default is `FALSE`. Note that in some cases (e.g., a low number of bootstrap repetitions) serial computation is faster. R needs time to distribute the tasks across the processors and to bind the results together afterwards, so if this overhead exceeds the time needed for single-threaded computation, parallelizing is not worthwhile.
`ncores`	An integer value specifying the number of cores to be used in the parallelized procedure. If `NULL` (default), the number of cores used equals the number of cores of the machine minus one.
`seed`	Seed to be used in the procedure.
`multiple`	A logical value. If `TRUE` (the default is `FALSE`), the resulting p-values are adjusted using one of several methods for multiple comparisons.
`multiple.method`	Correction method. See Details.

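Details

The correction methods available through `multiple.method` are the same as for `regclustcurves`: "bonferroni", "holm", "hochberg", "hommel", "BH" (or its alias "fdr"), "BY", and the pass-through option "none".
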
Value

A list containing the following items:

`table`	A data frame containing the null hypotheses tested, the values of the test statistic and the obtained p-values.
`levels`	Original levels of the variable `x`.
`cluster`	A vector of integers (from 1:k) indicating the cluster to which each curve is allocated.
`centers`	An object containing the centroids (mean of the curves pertaining to the same group).
`curves`	An object containing the fitted curves for each population.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(clustcurv)
library(survival)
data(veteran)

# Survival framework
res <- survclustcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, algorithm = 'kmeans', nboot = 2)
