Title: | Determining Groups in Multiple Curves |
---|---|
Description: | A method for determining groups in multiple curves, with automatic selection of the number of groups based on the k-means or k-medians algorithm. The optimal number of groups is selected by bootstrap methods. The methodology can be applied in both regression and survival frameworks. Implemented methods include the grouping of multiple survival curves described by Villanueva et al. (2018) <doi:10.1002/sim.8016>. |
Authors: | Nora M. Villanueva [aut, cre] , Marta Sestelo [aut] |
Maintainer: | Nora M. Villanueva <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.2 |
Built: | 2024-10-28 12:24:42 UTC |
Source: | https://github.com/noramvillanueva/clustcurv |
clustcurves objects with ggplot2 graphics
Useful for drawing the estimated functions grouped by color and the centroids (mean curve of the curves pertaining to the same group).
## S3 method for class 'clustcurves'
autoplot(object = object, groups_by_colour = TRUE, centers = FALSE,
         conf.int = FALSE, censor = FALSE, xlab = "Time", ylab = "Survival",
         interactive = FALSE, ...)
object | Object of class clustcurves. |
groups_by_colour | Logical flag indicating whether the curves are coloured by group. |
centers | Logical flag indicating whether to draw the centroids (mean of the curves pertaining to the same group) in the plot. By default it is FALSE. |
conf.int | Only for survival curves. Logical flag indicating whether to plot confidence intervals. |
censor | Only for survival curves. Logical flag indicating whether to plot censors. |
xlab | A title for the x axis. |
ylab | A title for the y axis. |
interactive | Logical flag indicating whether an interactive plot is produced with plotly. |
... | Other options. |
See the help page of the function ggfortify::autoplot.survfit().
A ggplot object, so you can use common features from the ggplot2 package to manipulate the plot.
Nora M. Villanueva and Marta Sestelo.
library(survival)
library(clustcurv)
library(ggplot2)
library(ggfortify)

# Survival
cl2 <- ksurvcurves(time = veteran$time, status = veteran$status,
                   x = veteran$celltype, k = 2, algorithm = "kmeans")
autoplot(cl2)
autoplot(cl2, groups_by_colour = FALSE)
autoplot(cl2, centers = TRUE)

# Regression
r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
                 k = 2, algorithm = "kmeans")
autoplot(r2)
autoplot(r2, groups_by_colour = FALSE)
autoplot(r2, groups_by_colour = FALSE, interactive = TRUE)
autoplot(r2, centers = TRUE)
This barnacle data set gives the measurements of the variables dry weight (in g.) and rostro-carinal length (in mm) for 5000 barnacles collected along the intertidal zone from five sites of the Atlantic coast of Galicia (Spain).
barnacle5
barnacle5 is a data frame with 5000 cases (rows) and 3 variables (columns).
Note that the barnacle data set from the npregfast package gives the same three variables (columns) but for two sites, thus 2000 cases (rows).
DW | Dry weight (in g). |
RC | Rostro-carinal length (in mm). |
F | Factor indicating the sites of harvest: laxe, lens, barca, laxe, and lens. |
Marta Sestelo
Sestelo, M. and Roca-Pardinas, J. (2011). A new approach to estimation of length-weight relationship of Pollicipes pollicipes (Gmelin, 1789) on the Atlantic coast of Galicia (Northwest Spain): some aspects of its biology and management. Journal of Shellfish Research, 30(3), 939–948.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
data(barnacle5)
head(barnacle5)
clustcurv: Determining Groups in Multiple Curves.
This package provides a method for determining groups in multiple curves, with automatic selection of the number of groups based on the k-means or k-medians algorithm. The optimal number of groups is selected by bootstrap methods. The methodology can be applied in both regression and survival frameworks.
Package: | clustcurv |
Type: | Package |
License: | MIT + file LICENSE |
clustcurv is designed along lines similar to those of other R packages. This software helps the user determine groups in multiple curves (survival and regression curves). In addition, it enables both numerical and graphical outputs to be displayed (by means of ggplot2). The package provides the ksurvcurves() and kregcurves() functions, which group the curves given a number k, and the survclustcurves() and regclustcurves() functions, which select the optimal number of groups automatically through a bootstrap-based test. The autoplot() function lets the user draw the resulting estimated curves coloured by group.
For a listing of all routines in the clustcurv package type: library(help="clustcurv").
Nora M. Villanueva and Marta Sestelo
Villanueva, N. M., Sestelo, M., and Meira-Machado, L. (2019). A method for determining groups in multiple survival curves. Statistics in Medicine, 38(5), 866-877.
Useful links:
Report bugs at https://github.com/noramvillanueva/clustcurv/issues
Function for grouping regression curves, given a number k, based on the k-means or k-medians algorithm.
kregcurves(y, x, z, k, kbin = 50, h = -1, algorithm = "kmeans", seed = NULL)
y | Response variable. |
x | Independent variable (covariate). |
z | Categorical variable indicating the population to which the observations belong. |
k | An integer specifying the number of groups of curves to be formed. |
kbin | Size of the grid over which the regression functions are to be estimated. |
h | The kernel bandwidth smoothing parameter. |
algorithm | A character string specifying which clustering algorithm is used, i.e., k-means ("kmeans") or k-medians ("kmedians"). |
seed | Seed to be used in the procedure. |
A list containing the following items:
measure | Value of the test statistic. |
levels | Original levels of the variable z. |
cluster | A vector of integers (from 1:k) indicating the cluster to which each curve is allocated. |
centers | An object containing the fitted centroids (mean of the curves pertaining to the same group). |
curves | An object containing the fitted regression curves for each population. |
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)

# Regression: 2 groups k-means
r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
                 k = 2, algorithm = "kmeans")
# One row per population: its original level and the assigned cluster
data.frame(level = r2$levels, cluster = r2$cluster)
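The grouping idea behind kregcurves() can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes simulated data, uses loess() as a stand-in smoother, and applies stats::kmeans() to the curves evaluated on a common grid of kbin points. The object names (curves, km, res) and the four populations a to d are hypothetical.

```r
set.seed(42)

# Simulated data (assumption): populations a and b share one curve,
# populations c and d share another one shifted upwards by 2.
n  <- 200
z  <- factor(rep(c("a", "b", "c", "d"), each = n))
x  <- runif(4 * n, 0, 1)
mu <- ifelse(z %in% c("a", "b"), sin(2 * pi * x), 2 + sin(2 * pi * x))
y  <- mu + rnorm(4 * n, sd = 0.2)

# Common grid on which every curve is evaluated
kbin <- 50
grid <- seq(0.05, 0.95, length.out = kbin)

# One smoothed curve per population (loess is a stand-in smoother here)
curves <- sapply(levels(z), function(l) {
  fit <- loess(y ~ x, data = data.frame(x = x[z == l], y = y[z == l]))
  predict(fit, newdata = data.frame(x = grid))
})

# Cluster the curves: each row of t(curves) is one population
km  <- kmeans(t(curves), centers = 2, nstart = 10)
res <- data.frame(level = levels(z), cluster = km$cluster)
res
```

Populations whose fitted curves are close on the grid receive the same cluster label. Base R has no k-medians function, so only the k-means variant is sketched.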
Function for grouping survival curves, given a number k, based on the k-means or k-medians algorithm.
ksurvcurves(time, status = NULL, x, k, kbin = 50, algorithm = "kmeans",
            seed = NULL)
time | Survival time. |
status | Censoring indicator of the survival time of the process; 0 if the total time is censored and 1 otherwise. |
x | Categorical variable indicating the population to which the observations belong. |
k | An integer specifying the number of groups of curves to be formed. |
kbin | Size of the grid over which the survival functions are to be estimated. |
algorithm | A character string specifying which clustering algorithm is used, i.e., k-means ("kmeans") or k-medians ("kmedians"). |
seed | Seed to be used in the procedure. |
A list containing the following items:
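measure | Value of the test statistic. |
levels | Original levels of the variable x. |
cluster | A vector of integers (from 1:k) indicating the cluster to which each curve is allocated. |
centers | An object containing the fitted centroids (mean of the curves pertaining to the same group). |
curves | An object containing the fitted survival curves for each population. |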
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)
library(survival)
data(veteran)

# Survival: 2 groups k-means
s2 <- ksurvcurves(time = veteran$time, status = veteran$status,
                  x = veteran$celltype, k = 2, algorithm = "kmeans")
data.frame(level = s2$levels, cluster = s2$cluster)

# Survival: 2 groups k-medians
s22 <- ksurvcurves(time = veteran$time, status = veteran$status,
                   x = veteran$celltype, k = 2, algorithm = "kmedians")
data.frame(level = s22$levels, cluster = s22$cluster)
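To make the role of kbin and of the clustering step more concrete, the sketch below evaluates one Kaplan-Meier curve per cell type on a common grid and clusters the resulting vectors with stats::kmeans(). It is an illustration under stated assumptions, not the internal code of ksurvcurves(): the choice of grid, the use of survfit() and summary() from the survival package, and the names grid, curves and km are all hypothetical.

```r
library(survival)

# Common time grid with kbin points (the grid choice is an assumption)
kbin <- 50
grid <- seq(0, max(veteran$time), length.out = kbin)
levs <- levels(veteran$celltype)

# One Kaplan-Meier estimate per population, evaluated on the grid.
# extend = TRUE carries the last estimate forward past the final
# observed time of each population.
curves <- sapply(levs, function(l) {
  d   <- veteran[veteran$celltype == l, ]
  fit <- survfit(Surv(time, status) ~ 1, data = d)
  summary(fit, times = grid, extend = TRUE)$surv
})

# Cluster the curves: each row of t(curves) is one population
set.seed(1)
km <- kmeans(t(curves), centers = 2, nstart = 10)
data.frame(level = levs, cluster = km$cluster)
```

The cluster labels themselves are arbitrary; only the partition of the levels matters, and it need not coincide with the one returned by ksurvcurves().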
Function for grouping regression curves based on the k-means or k-medians algorithm. It returns the number of groups and the assignment.
regclustcurves(y, x, z, kvector = NULL, kbin = 50, h = -1, nboot = 100,
               algorithm = "kmeans", alpha = 0.05, cluster = FALSE,
               ncores = NULL, seed = NULL, multiple = FALSE,
               multiple.method = "holm")
y | Response variable. |
x | Independent variable (covariate). |
z | Categorical variable indicating the population to which the observations belong. |
kvector | A vector specifying the numbers of groups of curves to be checked. |
kbin | Size of the grid over which the regression functions are to be estimated. |
h | The kernel bandwidth smoothing parameter. |
nboot | Number of bootstrap repeats. |
algorithm | A character string specifying which clustering algorithm is used, i.e., k-means ("kmeans") or k-medians ("kmedians"). |
alpha | Significance level of the testing procedure. Defaults to 0.05. |
cluster | A logical value. If TRUE, the testing procedure is parallelized. |
ncores | An integer value specifying the number of cores to be used in the parallelized procedure. |
seed | Seed to be used in the procedure. |
multiple | A logical value. If TRUE, the p-values are adjusted for multiple comparisons using the method given in multiple.method. |
multiple.method | Correction method. See Details. |
The adjustment methods include the Bonferroni correction ("bonferroni") in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) ('holm'), Hochberg (1988) ('hochberg'), Hommel (1988) ('hommel'), Benjamini & Hochberg (1995) ('BH' or its alias 'fdr'), and Benjamini & Yekutieli (2001) ('BY'), respectively. A pass-through option ('none') is also included.
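The method names listed above coincide with those accepted by stats::p.adjust() in base R. Whether the package calls p.adjust() internally is an assumption; the sketch below only illustrates how the corrections behave on a hypothetical vector of p-values.

```r
# Hypothetical raw p-values from four tests
p <- c(0.010, 0.020, 0.030, 0.040)

# Method names as listed in the Details above
methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY", "none")

# One column of adjusted p-values per correction method
adj <- sapply(methods, function(m) p.adjust(p, method = m))
round(adj, 3)

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
all.equal(unname(adj[, "bonferroni"]), pmin(1, p * length(p)))

# Holm is never more conservative than Bonferroni
all(adj[, "holm"] <= adj[, "bonferroni"])
```

Comparing the columns of adj side by side shows how much each correction inflates the raw p-values, which can help when choosing a value for multiple.method.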
A list containing the following items:
table | A data frame containing the null hypotheses tested, the values of the test statistic and the obtained p-values. |
levels | Original levels of the variable z. |
cluster | A vector of integers (from 1:k) indicating the cluster to which each curve is allocated. |
centers | An object containing the centroids (mean of the curves pertaining to the same group). |
curves | An object containing the fitted curves for each population. |
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)

# Regression framework
res <- regclustcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
                      algorithm = 'kmeans', nboot = 2, cluster = TRUE,
                      ncores = 2)
Summarizing objects of class clustcurves produced by survclustcurves and regclustcurves
Takes a clustcurves object and produces various useful summaries from it.
## S3 method for class 'clustcurves'
summary(object, ...)
object | A clustcurves object as produced by survclustcurves() or regclustcurves(). |
... | Additional arguments. |
print.clustcurves tries to be smart about summary.clustcurves.
summary.clustcurves computes and returns a list of summary information for a clustcurves object.
levels | Levels of the factor. |
cluster | A vector containing the assignment of each factor's level to its group. |
table | A data.frame containing the results from the hypothesis test. |
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)
library(survival)
data(veteran)

# Survival framework
ressurv <- survclustcurves(time = veteran$time, status = veteran$status,
                           x = veteran$celltype, algorithm = 'kmeans',
                           nboot = 2)
summary(ressurv)

# Regression framework
resreg <- regclustcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
                         algorithm = 'kmeans', nboot = 2)
summary(resreg)
Summarizing objects of class kcurves produced by ksurvcurves and kregcurves
Takes a kcurves object and produces various useful summaries from it.
## S3 method for class 'kcurves'
summary(object, ...)
object | A kcurves object as produced by ksurvcurves() or kregcurves(). |
... | Additional arguments. |
print.kcurves tries to be smart about summary.kcurves.
summary.kcurves computes and returns a list of summary information for a kcurves object.
levels | Levels of the factor. |
cluster | A vector containing the assignment of each factor's level to its group. |
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)
library(survival)
data(veteran)

# Survival: 2 groups k-means
s2 <- ksurvcurves(time = veteran$time, status = veteran$status,
                  x = veteran$celltype, k = 2, algorithm = "kmeans")
summary(s2)

# Regression: 2 groups k-means
r2 <- kregcurves(y = barnacle5$DW, x = barnacle5$RC, z = barnacle5$F,
                 k = 2, algorithm = "kmeans")
summary(r2)
Function for grouping survival curves based on the k-means or k-medians algorithm. It returns the number of groups and the assignment.
survclustcurves(time, status = NULL, x, kvector = NULL, kbin = 50,
                nboot = 100, algorithm = "kmeans", alpha = 0.05,
                cluster = FALSE, ncores = NULL, seed = NULL,
                multiple = FALSE, multiple.method = "holm")
time | Survival time. |
status | Censoring indicator of the survival time of the process; 0 if the total time is censored and 1 otherwise. |
x | Categorical variable indicating the population to which the observations belong. |
kvector | A vector specifying the numbers of groups of curves to be checked. |
kbin | Size of the grid over which the survival functions are to be estimated. |
nboot | Number of bootstrap repeats. |
algorithm | A character string specifying which clustering algorithm is used, i.e., k-means ("kmeans") or k-medians ("kmedians"). |
alpha | Significance level of the testing procedure. Defaults to 0.05. |
cluster | A logical value. If TRUE, the testing procedure is parallelized. |
ncores | An integer value specifying the number of cores to be used in the parallelized procedure. |
seed | Seed to be used in the procedure. |
multiple | A logical value. If TRUE, the p-values are adjusted for multiple comparisons using the method given in multiple.method. |
multiple.method | Correction method. See Details. |
The adjustment methods include the Bonferroni correction ("bonferroni") in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) ('holm'), Hochberg (1988) ('hochberg'), Hommel (1988) ('hommel'), Benjamini & Hochberg (1995) ('BH' or its alias 'fdr'), and Benjamini & Yekutieli (2001) ('BY'), respectively. A pass-through option ('none') is also included.
A list containing the following items:
table | A data frame containing the null hypotheses tested, the values of the test statistic and the obtained p-values. |
levels | Original levels of the variable x. |
cluster | A vector of integers (from 1:k) indicating the cluster to which each curve is allocated. |
centers | An object containing the centroids (mean of the curves pertaining to the same group). |
curves | An object containing the fitted curves for each population. |
Nora M. Villanueva and Marta Sestelo.
library(clustcurv)
library(survival)
data(veteran)

# Survival framework
res <- survclustcurves(time = veteran$time, status = veteran$status,
                       x = veteran$celltype, algorithm = 'kmeans',
                       nboot = 2)